See everything your voice agent does. Change zero lines of code.
A voice agent with full observability. Composite Voice handles the STT/LLM/TTS pipeline; Tapes records every Anthropic API call at the network layer. There is no SDK instrumentation and no code change to the voice pipeline, just a reverse proxy that records what passes through it.
┌──────────────────────────────────────────────┐
│  Browser                                     │
│                                              │
│  Composite Voice SDK                         │
│  DeepgramSTT ─ AnthropicLLM ─ DeepgramTTS    │
│       │             │              │         │
└───────┼─────────────┼──────────────┼─────────┘
        │             │              │
        ▼             ▼              ▼
┌──────────────────────────────────────────────┐
│  Express Server (:3010)                      │
│                                              │
│  /proxy/deepgram       /proxy/anthropic      │
│  (direct)              (via Tapes proxy)     │
└───────┬───────────────────────┬──────────────┘
        │                       │
        ▼                       ▼
  Deepgram API         Tapes Proxy (:8090)
   (STT + TTS)           ┌──────────────┐
                         │  records to  │
                         │  SQLite      │
                         └──────┬───────┘
                                ▼
                          Anthropic API
                         (Claude Haiku)
You speak ──► Deepgram STT ──► Anthropic LLM ──► Deepgram TTS ──► You hear
              (Nova-3)         (Haiku 4.5)       (Aura-2)
              WebSocket        HTTP/SSE          WebSocket
              direct           via Tapes         direct
                                     │
                                     ▼
                             .tapes/traces.db
                         tokens, latency, req/res
This project uses a local .tapes/ directory. Voice traces stay separate from your coding agent telemetry.
~/.tapes/                    .tapes/  (this project)
├── config.toml              ├── config.toml
├── tapes.sqlite             ├── traces.db
│                            ├── memory.json
│   Global instance          ├── observations.json
│   Coding agents            └── usage.json
│   Port :8080
│                            Voice agent only
│   Separate databases.      Port :8090
npm install

| Provider | Where | What you need |
|---|---|---|
| Deepgram | console.deepgram.com | API key for STT (Nova-3) and TTS (Aura-2) |
| Anthropic | console.anthropic.com | API key for Claude Haiku 4.5 |
No account on Deepgram yet? Try the Deepgram Playground first.
cp .env.example .env

Add your keys to .env:
DEEPGRAM_API_KEY=your-deepgram-key
ANTHROPIC_API_KEY=your-anthropic-key
If your Anthropic key is already exported in your shell (~/.zshrc), you can skip it in .env.
npm start

Three processes start:

| Process | Port | Role |
|---|---|---|
| Tapes proxy | :8090 | Reverse proxy: records Anthropic calls to SQLite |
| Tapes deck | :8888 | Dashboard API for session history |
| Express app | :3010 | Voice proxy + frontend |
Open http://localhost:3010 in Chrome and click the mic.
The right panel has five tabs:
Per-provider breakdown of where time goes. The critical metric is Speech → LLM — the silence between you finishing talking and the agent starting to respond.
┌─────────────────────────────────────────┐
│ STT processing ████████ 120ms │
│ LLM generation ██████████████ 480ms │
│ TTS synthesis ██████ 95ms │
└─────────────────────────────────────────┘
Real token counts parsed from the Anthropic SSE stream. Shows input/output split and per-turn cost at Haiku pricing ($0.80/MTok input, $4.00/MTok output). Cost is per-session — clearing the session resets it.
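The cost arithmetic can be sketched as follows. This is a minimal illustration that uses only the per-MTok prices quoted above; the helper names `turnCost` and `sessionCost` and the shape of the turn objects are assumptions, not the project's actual code.

```javascript
// Prices in USD per million tokens, taken from the figures quoted above.
const HAIKU_PRICING = { inputPerMTok: 0.80, outputPerMTok: 4.00 };

// Hypothetical helper: cost of one turn in USD.
function turnCost(inputTokens, outputTokens, pricing = HAIKU_PRICING) {
  const micros =
    inputTokens * pricing.inputPerMTok + outputTokens * pricing.outputPerMTok;
  return micros / 1_000_000;
}

// Hypothetical helper: session total over [{ inputTokens, outputTokens }, ...].
function sessionCost(turns, pricing = HAIKU_PRICING) {
  return turns.reduce(
    (sum, t) => sum + turnCost(t.inputTokens, t.outputTokens, pricing),
    0
  );
}

// 1,200 input tokens and 150 output tokens:
// (1200 * 0.80 + 150 * 4.00) / 1e6 = 0.00156 USD
console.log(turnCost(1200, 150).toFixed(6)); // prints "0.001560"
```

Because output tokens cost five times as much as input tokens at these prices, a short spoken reply can still account for a large share of a turn's cost.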
Persistent memory the agent builds about you. Extracted two ways:
- Heuristic: the server scans for patterns like "my name is X", "I work at Y"
- LLM-tagged: the model outputs [MEMORY: ...] tags (stripped before speech)
Memories inject into the system prompt on every turn. Persist to .tapes/memory.json.
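Both extraction paths can be sketched in a few lines. The regular expressions, function names, and return shapes below are illustrative assumptions; the server's real patterns are not shown in this document and may differ.

```javascript
// Hypothetical patterns for the heuristic path.
const MEMORY_PATTERNS = [
  { kind: 'name', regex: /\bmy name is (\w+)/i },
  { kind: 'employer', regex: /\bI work at ([^.,!?]+)/i },
];

// Scan one user utterance for known phrasings.
function extractHeuristicMemories(utterance) {
  const memories = [];
  for (const { kind, regex } of MEMORY_PATTERNS) {
    const match = utterance.match(regex);
    if (match) memories.push({ kind, value: match[1].trim() });
  }
  return memories;
}

// Pull [MEMORY: ...] tags out of model output and return text that is safe
// to send to TTS (tags removed, leftover double spaces collapsed).
function extractTaggedMemories(modelOutput) {
  const tagRegex = /\[MEMORY:\s*([^\]]*)\]/g;
  const memories = [...modelOutput.matchAll(tagRegex)].map((m) => m[1].trim());
  const spokenText = modelOutput
    .replace(tagRegex, '')
    .replace(/\s{2,}/g, ' ')
    .trim();
  return { memories, spokenText };
}

console.log(extractHeuristicMemories('Hi, my name is Sam and I work at Acme Corp.'));
console.log(extractTaggedMemories('Nice to meet you. [MEMORY: user is called Sam] How can I help?'));
```

Stripping the tags before synthesis matters because anything left in the text would be read aloud by the TTS stage.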
Automatic detection from pipeline data:
- Slow pipeline — full round trip > 2s
- Latency spikes — turns above mean + 1.5 stddev
- Token growth — context window expanding between turns
- Cross-session regression — avg latency worse than last session
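The four checks above can be sketched as a single pass over per-turn data. The turn shape (`latencyMs`, `inputTokens`), the insight type names, and the use of population standard deviation are assumptions made for illustration; only the thresholds (2s, mean + 1.5 stddev) come from the list.

```javascript
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Population standard deviation (an assumption; the project may use sample stddev).
function stddev(xs) {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
}

// turns: [{ latencyMs, inputTokens }, ...]; previousSessionAvgMs may be null.
function detectInsights(turns, previousSessionAvgMs = null) {
  const insights = [];
  if (turns.length === 0) return insights;

  const latencies = turns.map((t) => t.latencyMs);
  const avg = mean(latencies);
  const spikeThreshold = avg + 1.5 * stddev(latencies);

  turns.forEach((t, i) => {
    if (t.latencyMs > 2000) insights.push({ type: 'slow_pipeline', turn: i });
    if (t.latencyMs > spikeThreshold) insights.push({ type: 'latency_spike', turn: i });
    if (i > 0 && t.inputTokens > turns[i - 1].inputTokens) {
      insights.push({ type: 'token_growth', turn: i });
    }
  });

  if (previousSessionAvgMs !== null && avg > previousSessionAvgMs) {
    insights.push({ type: 'cross_session_regression' });
  }
  return insights;
}

const demoTurns = [
  { latencyMs: 700, inputTokens: 100 },
  { latencyMs: 750, inputTokens: 180 },
  { latencyMs: 720, inputTokens: 260 },
  { latencyMs: 2600, inputTokens: 340 },
];
console.log(detectInsights(demoTurns, 900));
```

With only a handful of turns, a single outlier inflates both the mean and the standard deviation, so the spike threshold is most meaningful once a session has several turns.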
Raw request log of every API call.
| File | Contents |
|---|---|
| .tapes/traces.db | Anthropic API calls recorded by Tapes (SQLite) |
| .tapes/memory.json | Agent memories about the user |
| .tapes/observations.json | Heuristic observations from usage patterns |
| .tapes/usage.json | Per-session token counts, costs, and timing |
| .tapes/config.toml | Tapes proxy configuration |
# Count recorded calls
sqlite3 .tapes/traces.db "SELECT count(*) FROM nodes"
# Token usage by model
sqlite3 .tapes/traces.db \
"SELECT model, count(*), sum(prompt_tokens), sum(completion_tokens)
FROM nodes GROUP BY model"
# Tapes TUI dashboard
tapes deck -s .tapes/traces.db
# Tapes web dashboard
tapes deck --web -s .tapes/traces.db

| Component | Role |
|---|---|
| Composite Voice SDK | Voice pipeline — DeepgramSTT + AnthropicLLM + DeepgramTTS |
| Tapes | Reverse proxy telemetry — records API calls to SQLite |
| Express | Server-side proxy adapter with memory injection |
| Vanilla HTML/JS | Frontend with no build step |
Zero changes to the Composite Voice SDK. The browser SDK points at the Express proxy (/proxy/anthropic). The Express server has middleware that forwards those requests through the Tapes proxy instead of directly to Anthropic:
// Browser — standard Composite Voice config
new AnthropicLLM({ proxyUrl: '/proxy/anthropic' })
// Server — intercepts and routes through Tapes
app.use('/proxy/anthropic', async (req, res) => {
const targetUrl = `http://localhost:8090${req.url}`; // Tapes proxy
// ... forward request, extract tokens from SSE stream
});

The Tapes proxy transparently forwards to api.anthropic.com and records the request/response pair. The voice pipeline has no idea Tapes exists.
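The "extract tokens from SSE stream" step in the server snippet could look roughly like this. It is a sketch, not the project's implementation: the function name `extractUsageFromSse` is hypothetical, and it assumes the whole SSE body is available as one string. It relies on the Anthropic streaming format, where `message_start` carries `message.usage.input_tokens` and `message_delta` carries a cumulative `usage.output_tokens`.

```javascript
// Hypothetical helper: read token usage out of a complete Anthropic SSE body.
function extractUsageFromSse(sseText) {
  const usage = { inputTokens: 0, outputTokens: 0 };

  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue; // skip "event:" lines and blanks

    let event;
    try {
      event = JSON.parse(line.slice('data:'.length).trim());
    } catch {
      continue; // ignore non-JSON payloads
    }

    if (event.type === 'message_start' && event.message?.usage) {
      usage.inputTokens = event.message.usage.input_tokens ?? 0;
      usage.outputTokens = event.message.usage.output_tokens ?? 0;
    } else if (event.type === 'message_delta' && event.usage) {
      // output_tokens in message_delta is cumulative, so overwrite rather than add.
      usage.outputTokens = event.usage.output_tokens ?? usage.outputTokens;
    }
  }
  return usage;
}

const sampleStream = [
  'event: message_start',
  'data: {"type":"message_start","message":{"usage":{"input_tokens":42,"output_tokens":1}}}',
  '',
  'event: message_delta',
  'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}',
  '',
].join('\n');

console.log(extractUsageFromSse(sampleStream));
```

In a live proxy the response arrives in chunks that can split a line in two, so a real handler would buffer incomplete lines between chunks before parsing them.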
- WALKTHROUGH.md — detailed step-by-step guide with troubleshooting