Overview
ElevenLabs provides two STT service implementations:
- `ElevenLabsSTTService` (HTTP): File-based transcription using ElevenLabs’ Speech-to-Text API with segmented audio processing. Uploads audio files and receives transcription results directly.
- `ElevenLabsRealtimeSTTService` (WebSocket): Real-time streaming transcription with ultra-low latency, supporting both partial (interim) and committed (final) transcripts with manual or VAD-based commit strategies.
Related resources:
- ElevenLabs STT API Reference: Pipecat’s API methods for ElevenLabs STT integration
- Example Implementation: Complete example with ElevenLabs STT and TTS
- ElevenLabs Documentation: Official ElevenLabs STT API documentation
- ElevenLabs Platform: Access API keys and speech-to-text models
Installation
To use ElevenLabs STT services, install the required dependencies with `pip install "pipecat-ai[elevenlabs]"`.
Prerequisites
ElevenLabs Account Setup
Before using ElevenLabs STT services, you need:
- ElevenLabs Account: Sign up at ElevenLabs Platform
- API Key: Generate an API key from your account dashboard
- Model Access: Ensure access to the Scribe v2 transcription model (default: `scribe_v2`)
- HTTP Session: Configure an aiohttp session for file uploads (HTTP service only)
Required Environment Variables
ELEVENLABS_API_KEY: Your ElevenLabs API key for authentication
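A minimal sketch of reading the key at startup, assuming it is exported in the process environment. The helper name `load_elevenlabs_api_key` is hypothetical and not part of Pipecat; it only shows one way to fail early with a clear message when the variable is missing.

```python
import os


def load_elevenlabs_api_key(env=None):
    """Return the ElevenLabs API key, or raise if it is missing.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    This helper is illustrative and not part of Pipecat.
    """
    env = os.environ if env is None else env
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return key
```

Calling `load_elevenlabs_api_key()` once when the bot starts surfaces a missing key before any connection attempt is made.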
Configuration
ElevenLabsSTTService
ElevenLabs API key for authentication.
An aiohttp session for HTTP requests. You must create and manage this yourself.
Base URL for the ElevenLabs API.
Model ID for transcription.
Audio sample rate in Hz. When `None`, uses the pipeline’s configured sample rate.
Configuration parameters for the STT service. See InputParams below.
P99 latency from speech end to final transcript in seconds. Override for your deployment.
ElevenLabsRealtimeSTTService
ElevenLabs API key for authentication.
Base URL for the ElevenLabs WebSocket API.
Model ID for real-time transcription.
Audio sample rate in Hz. When `None`, uses the pipeline’s configured sample rate.
Configuration parameters for the Realtime STT service. See Realtime InputParams below.
P99 latency from speech end to final transcript in seconds. Override for your deployment.
InputParams
Parameters for `ElevenLabsSTTService`, passed via the `params` constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `language` | `Language` | `None` | Target language for transcription. |
| `tag_audio_events` | `bool` | `True` | Include audio events like (laughter), (coughing) in transcription. |
Realtime InputParams
Parameters for `ElevenLabsRealtimeSTTService`, passed via the `params` constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `language_code` | `str` | `None` | ISO-639-1 or ISO-639-3 language code. `None` for auto-detection. |
| `commit_strategy` | `CommitStrategy` | `CommitStrategy.MANUAL` | How to segment speech: `"manual"` (Pipecat VAD) or `"vad"` (ElevenLabs VAD). |
| `vad_silence_threshold_secs` | `float` | `None` | Seconds of silence before VAD commits (0.3-3.0). Only used with VAD commit strategy. |
| `vad_threshold` | `float` | `None` | VAD sensitivity (0.1-0.9, lower is more sensitive). Only used with VAD commit strategy. |
| `min_speech_duration_ms` | `int` | `None` | Minimum speech duration for VAD (50-2000 ms). Only used with VAD commit strategy. |
| `min_silence_duration_ms` | `int` | `None` | Minimum silence duration for VAD (50-2000 ms). Only used with VAD commit strategy. |
| `include_timestamps` | `bool` | `False` | Include word-level timestamps in transcripts. |
| `enable_logging` | `bool` | `False` | Enable logging on ElevenLabs’ side. |
| `include_language_detection` | `bool` | `False` | Include language detection in transcripts. |
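The VAD parameters in the table each have a documented range. As a rough sketch, a small pre-flight check can catch out-of-range values before they are sent to the service. The helper `check_vad_settings` is hypothetical (not part of Pipecat), and treating the bounds as inclusive is an assumption.

```python
# Documented ranges from the Realtime InputParams table (inclusive bounds assumed).
VAD_RANGES = {
    "vad_silence_threshold_secs": (0.3, 3.0),
    "vad_threshold": (0.1, 0.9),
    "min_speech_duration_ms": (50, 2000),
    "min_silence_duration_ms": (50, 2000),
}


def check_vad_settings(settings):
    """Return a list of error strings for values outside the documented ranges.

    Keys that are absent or set to None are skipped, since None is the default.
    """
    errors = []
    for name, (low, high) in VAD_RANGES.items():
        value = settings.get(name)
        if value is None:
            continue
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside {low}-{high}")
    return errors
```

An empty list means every supplied value falls inside its documented range.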
Usage
Basic HTTP Setup
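A minimal sketch of the basic HTTP setup. It assumes the service is importable from `pipecat.services.elevenlabs.stt` and that the constructor keywords are `api_key` and `aiohttp_session`; neither name is confirmed on this page, so check them against your installed Pipecat version. The imports sit inside the functions only so that the snippet loads without the optional dependencies.

```python
import os


def build_http_stt(session, api_key):
    """Create the HTTP STT service around a caller-owned aiohttp session."""
    # Assumed import path and keyword names; verify against your Pipecat version.
    from pipecat.services.elevenlabs.stt import ElevenLabsSTTService

    return ElevenLabsSTTService(api_key=api_key, aiohttp_session=session)


async def main():
    import aiohttp  # third-party dependency used by the HTTP service

    # You create and close the session yourself; the service only borrows it.
    async with aiohttp.ClientSession() as session:
        stt = build_http_stt(session, os.environ["ELEVENLABS_API_KEY"])
        # Build and run your pipeline with `stt` inside this block,
        # so the session stays open for as long as the service needs it.
```

Run it with `asyncio.run(main())` once the pipeline code is filled in.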
HTTP with Language and Audio Events
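A sketch of the HTTP service with the two parameters from the InputParams table set explicitly. The import paths, the nested `InputParams` class, and the `api_key` / `aiohttp_session` keyword names are assumptions; only `params`, `language`, and `tag_audio_events` are named on this page.

```python
def build_http_stt_with_params(session, api_key):
    """Create the HTTP service with a fixed language and no audio-event tags."""
    # Assumed import paths; InputParams is assumed to be nested on the service class.
    from pipecat.services.elevenlabs.stt import ElevenLabsSTTService
    from pipecat.transcriptions.language import Language

    params = ElevenLabsSTTService.InputParams(
        language=Language.EN,    # target language for transcription
        tag_audio_events=False,  # omit "(laughter)"-style tags; the default is True
    )
    return ElevenLabsSTTService(
        api_key=api_key,
        aiohttp_session=session,  # caller-owned aiohttp.ClientSession
        params=params,
    )
```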
Realtime WebSocket Setup
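A minimal sketch of the Realtime service with default settings, which means the manual commit strategy driven by Pipecat's VAD. The import path and the `api_key` keyword name are assumptions to verify against your installed Pipecat version.

```python
import os


def build_realtime_stt(api_key=None):
    """Create the Realtime (WebSocket) STT service with default settings."""
    # Assumed import path and keyword name; verify against your Pipecat version.
    from pipecat.services.elevenlabs.stt import ElevenLabsRealtimeSTTService

    # Unlike the HTTP service, no aiohttp session is passed in here.
    return ElevenLabsRealtimeSTTService(
        api_key=api_key or os.environ["ELEVENLABS_API_KEY"],
    )
```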
Realtime with Timestamps and Custom Commit Strategy
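A sketch of the Realtime service with word-level timestamps and the VAD commit strategy, so that ElevenLabs decides where segments end. The tuning values are arbitrary examples chosen inside the documented ranges, and the import path and nested `InputParams` class are assumptions.

```python
# Example VAD tuning values, chosen inside the documented ranges.
VAD_TUNING = {
    "vad_silence_threshold_secs": 0.8,  # allowed: 0.3-3.0
    "vad_threshold": 0.5,               # allowed: 0.1-0.9, lower is more sensitive
    "min_speech_duration_ms": 250,      # allowed: 50-2000
    "min_silence_duration_ms": 500,     # allowed: 50-2000
}


def build_realtime_stt_with_vad(api_key):
    """Create the Realtime service with timestamps and ElevenLabs-side VAD commits."""
    # Assumed import path; InputParams is assumed to be nested on the service class.
    from pipecat.services.elevenlabs.stt import (
        CommitStrategy,
        ElevenLabsRealtimeSTTService,
    )

    params = ElevenLabsRealtimeSTTService.InputParams(
        language_code="en",                  # None would enable auto-detection
        commit_strategy=CommitStrategy.VAD,  # ElevenLabs decides segment boundaries
        include_timestamps=True,             # word-level timestamps
        **VAD_TUNING,
    )
    return ElevenLabsRealtimeSTTService(api_key=api_key, params=params)
```

The VAD tuning fields only take effect with the VAD commit strategy, so they can be dropped when staying on the manual default.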
Notes
- HTTP vs Realtime: The HTTP service (`ElevenLabsSTTService`) uploads complete audio segments and is best for VAD-segmented transcription. The Realtime service (`ElevenLabsRealtimeSTTService`) streams audio over WebSocket for lower latency and provides interim transcripts.
- Commit strategies: The Realtime service defaults to the `manual` commit strategy, where Pipecat’s VAD controls when transcription segments are committed. Set `commit_strategy=CommitStrategy.VAD` to let ElevenLabs handle segment boundaries.
- Keepalive: The Realtime service sends silent audio chunks as keepalive to prevent idle disconnections (keepalive interval: 5s, timeout: 10s).
- Auto-reconnect: The Realtime service automatically reconnects if the WebSocket connection is closed when new audio arrives.
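The service sends keepalive audio on its own, so nothing needs to be written for it. Purely to illustrate what a "silent audio chunk" is, the sketch below builds one, assuming 16-bit little-endian PCM; the actual wire format and chunk length used by the service are not stated on this page, and `silent_pcm_chunk` is a hypothetical helper.

```python
def silent_pcm_chunk(sample_rate, duration_ms, channels=1):
    """Return `duration_ms` of silence as 16-bit little-endian PCM bytes."""
    bytes_per_sample = 2  # 16-bit samples (assumed format)
    num_samples = sample_rate * duration_ms // 1000
    # In signed 16-bit PCM, silence is a run of zero-valued samples.
    return b"\x00" * (num_samples * channels * bytes_per_sample)
```

For example, 100 ms of mono audio at 16 kHz is 1600 samples, or 3200 bytes.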
Event Handlers
ElevenLabsRealtimeSTTService supports the standard service connection events:
| Event | Description |
|---|---|
| `on_connected` | Connected to ElevenLabs Realtime STT WebSocket |
| `on_disconnected` | Disconnected from ElevenLabs Realtime STT WebSocket |
The HTTP service (`ElevenLabsSTTService`) does not have connection events since it uses per-request HTTP calls.
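A short sketch of subscribing to the two connection events on the Realtime service. It assumes the service exposes Pipecat's `event_handler(name)` decorator and passes itself as the first argument to each handler; `register_connection_logging` is a hypothetical helper name.

```python
def register_connection_logging(stt, log=print):
    """Attach handlers for the two connection events to a Realtime STT service.

    Assumes `stt.event_handler(name)` is a decorator that registers an async
    callback, and that the service instance is passed as the first argument.
    """

    @stt.event_handler("on_connected")
    async def on_connected(service, *args):
        log("Connected to ElevenLabs Realtime STT")

    @stt.event_handler("on_disconnected")
    async def on_disconnected(service, *args):
        log("Disconnected from ElevenLabs Realtime STT")
```

Call it once after constructing the service and before the pipeline starts, so the first connection is also reported.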