Overview

ElevenLabs provides two STT service implementations:
  • ElevenLabsSTTService (HTTP) — File-based transcription using ElevenLabs’ Speech-to-Text API with segmented audio processing. Uploads audio files and receives transcription results directly.
  • ElevenLabsRealtimeSTTService (WebSocket) — Real-time streaming transcription with ultra-low latency, supporting both partial (interim) and committed (final) transcripts with manual or VAD-based commit strategies.

Installation

To use ElevenLabs STT services, install the required dependencies:
pip install "pipecat-ai[elevenlabs]"

Prerequisites

ElevenLabs Account Setup

Before using ElevenLabs STT services, you need:
  1. ElevenLabs Account: Sign up at ElevenLabs Platform
  2. API Key: Generate an API key from your account dashboard
  3. Model Access: Ensure your account can use the Scribe v2 transcription models (defaults: scribe_v2 for the HTTP service, scribe_v2_realtime for the Realtime service)
  4. HTTP Session: Create an aiohttp session for file uploads (HTTP service only)

Required Environment Variables

  • ELEVENLABS_API_KEY: Your ElevenLabs API key for authentication
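
A missing key otherwise surfaces only later as an authentication error, so it can help to check for it at startup. The following is a minimal sketch; `load_elevenlabs_api_key` is a hypothetical helper name, not part of Pipecat or the ElevenLabs SDK.

```python
import os


def load_elevenlabs_api_key(env=None):
    """Return the API key from ELEVENLABS_API_KEY, or raise if it is unset or blank."""
    # Accept an explicit mapping so the check can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = (env.get("ELEVENLABS_API_KEY") or "").strip()
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return key
```

The returned value can then be passed as the `api_key` argument of either service.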

Configuration

ElevenLabsSTTService

  • api_key (str, required): ElevenLabs API key for authentication.
  • aiohttp_session (aiohttp.ClientSession, required): An aiohttp session for HTTP requests. You must create and manage this yourself.
  • base_url (str, default: "https://api.elevenlabs.io"): Base URL for the ElevenLabs API.
  • model (str, default: "scribe_v2"): Model ID for transcription.
  • sample_rate (int, default: None): Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • params (InputParams, default: None): Configuration parameters for the STT service. See InputParams below.
  • ttfs_p99_latency (float, default: ELEVENLABS_TTFS_P99): P99 latency from speech end to final transcript, in seconds. Override for your deployment.

ElevenLabsRealtimeSTTService

  • api_key (str, required): ElevenLabs API key for authentication.
  • base_url (str, default: "api.elevenlabs.io"): Base URL for the ElevenLabs WebSocket API.
  • model (str, default: "scribe_v2_realtime"): Model ID for real-time transcription.
  • sample_rate (int, default: None): Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • params (RealtimeInputParams, default: None): Configuration parameters for the Realtime STT service. See Realtime InputParams below.
  • ttfs_p99_latency (float, default: ELEVENLABS_REALTIME_TTFS_P99): P99 latency from speech end to final transcript, in seconds. Override for your deployment.
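
If you want to override ttfs_p99_latency with a figure from your own deployment, one option is to collect latency samples and take their 99th percentile. The sketch below uses the nearest-rank method; `p99_latency` and the sample values are illustrative assumptions, and how you measure the time from end of speech to final transcript is up to you.

```python
def p99_latency(samples):
    """Nearest-rank 99th percentile of a list of latency samples, in seconds."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # 1-based nearest rank: ceil(0.99 * n), computed with integers to
    # avoid floating-point rounding surprises.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical measurements: seconds from end of speech to final transcript.
measured = [0.42, 0.51, 0.47, 0.95, 0.44, 0.61, 0.49, 0.53, 0.58, 0.46]
ttfs_p99 = p99_latency(measured)
```

The resulting number could then be passed as `ttfs_p99_latency=ttfs_p99` when constructing the service. With few samples the nearest-rank P99 is simply the slowest observation, so a larger sample gives a more meaningful value.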

InputParams

Parameters for ElevenLabsSTTService, passed via the params constructor argument.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | Language | None | Target language for transcription. |
| tag_audio_events | bool | True | Include audio events like (laughter), (coughing) in transcription. |
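
When tag_audio_events is left enabled, downstream consumers that only want spoken words may need to filter the event tags out again. The following is a rough sketch of such a filter. It assumes the tags always look like the examples given for tag_audio_events, that is, short lowercase phrases in parentheses; `strip_audio_events` is a hypothetical helper, not a Pipecat function.

```python
import re

# Assumption: audio events appear as short lowercase parenthesized tags such
# as "(laughter)" or "(coughing)". Any other lowercase parenthetical in the
# transcript would be removed as well.
_AUDIO_EVENT = re.compile(r"\s*\([a-z][a-z ]*\)")


def strip_audio_events(text):
    """Remove parenthesized audio-event tags and tidy the leftover whitespace."""
    cleaned = _AUDIO_EVENT.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

If the tags are never needed, setting `tag_audio_events=False` avoids the post-processing entirely.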

Realtime InputParams

Parameters for ElevenLabsRealtimeSTTService, passed via the params constructor argument.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language_code | str | None | ISO-639-1 or ISO-639-3 language code. None for auto-detection. |
| commit_strategy | CommitStrategy | CommitStrategy.MANUAL | How to segment speech: "manual" (Pipecat VAD) or "vad" (ElevenLabs VAD). |
| vad_silence_threshold_secs | float | None | Seconds of silence before VAD commits (0.3-3.0). Only used with VAD commit strategy. |
| vad_threshold | float | None | VAD sensitivity (0.1-0.9, lower is more sensitive). Only used with VAD commit strategy. |
| min_speech_duration_ms | int | None | Minimum speech duration for VAD (50-2000ms). Only used with VAD commit strategy. |
| min_silence_duration_ms | int | None | Minimum silence duration for VAD (50-2000ms). Only used with VAD commit strategy. |
| include_timestamps | bool | False | Include word-level timestamps in transcripts. |
| enable_logging | bool | False | Enable logging on ElevenLabs’ side. |
| include_language_detection | bool | False | Include language detection in transcripts. |
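
The VAD-related parameters have documented ranges and only take effect with the VAD commit strategy, so a small pre-flight check can catch configuration mistakes before a connection is opened. This is a sketch only: `VAD_RANGES` and `check_realtime_params` are hypothetical names, the bounds are assumed to be inclusive, and the service may perform its own validation.

```python
# Documented ranges for the VAD-related parameters (inclusive bounds assumed).
VAD_RANGES = {
    "vad_silence_threshold_secs": (0.3, 3.0),
    "vad_threshold": (0.1, 0.9),
    "min_speech_duration_ms": (50, 2000),
    "min_silence_duration_ms": (50, 2000),
}


def check_realtime_params(commit_strategy="manual", **vad_params):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, value in vad_params.items():
        if name not in VAD_RANGES:
            problems.append(f"{name}: not a VAD parameter")
            continue
        if value is None:
            continue  # None leaves the parameter unset
        low, high = VAD_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
        if commit_strategy != "vad":
            problems.append(f"{name} is only used with the VAD commit strategy")
    return problems
```

For example, `check_realtime_params("vad", vad_silence_threshold_secs=1.0)` returns an empty list, while passing the same parameter with the manual strategy reports that it will have no effect.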

Usage

Basic HTTP Setup

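import os

import aiohttp
from pipecat.services.elevenlabs import ElevenLabsSTTService


async def main():
    # The session must stay open for as long as the service is in use.
    async with aiohttp.ClientSession() as session:
        stt = ElevenLabsSTTService(
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            aiohttp_session=session,
        )
        # Build and run the pipeline here, inside the session context.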

HTTP with Language and Audio Events

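import os

import aiohttp
from pipecat.services.elevenlabs import ElevenLabsSTTService
from pipecat.transcriptions.language import Language


async def main():
    # The session must stay open for as long as the service is in use.
    async with aiohttp.ClientSession() as session:
        stt = ElevenLabsSTTService(
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            aiohttp_session=session,
            params=ElevenLabsSTTService.InputParams(
                language=Language.ES,  # transcribe Spanish
                tag_audio_events=False,  # omit tags such as (laughter)
            ),
        )
        # Build and run the pipeline here, inside the session context.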

Realtime WebSocket Setup

import os

from pipecat.services.elevenlabs import ElevenLabsRealtimeSTTService

stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

Realtime with Timestamps and Custom Commit Strategy

import os

from pipecat.services.elevenlabs import ElevenLabsRealtimeSTTService
from pipecat.services.elevenlabs.stt import CommitStrategy

stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    params=ElevenLabsRealtimeSTTService.InputParams(
        language_code="eng",  # ISO-639-3 code for English
        commit_strategy=CommitStrategy.VAD,  # let ElevenLabs decide segment boundaries
        vad_silence_threshold_secs=1.0,  # commit after 1 second of silence
        include_timestamps=True,  # request word-level timestamps
    ),
)

Notes

  • HTTP vs Realtime: The HTTP service (ElevenLabsSTTService) uploads complete audio segments and is best for VAD-segmented transcription. The Realtime service (ElevenLabsRealtimeSTTService) streams audio over WebSocket for lower latency and provides interim transcripts.
  • Commit strategies: The Realtime service defaults to manual commit strategy, where Pipecat’s VAD controls when transcription segments are committed. Set commit_strategy=CommitStrategy.VAD to let ElevenLabs handle segment boundaries.
  • Keepalive: The Realtime service sends silent audio chunks as keepalive to prevent idle disconnections (keepalive interval: 5s, timeout: 10s).
  • Auto-reconnect: If the WebSocket connection has closed, the Realtime service reconnects automatically when new audio arrives.
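
To make the keepalive note concrete, the sketch below shows one way silent audio and a keepalive timer could be produced. It is not Pipecat's implementation: `silent_pcm_chunk` and `KeepaliveTimer` are hypothetical names, and the 16-bit PCM format and 20 ms chunk length are assumptions. Only the 5-second interval comes from the note above.

```python
import time


def silent_pcm_chunk(sample_rate, duration_ms=20, channels=1):
    """Return zero-valued 16-bit PCM bytes covering duration_ms of audio."""
    frames = sample_rate * duration_ms // 1000
    # Each 16-bit sample is two zero bytes; one sample per channel per frame.
    return b"\x00\x00" * frames * channels


class KeepaliveTimer:
    """Tracks when audio was last sent and reports when a keepalive is due."""

    def __init__(self, interval_secs=5.0, clock=time.monotonic):
        self._interval = interval_secs
        self._clock = clock
        self._last_sent = clock()

    def mark_sent(self):
        """Record that audio (real or silent) was just sent."""
        self._last_sent = self._clock()

    def due(self):
        """Return True once interval_secs have passed since the last send."""
        return self._clock() - self._last_sent >= self._interval
```

A sender loop would call `mark_sent()` after every outgoing audio frame and, whenever `due()` returns True, send `silent_pcm_chunk(sample_rate)` instead. The injectable `clock` makes the timer easy to test without waiting.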

Event Handlers

ElevenLabsRealtimeSTTService supports the standard service connection events:
| Event | Description |
| --- | --- |
| on_connected | Connected to ElevenLabs Realtime STT WebSocket |
| on_disconnected | Disconnected from ElevenLabs Realtime STT WebSocket |

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to ElevenLabs Realtime STT")

The HTTP service (ElevenLabsSTTService) does not have connection events since it uses per-request HTTP calls.