ElevenLabs

Overview

ElevenLabs provides two STT service implementations:

ElevenLabsSTTService (HTTP) — File-based transcription using ElevenLabs’ Speech-to-Text API with segmented audio processing. Uploads audio files and receives transcription results directly.
ElevenLabsRealtimeSTTService (WebSocket) — Real-time streaming transcription with ultra-low latency, supporting both partial (interim) and committed (final) transcripts with manual or VAD-based commit strategies.

ElevenLabs STT API Reference

Pipecat’s API methods for ElevenLabs STT integration

Example Implementation

Complete example with ElevenLabs STT and TTS

ElevenLabs Documentation

Official ElevenLabs STT API documentation

ElevenLabs Platform

Access API keys and speech-to-text models

Installation

To use ElevenLabs STT services, install the required dependencies:

pip install "pipecat-ai[elevenlabs]"

Prerequisites

ElevenLabs Account Setup

Before using ElevenLabs STT services, you need:

ElevenLabs Account: Sign up at ElevenLabs Platform
API Key: Generate an API key from your account dashboard
Model Access: Ensure your account can use the Scribe v2 transcription models (defaults: scribe_v2 for the HTTP service, scribe_v2_realtime for the Realtime service)
HTTP Session: Create an aiohttp.ClientSession for file uploads (HTTP service only)

Required Environment Variables

ELEVENLABS_API_KEY: Your ElevenLabs API key for authentication
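A missing key otherwise surfaces only when the first request fails, so it can help to check for it at startup. The sketch below is one way to do that; `require_api_key` is a hypothetical helper, not part of Pipecat or the ElevenLabs SDK.

```python
import os


def require_api_key(env=os.environ, name="ELEVENLABS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; Pipecat does not provide it.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the pipeline")
    return key
```

The returned value can then be passed as the api_key argument of either service.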

Configuration

ElevenLabsSTTService

api_key

str

required

ElevenLabs API key for authentication.

aiohttp_session

aiohttp.ClientSession

required

An aiohttp session for HTTP requests. You must create and manage this yourself.

base_url

str

default:"https://api.elevenlabs.io"

Base URL for the ElevenLabs API.

model

str

default:"scribe_v2"

Model ID for transcription.

sample_rate

int

default:"None"

Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.

params

InputParams

default:"None"

Configuration parameters for the STT service. See InputParams below.

ttfs_p99_latency

float

default:"ELEVENLABS_TTFS_P99"

P99 latency from speech end to final transcript in seconds. Override for your deployment.

ElevenLabsRealtimeSTTService

api_key

str

required

ElevenLabs API key for authentication.

base_url

str

default:"api.elevenlabs.io"

Base URL for the ElevenLabs WebSocket API.

model

str

default:"scribe_v2_realtime"

Model ID for real-time transcription.

sample_rate

int

default:"None"

Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.

params

RealtimeInputParams

default:"None"

Configuration parameters for the Realtime STT service. See Realtime InputParams below.

ttfs_p99_latency

float

default:"ELEVENLABS_REALTIME_TTFS_P99"

P99 latency from speech end to final transcript in seconds. Override for your deployment.

InputParams

Parameters for ElevenLabsSTTService, passed via the params constructor argument.

Parameter	Type	Default	Description
`language`	`Language`	`None`	Target language for transcription.
`tag_audio_events`	`bool`	`True`	Include audio events like (laughter), (coughing) in transcription.

Realtime InputParams

Parameters for ElevenLabsRealtimeSTTService, passed via the params constructor argument.

Parameter	Type	Default	Description
`language_code`	`str`	`None`	ISO-639-1 or ISO-639-3 language code. `None` for auto-detection.
`commit_strategy`	`CommitStrategy`	`CommitStrategy.MANUAL`	How to segment speech: `"manual"` (Pipecat VAD) or `"vad"` (ElevenLabs VAD).
`vad_silence_threshold_secs`	`float`	`None`	Seconds of silence before VAD commits (0.3-3.0). Only used with VAD commit strategy.
`vad_threshold`	`float`	`None`	VAD sensitivity (0.1-0.9, lower is more sensitive). Only used with VAD commit strategy.
`min_speech_duration_ms`	`int`	`None`	Minimum speech duration for VAD (50-2000ms). Only used with VAD commit strategy.
`min_silence_duration_ms`	`int`	`None`	Minimum silence duration for VAD (50-2000ms). Only used with VAD commit strategy.
`include_timestamps`	`bool`	`False`	Include word-level timestamps in transcripts.
`enable_logging`	`bool`	`False`	Enable logging on ElevenLabs’ side.
`include_language_detection`	`bool`	`False`	Include language detection in transcripts.
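The VAD tuning parameters above have documented ranges, so a small pre-flight check can catch out-of-range values before they reach the service. This is a minimal sketch: `VAD_RANGES` and `check_vad_params` are hypothetical names, and treating the range bounds as inclusive is an assumption.

```python
# Ranges copied from the parameter table; inclusive bounds are an assumption.
VAD_RANGES = {
    "vad_silence_threshold_secs": (0.3, 3.0),
    "vad_threshold": (0.1, 0.9),
    "min_speech_duration_ms": (50, 2000),
    "min_silence_duration_ms": (50, 2000),
}


def check_vad_params(**params):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    for name, value in params.items():
        if name not in VAD_RANGES:
            problems.append(f"{name}: not a VAD tuning parameter")
            continue
        if value is None:
            # None means "use the service default", so there is nothing to check.
            continue
        low, high = VAD_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    return problems
```

Calling `check_vad_params(vad_silence_threshold_secs=1.0, vad_threshold=0.5)` returns an empty list, while a value such as `vad_silence_threshold_secs=5.0` produces one problem entry.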

Usage

Basic HTTP Setup

import os

import aiohttp
from pipecat.services.elevenlabs import ElevenLabsSTTService

# Run this inside an async function; you own the session's lifetime.
async with aiohttp.ClientSession() as session:
    stt = ElevenLabsSTTService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        aiohttp_session=session,
    )

HTTP with Language and Audio Events

import os

import aiohttp
from pipecat.services.elevenlabs import ElevenLabsSTTService
from pipecat.transcriptions.language import Language

# Run this inside an async function; you own the session's lifetime.
async with aiohttp.ClientSession() as session:
    stt = ElevenLabsSTTService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        aiohttp_session=session,
        params=ElevenLabsSTTService.InputParams(
            language=Language.ES,      # transcribe Spanish
            tag_audio_events=False,    # omit tags such as (laughter)
        ),
    )

Realtime WebSocket Setup

import os

from pipecat.services.elevenlabs import ElevenLabsRealtimeSTTService

# Uses the defaults: scribe_v2_realtime model and manual commit strategy.
stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

Realtime with Timestamps and Custom Commit Strategy

import os

from pipecat.services.elevenlabs import ElevenLabsRealtimeSTTService
from pipecat.services.elevenlabs.stt import CommitStrategy

stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    params=ElevenLabsRealtimeSTTService.InputParams(
        language_code="eng",                  # ISO-639-3 code for English
        commit_strategy=CommitStrategy.VAD,   # let ElevenLabs VAD commit segments
        vad_silence_threshold_secs=1.0,       # commit after 1 s of silence
        include_timestamps=True,              # word-level timestamps
    ),
)

Notes

HTTP vs Realtime: The HTTP service (ElevenLabsSTTService) uploads complete audio segments and is best for VAD-segmented transcription. The Realtime service (ElevenLabsRealtimeSTTService) streams audio over WebSocket for lower latency and provides interim transcripts.
Commit strategies: The Realtime service defaults to manual commit strategy, where Pipecat’s VAD controls when transcription segments are committed. Set commit_strategy=CommitStrategy.VAD to let ElevenLabs handle segment boundaries.
Keepalive: The Realtime service sends silent audio chunks as keepalive to prevent idle disconnections (keepalive interval: 5s, timeout: 10s).
Auto-reconnect: If the WebSocket connection has closed, the Realtime service reconnects automatically when new audio arrives.
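To illustrate the keepalive note, the sketch below shows what a silent audio chunk and an idle check could look like. It is not the service's actual implementation: the 16-bit PCM format, the 100 ms chunk length, and both function names are assumptions; only the 5 s interval comes from the note above.

```python
def silence_chunk(sample_rate: int, duration_ms: int = 100, channels: int = 1) -> bytes:
    """Return zero-valued 16-bit PCM audio covering duration_ms (assumed format)."""
    frames = sample_rate * duration_ms // 1000
    return b"\x00" * (frames * channels * 2)  # 2 bytes per 16-bit sample


def needs_keepalive(last_audio_at: float, now: float, interval_secs: float = 5.0) -> bool:
    """True once no audio has been sent for at least interval_secs."""
    return now - last_audio_at >= interval_secs
```

At a 16 kHz sample rate, a 100 ms mono chunk is 1600 frames, or 3200 bytes.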

Event Handlers

ElevenLabsRealtimeSTTService supports the standard service connection events:

Event	Description
`on_connected`	Connected to ElevenLabs Realtime STT WebSocket
`on_disconnected`	Disconnected from ElevenLabs Realtime STT WebSocket

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to ElevenLabs Realtime STT")

The HTTP service (ElevenLabsSTTService) does not have connection events since it uses per-request HTTP calls.
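Both connection events can be registered together, for example to log connection state. The sketch below assumes only the `event_handler` decorator and the two event names shown above; `register_connection_logging` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("elevenlabs_stt")  # hypothetical logger name


def register_connection_logging(stt):
    """Attach logging handlers for both connection events to an STT service."""

    @stt.event_handler("on_connected")
    async def on_connected(service):
        logger.info("Connected to ElevenLabs Realtime STT")

    @stt.event_handler("on_disconnected")
    async def on_disconnected(service):
        logger.warning("Disconnected from ElevenLabs Realtime STT")
```

Call `register_connection_logging(stt)` once, after constructing the ElevenLabsRealtimeSTTService instance and before running the pipeline.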
