Overview
AssemblyAISTTService provides real-time speech recognition using AssemblyAI's WebSocket API. It supports interim results, end-of-turn detection, and configurable audio processing parameters for accurate transcription in conversational AI applications.
AssemblyAI STT API Reference
Pipecat’s API methods for AssemblyAI STT integration
Example Implementation
Complete example with interruption handling
AssemblyAI Documentation
Official AssemblyAI documentation and features
AssemblyAI Console
Access API keys and transcription features
Installation
To use AssemblyAI services, install the pipecat-ai package with the AssemblyAI extra.

Prerequisites
AssemblyAI Account Setup
Before using AssemblyAI STT services, you need:

- AssemblyAI Account: Sign up at AssemblyAI Console
- API Key: Generate an API key from your dashboard
- Model Selection: Choose from available transcription models and features
Required Environment Variables
ASSEMBLYAI_API_KEY: Your AssemblyAI API key for authentication
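For example, in a POSIX shell (placeholder value shown):

```shell
export ASSEMBLYAI_API_KEY="your-assemblyai-api-key"
```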
Configuration
AssemblyAISTTService
AssemblyAI API key for authentication.
Language code for transcription. AssemblyAI currently supports English.
WebSocket endpoint URL. Override for custom or proxied deployments.
Connection configuration parameters. See AssemblyAIConnectionParams below.
Whether to force the turn endpoint on VAD stop. When True, disables AssemblyAI's model-based turn detection and relies on external VAD to trigger turn endpoints. Automatically sets end_of_turn_confidence_threshold=1.0 and max_turn_silence=2000 unless explicitly overridden.

P99 latency from speech end to final transcript, in seconds. Override for your deployment.
AssemblyAIConnectionParams
Connection-level parameters passed via the connection_params constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
sample_rate | int | 16000 | Audio sample rate in Hz. |
encoding | Literal | "pcm_s16le" | Audio encoding format. Options: "pcm_s16le", "pcm_mulaw". |
formatted_finals | bool | True | Whether to enable transcript formatting. |
word_finalization_max_wait_time | int | None | Maximum time to wait for word finalization in milliseconds. |
end_of_turn_confidence_threshold | float | None | Confidence threshold for end-of-turn detection. |
min_end_of_turn_silence_when_confident | int | None | Minimum silence duration (ms) when confident about end-of-turn. |
max_turn_silence | int | None | Maximum silence duration (ms) before forcing end-of-turn. |
keyterms_prompt | List[str] | None | List of key terms to guide transcription. |
speech_model | Literal | "universal-streaming-english" | Speech model. Options: "universal-streaming-english", "universal-streaming-multilingual". |
Usage
Basic Setup
With Custom Connection Parameters
Notes
- English only by default: AssemblyAI's default model supports English. Use speech_model="universal-streaming-multilingual" in connection_params for multilingual support.
- VAD turn endpoint mode: When vad_force_turn_endpoint=True (the default), AssemblyAI's model-based turn detection is disabled in favor of external VAD. This sends a ForceEndpoint message when the VAD detects the user has stopped speaking.
- Formatted finals: When formatted_finals=True, the service waits for formatted transcripts before emitting final TranscriptionFrames. This provides properly formatted text but may introduce a small delay.
Event Handlers
AssemblyAI STT supports the standard service connection events:

| Event | Description |
|---|---|
on_connected | Connected to AssemblyAI WebSocket |
on_disconnected | Disconnected from AssemblyAI WebSocket |