Overview
CartesiaSTTService provides real-time speech recognition using Cartesia’s WebSocket API with the ink-whisper model, supporting streaming transcription with both interim and final results for low-latency applications.
Cartesia STT API Reference
Pipecat’s API methods for Cartesia STT integration
Example Implementation
Complete example with transcription logging
Cartesia Documentation
Official Cartesia STT documentation and features
Cartesia Platform
Access API keys and transcription models
Installation
To use Cartesia services, install the required dependency.
Prerequisites
Cartesia Account Setup
Before using Cartesia STT services, you need:
- Cartesia Account: Sign up at Cartesia
- API Key: Generate an API key from your account dashboard
- Model Access: Ensure access to the ink-whisper transcription model
Required Environment Variables
CARTESIA_API_KEY: Your Cartesia API key for authentication
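A minimal sketch of reading the key at startup, assuming you want a clear failure when the variable is missing rather than an authentication error later. The helper name `load_cartesia_api_key` is hypothetical and not part of Pipecat.

```python
import os


def load_cartesia_api_key() -> str:
    """Return CARTESIA_API_KEY from the environment, or raise a descriptive error."""
    api_key = os.environ.get("CARTESIA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; export it before starting the pipeline."
        )
    return api_key
```

Failing early like this keeps a missing key from surfacing only once the WebSocket connection is attempted.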
Configuration
CartesiaSTTService
Cartesia API key for authentication.
Custom API endpoint URL. Override for proxied deployments.
Audio sample rate in Hz.
Configuration options for the transcription service. See CartesiaLiveOptions below.
P99 latency from speech end to final transcript in seconds. Override for your deployment.
CartesiaLiveOptions
Transcription configuration passed via the live_options constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "ink-whisper" | The transcription model to use. |
| language | str | "en" | Target language for transcription. |
| encoding | str | "pcm_s16le" | Audio encoding format. |
| sample_rate | int | 16000 | Audio sample rate in Hz. |
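To make the encoding and sample_rate defaults concrete, the sketch below computes how many bytes a chunk of audio occupies. It relies on pcm_s16le meaning signed 16-bit little-endian samples (2 bytes each). Mono audio and the 20 ms chunk length are assumptions for illustration, and `chunk_size_bytes` is a hypothetical helper.

```python
BYTES_PER_SAMPLE = 2  # pcm_s16le: signed 16-bit little-endian samples


def chunk_size_bytes(sample_rate: int = 16000, chunk_ms: int = 20, channels: int = 1) -> int:
    """Bytes of pcm_s16le audio needed to cover chunk_ms milliseconds."""
    samples_per_channel = sample_rate * chunk_ms // 1000
    return samples_per_channel * channels * BYTES_PER_SAMPLE
```

With the defaults, a 20 ms chunk is 320 samples, or 640 bytes, and one second of audio is 32,000 bytes.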
Usage
Basic Setup
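A minimal sketch of basic setup, not a definitive implementation. The import path `pipecat.services.cartesia.stt` and the `api_key` keyword are assumptions based on Pipecat's usual service layout, so check them against your installed version. The helper `create_stt` is hypothetical, and the import is guarded so the snippet returns None when the Cartesia dependency is not installed.

```python
import os


def create_stt():
    """Build a CartesiaSTTService, or return None when Pipecat's Cartesia support is missing."""
    try:
        # Assumed import path; verify against your Pipecat version.
        from pipecat.services.cartesia.stt import CartesiaSTTService
    except ImportError:
        return None
    # Assumed keyword: api_key. Uses the key from Required Environment Variables.
    return CartesiaSTTService(api_key=os.environ["CARTESIA_API_KEY"])
```

The returned service would then be placed in a pipeline after the transport input, as with other Pipecat STT services.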
With Custom Options
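A sketch of passing custom options through the live_options argument. The option names come from the CartesiaLiveOptions table. The import path, the assumption that CartesiaLiveOptions accepts those names as keyword arguments, and the names `LIVE_OPTIONS` and `create_custom_stt` are illustrative assumptions. The language override "es" is only an example value, so confirm that your model supports it.

```python
import os

# Option names follow the CartesiaLiveOptions table; only language differs from the defaults.
LIVE_OPTIONS = {
    "model": "ink-whisper",
    "language": "es",  # example override; assumed to be supported by the model
    "encoding": "pcm_s16le",
    "sample_rate": 16000,
}


def create_custom_stt(options=None):
    """Build the service with explicit live options, or return None if Pipecat is unavailable."""
    try:
        # Assumed import path and class names; verify against your Pipecat version.
        from pipecat.services.cartesia.stt import CartesiaLiveOptions, CartesiaSTTService
    except ImportError:
        return None
    return CartesiaSTTService(
        api_key=os.environ["CARTESIA_API_KEY"],
        live_options=CartesiaLiveOptions(**(options or LIVE_OPTIONS)),
    )
```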
Notes
- Inactivity timeout: Cartesia disconnects WebSocket connections after 3 minutes of inactivity. The timeout resets with each message sent. Silence-based keepalive is enabled by default to prevent disconnections.
- Auto-reconnect on send: If the connection is closed (e.g., due to timeout), the service automatically reconnects when the next audio data is sent.
- Finalize on VAD stop: When the pipeline’s VAD detects that the user has stopped speaking, the service sends a "finalize" command to flush the transcription session and produce a final result.
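The three behaviors above can be pictured with a toy model. This is a sketch for reasoning about the connection lifecycle, not Pipecat's actual implementation: `SessionModel`, its methods, and the 60-second keepalive interval are hypothetical, while the 180-second timeout and the "finalize" command come from the notes above.

```python
from dataclasses import dataclass, field

INACTIVITY_TIMEOUT_S = 180.0  # 3 minutes, per the inactivity note
KEEPALIVE_INTERVAL_S = 60.0   # hypothetical interval, chosen well under the timeout


@dataclass
class SessionModel:
    """Toy model of the connection lifecycle; records what would be sent."""

    connected: bool = True
    last_sent_at: float = 0.0
    sent: list = field(default_factory=list)

    def _send(self, message, now: float) -> None:
        if not self.connected:
            # Auto-reconnect on send: reopen before delivering the message.
            self.connected = True
            self.sent.append("reconnect")
        self.sent.append(message)
        self.last_sent_at = now  # every message resets the inactivity timer

    def send_audio(self, chunk: bytes, now: float) -> None:
        self._send(chunk, now)

    def tick(self, now: float) -> None:
        """Periodic check: drop on timeout, otherwise keep alive with silence."""
        idle = now - self.last_sent_at
        if idle >= INACTIVITY_TIMEOUT_S:
            self.connected = False  # the server closes idle connections
        elif self.connected and idle >= KEEPALIVE_INTERVAL_S:
            self._send(bytes(640), now)  # 20 ms of pcm_s16le silence at 16 kHz, mono

    def on_vad_stop(self, now: float) -> None:
        # Finalize on VAD stop: flush the session so a final transcript is produced.
        self._send("finalize", now)
```

In this model a keepalive counts as a sent message, which is why silence alone is enough to keep the connection from timing out.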
Event Handlers
Cartesia STT supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Cartesia WebSocket |
| on_disconnected | Disconnected from Cartesia WebSocket |
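A sketch of attaching handlers for both events. It assumes the service exposes Pipecat's usual `event_handler(name)` decorator and that handlers are coroutines receiving the service as their only argument. The helper `register_connection_handlers` and its `log` parameter are hypothetical.

```python
def register_connection_handlers(stt, log=print):
    """Attach handlers for on_connected and on_disconnected to an STT service."""

    # Assumed decorator API: stt.event_handler(event_name).
    @stt.event_handler("on_connected")
    async def on_connected(service):
        log("Connected to Cartesia WebSocket")

    @stt.event_handler("on_disconnected")
    async def on_disconnected(service):
        log("Disconnected from Cartesia WebSocket")
```

Passing a different `log` callable, such as a logger's `info` method, routes the messages wherever the rest of the application logs.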