
Overview

CartesiaSTTService provides real-time speech recognition using Cartesia’s WebSocket API with the ink-whisper model, supporting streaming transcription with both interim and final results for low-latency applications.

Installation

To use Cartesia services, install the required dependency:
pip install "pipecat-ai[cartesia]"

Prerequisites

Cartesia Account Setup

Before using Cartesia STT services, you need:
  1. Cartesia Account: Sign up at Cartesia
  2. API Key: Generate an API key from your account dashboard
  3. Model Access: Ensure access to the ink-whisper transcription model

Required Environment Variables

  • CARTESIA_API_KEY: Your Cartesia API key for authentication
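
As a minimal sketch of reading this variable at startup, the helper below fails early with a clear message when the key is missing. The function name `load_cartesia_api_key` and its optional `env` argument are hypothetical and not part of Pipecat; only the `CARTESIA_API_KEY` variable name comes from this page.

```python
import os


def load_cartesia_api_key(env=None):
    """Return the Cartesia API key from the environment, or raise a clear error.

    `env` is a hypothetical hook for tests; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; export it before starting the pipeline."
        )
    return key
```

The returned value can then be passed as the api_key argument instead of calling os.getenv directly, which would otherwise pass None silently when the variable is unset.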

Configuration

CartesiaSTTService

  • api_key (str, required): Cartesia API key for authentication.
  • base_url (str, default: "api.cartesia.ai"): Custom API endpoint URL. Override for proxied deployments.
  • sample_rate (int, default: 16000): Audio sample rate in Hz.
  • live_options (CartesiaLiveOptions, default: None): Configuration options for the transcription service. See CartesiaLiveOptions below.
  • ttfs_p99_latency (float, default: CARTESIA_TTFS_P99): P99 latency from speech end to final transcript, in seconds. Override for your deployment.

CartesiaLiveOptions

Transcription configuration passed via the live_options constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "ink-whisper" | The transcription model to use. |
| language | str | "en" | Target language for transcription. |
| encoding | str | "pcm_s16le" | Audio encoding format. |
| sample_rate | int | 16000 | Audio sample rate in Hz. |
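
To make the default encoding concrete: "pcm_s16le" is signed 16-bit little-endian PCM, so each sample occupies 2 bytes. The sketch below works out chunk sizes and converts float samples to that format. Both function names are hypothetical helpers, and mono audio and a 20 ms chunk are assumptions rather than anything this page specifies.

```python
import struct


def pcm_s16le_chunk_bytes(sample_rate=16000, chunk_ms=20, channels=1):
    """Size in bytes of one chunk of 16-bit PCM audio (mono assumed by default)."""
    bytes_per_sample = 2  # s16 = signed 16-bit
    samples = sample_rate * chunk_ms // 1000
    return samples * channels * bytes_per_sample


def floats_to_pcm_s16le(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


print(pcm_s16le_chunk_bytes())               # 640 bytes per 20 ms at 16 kHz mono
print(pcm_s16le_chunk_bytes(chunk_ms=1000))  # 32000 bytes per second
```

If the audio source runs at a different rate, the sample_rate option should match the rate of the audio actually being sent.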

Usage

Basic Setup

import os

from pipecat.services.cartesia import CartesiaSTTService

# Read the API key from the environment rather than hard-coding it
stt = CartesiaSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)

With Custom Options

import os

from pipecat.services.cartesia import CartesiaSTTService
from pipecat.services.cartesia.stt import CartesiaLiveOptions

stt = CartesiaSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    # Override the transcription defaults: Spanish instead of English
    live_options=CartesiaLiveOptions(
        model="ink-whisper",
        language="es",
        sample_rate=16000,
    ),
)

Notes

  • Inactivity timeout: Cartesia disconnects WebSocket connections after 3 minutes of inactivity. The timeout resets with each message sent. Silence-based keepalive is enabled by default to prevent disconnections.
  • Auto-reconnect on send: If the connection is closed (e.g., due to timeout), the service automatically reconnects when the next audio data is sent.
  • Finalize on VAD stop: When the pipeline’s VAD detects the user has stopped speaking, the service sends a "finalize" command to flush the transcription session and produce a final result.
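
The inactivity behaviour described in these notes can be illustrated with a small sketch. This is not Pipecat's internal implementation: `KeepaliveTimer`, `silence_frame`, and the 60-second interval are hypothetical names and values chosen for illustration, and 16-bit mono PCM is assumed. Only the 3-minute timeout comes from the note above.

```python
import time

INACTIVITY_TIMEOUT_S = 180.0  # from the note above: 3 minutes


class KeepaliveTimer:
    """Tracks time since the last message sent and reports when a keepalive is due."""

    def __init__(self, interval_s=60.0, clock=time.monotonic):
        # The interval must stay well below the server-side timeout.
        if interval_s >= INACTIVITY_TIMEOUT_S:
            raise ValueError("keepalive interval must be shorter than the timeout")
        self._interval_s = interval_s
        self._clock = clock
        self._last_send = clock()

    def mark_sent(self):
        """Call after every message (audio or keepalive) is sent."""
        self._last_send = self._clock()

    def due(self):
        """True once the interval has elapsed with nothing sent."""
        return self._clock() - self._last_send >= self._interval_s


def silence_frame(sample_rate=16000, duration_ms=20):
    """Zero-valued 16-bit mono PCM audio of the given duration."""
    return b"\x00" * (sample_rate * duration_ms // 1000 * 2)
```

In a send loop, the caller would check due(), send silence_frame() over the connection when it returns True, and call mark_sent() after every send so that real audio also resets the timer.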

Event Handlers

Cartesia STT supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Cartesia WebSocket |
| on_disconnected | Disconnected from Cartesia WebSocket |

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Cartesia STT")