Overview

SonioxSTTService provides real-time speech-to-text transcription over Soniox’s WebSocket API. It supports more than 60 languages, custom context, and multiple languages within the same conversation.

By default, the service uses the stt-rt-v4 model with vad_force_turn_endpoint=True. This setting disables Soniox’s native turn detection and relies on Pipecat’s local VAD to finalize transcripts, which significantly reduces the time to the final segment (~250ms median). Pipecat enables smart-turn detection by default using LocalSmartTurnAnalyzerV3. To use Soniox’s native turn detection instead, set vad_force_turn_endpoint=False.

Installation

To use Soniox services, install the required dependencies:
pip install "pipecat-ai[soniox]"

Prerequisites

Soniox Account Setup

Before using Soniox STT services, you need:
  1. Soniox Account: Sign up at Soniox Console
  2. API Key: Generate an API key from your console dashboard
  3. Language Selection: Choose from 60+ supported languages and models

Required Environment Variables

  • SONIOX_API_KEY: Your Soniox API key for authentication
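The examples on this page read the key with os.getenv, which returns None when the variable is unset, so a missing key only surfaces later at connection time. A minimal sketch of a fail-fast check follows. Note that load_soniox_api_key is a hypothetical helper written for illustration, not part of Pipecat or any Soniox package.

```python
import os


def load_soniox_api_key(env=None):
    """Return the Soniox API key, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("SONIOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SONIOX_API_KEY is not set")
    return key
```

The returned string can then be passed as the api_key argument of SonioxSTTService.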

Configuration

SonioxSTTService

  • api_key (str, required): Soniox API key for authentication.
  • url (str, default: "wss://stt-rt.soniox.com/transcribe-websocket"): Soniox WebSocket API URL.
  • sample_rate (int, default: None): Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • params (SonioxInputParams, default: None): Configuration parameters for model, language, and features. See SonioxInputParams below.
  • vad_force_turn_endpoint (bool, default: True): Listen for VADUserStoppedSpeakingFrame and send a finalize message to Soniox. When enabled, Pipecat’s local VAD triggers transcript finalization. When disabled, Soniox detects the end of speech natively.

SonioxInputParams

Settings that can be set at initialization via the params constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "stt-rt-v4" | Model to use for transcription. |
| audio_format | str | "pcm_s16le" | Audio format for transcription. |
| num_channels | int | 1 | Number of audio channels. |
| language_hints | list[Language] | None | Language hints for transcription. Helps the model prioritize expected languages. |
| language_hints_strict | bool | None | If true, strictly enforce language hints (only transcribe in provided languages). |
| context | SonioxContextObject or str | None | Customization for transcription. String for models with context_version 1, SonioxContextObject for context_version 2 (stt-rt-v3-preview and higher). |
| enable_speaker_diarization | bool | False | Enable speaker diarization. Tokens are annotated with speaker IDs. |
| enable_language_identification | bool | False | Enable language identification. Tokens are annotated with language IDs. |
| client_reference_id | str | None | Client reference ID for transcription tracking. |

Usage

Basic Setup

import os

from pipecat.services.soniox import SonioxSTTService

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
)

With Language Hints and Context

import os

from pipecat.services.soniox import SonioxSTTService, SonioxInputParams
from pipecat.transcriptions.language import Language

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    params=SonioxInputParams(
        model="stt-rt-v4",
        language_hints=[Language.EN, Language.ES],
        language_hints_strict=True,
        enable_language_identification=True,
    ),
)

With Context Object (v3+ models)

import os

from pipecat.services.soniox import (
    SonioxSTTService,
    SonioxInputParams,
    SonioxContextObject,
    SonioxContextGeneralItem,
)

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    params=SonioxInputParams(
        model="stt-rt-v4",
        context=SonioxContextObject(
            general=[
                SonioxContextGeneralItem(key="domain", value="medical"),
            ],
            terms=["Pipecat", "transcription"],
        ),
    ),
)
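
When the model name comes from configuration, it can help to decide up front whether context should be a plain string or a SonioxContextObject. The sketch below encodes only the rule stated on this page (stt-rt-v3-preview and higher use the object form). The helper name uses_context_object is hypothetical, and the assumption that model names follow the pattern stt-rt-v<N> with an optional suffix is mine; for names that do not fit, the helper returns None rather than guessing.

```python
import re


def uses_context_object(model):
    """Return True for stt-rt-v3 or newer, False for older versions,
    and None when the name does not follow the assumed stt-rt-v<N> pattern."""
    match = re.fullmatch(r"stt-rt-v(\d+)(?:-.+)?", model)
    if match is None:
        return None  # unknown naming scheme: consult the Soniox docs instead
    return int(match.group(1)) >= 3
```

A caller would treat a None result as "check the Soniox context documentation" instead of defaulting to either form.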

With Soniox Native Turn Detection

import os

from pipecat.services.soniox import SonioxSTTService

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    vad_force_turn_endpoint=False,  # Use Soniox's native endpoint detection
)

Notes

  • Turn finalization: By default (vad_force_turn_endpoint=True), when Pipecat’s VAD detects the user has stopped speaking, a finalize message is sent to Soniox to get the final transcript immediately. This significantly reduces latency.
  • Keepalive: The service automatically sends protocol-level keepalive messages to maintain the WebSocket connection.
  • Context versions: Use a string for context with older models (context_version 1) and SonioxContextObject for newer models (stt-rt-v3-preview and higher, context_version 2). See the Soniox context documentation for details.
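
The turn-finalization and keepalive behavior described in these notes can be pictured with a small stand-in. This is a sketch under stated assumptions, not Pipecat’s implementation: TurnFinalizer and its send callback are hypothetical, and the {"type": "finalize"} and {"type": "keepalive"} JSON payloads are an assumption about Soniox’s WebSocket control messages that should be checked against the Soniox API reference.

```python
import json


class TurnFinalizer:
    """Hypothetical stand-in for the VAD-driven finalize logic."""

    def __init__(self, send, vad_force_turn_endpoint=True):
        # `send` is any callable that would write a text frame to the WebSocket.
        self._send = send
        self._force = vad_force_turn_endpoint

    def on_user_stopped_speaking(self):
        """Called when local VAD reports end of speech.

        Returns True if a finalize message was sent, False if finalization
        is left to the server's own endpoint detection.
        """
        if not self._force:
            return False
        self._send(json.dumps({"type": "finalize"}))  # assumed payload shape
        return True

    def keepalive(self):
        """Send a keepalive so an idle connection is not closed."""
        self._send(json.dumps({"type": "keepalive"}))  # assumed payload shape


# Usage: collect outgoing messages in a list instead of a real socket.
sent = []
finalizer = TurnFinalizer(sent.append)
finalizer.on_user_stopped_speaking()
```

With vad_force_turn_endpoint=False the same call sends nothing, which mirrors the documented switch to Soniox’s native endpoint detection.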

Event Handlers

Soniox STT supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Soniox WebSocket |
| on_disconnected | Disconnected from Soniox WebSocket |

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Soniox")