Overview

GladiaSTTService provides real-time speech recognition over Gladia’s WebSocket API. It supports 99+ languages, custom vocabulary, translation, sentiment analysis, and audio pre-processing.

Installation

To use Gladia services, install the required dependency:
pip install "pipecat-ai[gladia]"

Prerequisites

Gladia Account Setup

Before using Gladia STT services, you need:
  1. Gladia Account: Sign up at Gladia
  2. API Key: Generate an API key from your account dashboard
  3. Region Selection: Choose your preferred region (EU-West or US-West)

Required Environment Variables

  • GLADIA_API_KEY: Your Gladia API key for authentication
  • GLADIA_REGION: Your preferred region (optional, defaults to “eu-west”)
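
As a minimal sketch, the variables above could be read and validated before the service is constructed. The helper name `gladia_settings_from_env` is hypothetical and not part of Pipecat; it relies only on the variable names, the two regions, and the default region listed above.

```python
import os

VALID_REGIONS = ("eu-west", "us-west")


def gladia_settings_from_env(env=None):
    """Return api_key/region keyword arguments read from the environment.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    env = os.environ if env is None else env
    api_key = env.get("GLADIA_API_KEY")
    if not api_key:
        raise RuntimeError("GLADIA_API_KEY is not set")
    # GLADIA_REGION is optional; fall back to the documented default.
    region = env.get("GLADIA_REGION") or "eu-west"
    if region not in VALID_REGIONS:
        raise ValueError(f"Unsupported region: {region!r}")
    return {"api_key": api_key, "region": region}
```

The returned dictionary uses the constructor's own argument names, so it could be unpacked as `GladiaSTTService(**gladia_settings_from_env())`. Failing early on a missing key tends to give a clearer error than a rejected session request later.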

Configuration

GladiaSTTService

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | required | Gladia API key for authentication. |
| region | Literal['us-west', 'eu-west'] | None | Region used to process audio. Defaults to "eu-west" when None. |
| url | str | "https://api.gladia.io/v2/live" | Gladia API URL for session initialization. |
| confidence | float | None | Minimum confidence threshold for transcriptions (0.0-1.0). Deprecated; no confidence threshold is applied. |
| sample_rate | int | None | Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate. |
| model | str | "solaria-1" | Model to use for transcription. |
| params | GladiaInputParams | None | Additional configuration parameters for the Gladia service. See GladiaInputParams below. |
| max_buffer_size | int | 20971520 | Maximum size of audio buffer in bytes (default 20MB). |
| should_interrupt | bool | True | Whether the bot should be interrupted when Gladia VAD detects user speech. |
| ttfs_p99_latency | float | GLADIA_TTFS_P99 | P99 latency from speech end to final transcript in seconds. Override for your deployment. |
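
To get a feel for the default max_buffer_size, the arithmetic below estimates how many seconds of audio it can hold. This is a rough sketch: the 16 kHz sample rate is an assumed example value, the bit depth and channel count are the GladiaInputParams defaults, and the calculation assumes the buffer stores raw PCM with no per-chunk overhead.

```python
BYTES_PER_MIB = 1024 * 1024
max_buffer_size = 20 * BYTES_PER_MIB  # 20971520, the documented default


def buffer_seconds(max_bytes, sample_rate, bit_depth=16, channels=1):
    """Seconds of raw PCM audio that fit in max_bytes."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return max_bytes / bytes_per_second


# 16 kHz is an assumed sample rate, used only for illustration.
print(buffer_seconds(max_buffer_size, 16000))  # 655.36
```

Under these assumptions the default holds roughly eleven minutes of audio. Higher sample rates or more channels shrink that window proportionally.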

GladiaInputParams

Parameters passed via the params constructor argument. Import directly:
from pipecat.services.gladia.config import GladiaInputParams
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| encoding | str | "wav/pcm" | Audio encoding format. |
| bit_depth | int | 16 | Audio bit depth. |
| channels | int | 1 | Number of audio channels. |
| custom_metadata | Dict[str, Any] | None | Additional metadata to include with requests. |
| endpointing | float | None | Silence duration in seconds to mark end of speech. |
| maximum_duration_without_endpointing | int | 5 | Maximum utterance duration (seconds) without silence. |
| language | Language | None | Language code for transcription. Deprecated; use language_config instead. |
| language_config | LanguageConfig | None | Detailed language configuration with code switching support. |
| pre_processing | PreProcessingConfig | None | Audio pre-processing options (audio enhancer, speech threshold). |
| realtime_processing | RealtimeProcessingConfig | None | Real-time processing features (custom vocabulary, translation, NER, sentiment). |
| messages_config | MessagesConfig | None | WebSocket message filtering options. |
| enable_vad | bool | False | Enable Gladia VAD for end-of-utterance detection. Use without other VAD in the agent. |

Usage

Basic Setup

import os

from pipecat.services.gladia import GladiaSTTService

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
)

With Language Configuration

import os

from pipecat.services.gladia import GladiaSTTService
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    region="us-west",
    model="solaria-1",
    params=GladiaInputParams(
        language_config=LanguageConfig(
            languages=["en", "es"],
            code_switching=True,
        ),
    ),
)

With Real-time Processing

import os

from pipecat.services.gladia import GladiaSTTService
from pipecat.services.gladia.config import (
    GladiaInputParams,
    RealtimeProcessingConfig,
    CustomVocabularyConfig,
    CustomVocabularyItem,
    TranslationConfig,
)

stt = GladiaSTTService(
    api_key=os.getenv("GLADIA_API_KEY"),
    params=GladiaInputParams(
        realtime_processing=RealtimeProcessingConfig(
            custom_vocabulary=True,
            custom_vocabulary_config=CustomVocabularyConfig(
                vocabulary=[
                    CustomVocabularyItem(value="Pipecat", intensity=0.8),
                    "Gladia",
                ],
            ),
            translation=True,
            translation_config=TranslationConfig(
                target_languages=["fr", "de"],
                model="enhanced",
            ),
        ),
    ),
)

Notes

  • Session-based connection: Gladia uses a two-step connection process. An HTTP POST first initializes a session, then a WebSocket connection is opened to the returned session URL. The session URL and ID are managed automatically.
  • Audio buffering: The service buffers audio data locally and sends it when connected. If the connection drops and reconnects, buffered audio is automatically re-sent to minimize transcript gaps.
  • Keepalive: Empty audio chunks are sent periodically to keep the Gladia connection alive (keepalive interval: 5s, timeout: 20s).
  • Built-in VAD: Set enable_vad=True in GladiaInputParams to use Gladia’s server-side VAD, which emits UserStartedSpeakingFrame and UserStoppedSpeakingFrame. When using this, do not enable another VAD in your pipeline.
  • Translation: Gladia supports real-time translation to multiple target languages. Translation results are pushed as TranslationFrames.
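
The audio buffering note above can be illustrated with a small bounded buffer that keeps recent audio and hands it back for re-sending after a reconnect. This is a sketch of the general technique, not Pipecat's implementation: the class name `ReplayAudioBuffer` and its methods are hypothetical, and the `acknowledge` step assumes the server confirms how many bytes it has processed, which this page does not describe.

```python
class ReplayAudioBuffer:
    """Bounded byte buffer that can replay pending audio after a reconnect.

    Illustrative only; not Pipecat's implementation.
    """

    def __init__(self, max_size=20 * 1024 * 1024):
        self._max_size = max_size
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        """Store a chunk, dropping the oldest bytes once the cap is exceeded."""
        self._data.extend(chunk)
        overflow = len(self._data) - self._max_size
        if overflow > 0:
            del self._data[:overflow]

    def acknowledge(self, num_bytes: int) -> None:
        """Discard audio the server has confirmed (assumed protocol step)."""
        del self._data[:num_bytes]

    def pending(self) -> bytes:
        """Audio that would be re-sent once the connection is re-established."""
        return bytes(self._data)


buf = ReplayAudioBuffer(max_size=4)
buf.append(b"abc")
buf.append(b"de")      # exceeds the cap by one byte, so "a" is dropped
buf.acknowledge(2)     # server confirmed the first two buffered bytes
print(buf.pending())   # b'de'
```

Capping the buffer trades completeness for bounded memory: during a long outage the oldest audio is lost first, which is why max_buffer_size is worth tuning for your deployment.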

Event Handlers

Gladia STT supports the standard service connection events:
| Event | Description |
| --- | --- |
| on_connected | Connected to Gladia WebSocket |
| on_disconnected | Disconnected from Gladia WebSocket |

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Gladia")