Overview
GradiumSTTService provides real-time speech recognition using Gradium’s WebSocket API with support for multilingual transcription, semantic voice activity detection for smart turn-taking, and robust performance in noisy environments.
- Gradium STT API Reference: Pipecat’s API methods for Gradium STT integration
- Example Implementation: Complete example with interruption handling
- Gradium Documentation: Official Gradium STT API documentation
- Gradium Platform: Access API keys and speech models
Installation
To use Gradium services, install the required dependency.
Prerequisites
Gradium Account Setup
Before using Gradium STT services, you need:
- Gradium Account: Sign up at Gradium
- API Key: Generate an API key from your account dashboard
- Region Selection: Choose your preferred region (EU or US)
Required Environment Variables
GRADIUM_API_KEY: Your Gradium API key for authentication
Configuration
GradiumSTTService
Gradium API key for authentication.
WebSocket endpoint URL. Override for different regions or custom deployments.
Configuration parameters for language and delay settings. See InputParams below.
Optional JSON configuration string for additional model settings. Deprecated in favor of params.
InputParams
Runtime-configurable parameters that can be set at initialization via the params constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | Language | None | Expected language of the audio (e.g., Language.EN, Language.ES). Helps ground the model to a specific language and improve transcription quality. |
| delay_in_frames | int | None | Delay in audio frames (80ms each) before text is generated. Higher delays allow more context but increase latency. Allowed values: 7, 8, 10, 12, 14, 16, 20, 24, 36, 48. Default is 10 (800ms). |
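
The delay constraint in the table can be checked before a value is handed to the service. The following is a minimal sketch and not part of Pipecat or Gradium: the helper name `delay_to_ms` and the constant names are hypothetical, while the allowed values and the 80 ms frame duration are taken from the table above.

```python
# Allowed delay_in_frames values, as listed in the InputParams table.
ALLOWED_DELAYS = (7, 8, 10, 12, 14, 16, 20, 24, 36, 48)
FRAME_MS = 80  # each delay frame covers 80 ms of audio


def delay_to_ms(delay_in_frames: int) -> int:
    """Validate a delay value and return the latency it implies, in milliseconds."""
    if delay_in_frames not in ALLOWED_DELAYS:
        raise ValueError(
            f"delay_in_frames must be one of {ALLOWED_DELAYS}, got {delay_in_frames}"
        )
    return delay_in_frames * FRAME_MS


if __name__ == "__main__":
    for frames in (7, 10, 48):
        print(frames, "frames ->", delay_to_ms(frames), "ms")
```

Validating locally like this surfaces an unsupported value at configuration time rather than after the connection is opened.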
Usage
Basic Setup
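
A basic setup might look like the sketch below. It rests on assumptions this page does not confirm: the import path `pipecat.services.gradium.stt` and the `api_key` keyword name should be checked against the API reference, and the helper names `gradium_api_key` and `create_stt` are hypothetical.

```python
import os


def gradium_api_key(env=None) -> str:
    """Return the key from GRADIUM_API_KEY, failing early if it is missing."""
    env = os.environ if env is None else env
    key = env.get("GRADIUM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GRADIUM_API_KEY is not set")
    return key


def create_stt():
    """Build the STT service with default settings."""
    # Assumed import path; deferred so the helper above works without Pipecat installed.
    from pipecat.services.gradium.stt import GradiumSTTService

    # Assumed keyword name for the API key constructor argument.
    return GradiumSTTService(api_key=gradium_api_key())
```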
With Language and Delay Configuration
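
Language and delay are passed through the params constructor argument described under InputParams. The sketch below is illustrative only and makes several assumptions: the import paths, the `api_key` keyword name, and the idea that InputParams is reachable as `GradiumSTTService.InputParams` are not confirmed by this page. The function name `create_configured_stt` is hypothetical.

```python
import os


def create_configured_stt(delay_in_frames: int = 8):
    """Build the STT service for English audio with a custom delay."""
    # Assumed import paths; verify them against the API reference.
    from pipecat.services.gradium.stt import GradiumSTTService
    from pipecat.transcriptions.language import Language

    # Field names follow the InputParams table; the nested class is an assumption.
    params = GradiumSTTService.InputParams(
        language=Language.EN,
        delay_in_frames=delay_in_frames,  # 8 frames of 80 ms each, i.e. 640 ms
    )
    return GradiumSTTService(
        api_key=os.environ["GRADIUM_API_KEY"],  # assumed keyword name
        params=params,
    )
```

A lower delay such as 8 frames trades some context for faster text; the default of 10 frames corresponds to 800 ms.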
Notes
- Supported languages: German, English, Spanish, French, and Portuguese.
- Silence flushing: When VAD detects the user has stopped speaking, the service sends silence frames to flush the transcription buffer, resulting in faster final transcripts without closing the connection.
- Audio format: Sends audio as 24 kHz 16-bit PCM in 80ms chunks.
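
The audio format note implies a fixed chunk size, which the arithmetic below works out. This is a sketch under one assumption not stated on this page: the audio is mono. The helper names are hypothetical, and the number of silence chunks the service sends when flushing is not specified here, so the example only shows what a single silent chunk would look like.

```python
SAMPLE_RATE_HZ = 24_000  # 24 kHz, from the audio format note
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHUNK_MS = 80            # chunk duration, from the audio format note
CHANNELS = 1             # assumption: mono audio


def chunk_size_bytes() -> int:
    """Size in bytes of one 80 ms chunk of 24 kHz 16-bit PCM."""
    samples_per_chunk = SAMPLE_RATE_HZ * CHUNK_MS // 1000  # 1920 samples
    return samples_per_chunk * BYTES_PER_SAMPLE * CHANNELS


def silence_chunk() -> bytes:
    """One chunk of digital silence: every 16-bit sample is zero."""
    return bytes(chunk_size_bytes())


if __name__ == "__main__":
    print(chunk_size_bytes())  # 3840
```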
Event Handlers
Gradium STT supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Gradium WebSocket |
| on_disconnected | Disconnected from Gradium WebSocket |
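
Handlers for these events might be registered as in the sketch below. It assumes the service exposes Pipecat's `event_handler` decorator and that each handler receives the service instance as its only argument; neither detail is confirmed on this page. The function name `register_connection_handlers` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("gradium_stt")


def register_connection_handlers(stt) -> None:
    """Attach logging handlers for the two connection events to an STT service."""

    # Assumed decorator API: stt.event_handler(event_name).
    @stt.event_handler("on_connected")
    async def on_connected(service):
        logger.info("Connected to Gradium WebSocket")

    @stt.event_handler("on_disconnected")
    async def on_disconnected(service):
        logger.info("Disconnected from Gradium WebSocket")
```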