Overview
GoogleSTTService provides real-time speech recognition using Google Cloud’s Speech-to-Text V2 API with support for 125+ languages, multiple models, voice activity detection, and advanced features like automatic punctuation and word-level confidence scores.
- Google STT API Reference: Pipecat’s API methods for Google Cloud STT integration
- Example Implementation: Complete example with Google Cloud services
- Google Cloud Documentation: Official Google Cloud Speech-to-Text documentation
- Google Cloud Console: Create service accounts and manage API access
Installation
To use Google Cloud Speech services, install Pipecat with its Google extra, for example `pip install "pipecat-ai[google]"`.
Prerequisites
Google Cloud Setup
Before using Google Cloud STT services, you need:
- Google Cloud Account: Sign up at Google Cloud Console
- Project Setup: Create a project and enable the Speech-to-Text API
- Service Account: Create a service account with Speech-to-Text permissions
- Authentication: Set up credentials via service account key or Application Default Credentials
Required Environment Variables
- GOOGLE_APPLICATION_CREDENTIALS: Path to your service account key file (recommended)
- Or use Application Default Credentials for cloud deployments
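A quick pre-flight check can catch a missing or mistyped key path before the pipeline starts. The sketch below uses only the standard library; the helper name `check_google_credentials_env` is hypothetical and is not part of Pipecat or the Google client libraries.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def check_google_credentials_env(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Return the key file path from GOOGLE_APPLICATION_CREDENTIALS, or None.

    None means the variable is unset, so authentication has to come from
    another source (for example, credentials attached to a cloud runtime).
    Hypothetical helper, for illustration only.
    """
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    if not raw:
        return None

    key_path = Path(raw).expanduser()
    if not key_path.is_file():
        raise FileNotFoundError(
            f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {key_path}"
        )
    return key_path
```

Calling `check_google_credentials_env()` at startup either returns the key path, returns `None` when the variable is unset, or fails fast with a readable error.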
Configuration
GoogleSTTService
JSON string containing Google Cloud service account credentials.
Path to service account credentials JSON file.
Google Cloud location (e.g., "global", "us-central1"). Non-global locations use regional endpoints.
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
Configuration parameters for the STT service. See InputParams below.
P99 latency from speech end to final transcript in seconds. Override for your deployment.
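To make the effect of the location setting concrete, here is a minimal sketch of how a location could map to an API hostname and a recognizer path. It assumes the hostname and resource-path conventions Google documents for Speech-to-Text V2; both function names are hypothetical, and whether Pipecat builds these strings in exactly this way is an assumption.

```python
def speech_api_endpoint(location: str = "global") -> str:
    """Map a Google Cloud location to a Speech-to-Text API hostname."""
    if location == "global":
        return "speech.googleapis.com"
    # Regional endpoints are prefixed with the location name.
    return f"{location}-speech.googleapis.com"


def default_recognizer_path(project_id: str, location: str = "global") -> str:
    """Build the resource path of the implicit default recognizer ("_")."""
    return f"projects/{project_id}/locations/{location}/recognizers/_"


print(speech_api_endpoint("us-central1"))
print(default_recognizer_path("my-project", "us-central1"))
```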
You must provide either `credentials` (JSON string) or `credentials_path` (file path), or have Application Default Credentials configured. At least one authentication method is required.
InputParams
Parameters passed via the `params` constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| languages | Language \| List[Language] | [Language.EN_US] | Single language or list of recognition languages. First language is primary. |
| model | str | "latest_long" | Speech recognition model to use. |
| use_separate_recognition_per_channel | bool | False | Process each audio channel separately. |
| enable_automatic_punctuation | bool | True | Add punctuation to transcripts. |
| enable_spoken_punctuation | bool | False | Include spoken punctuation in transcript. |
| enable_spoken_emojis | bool | False | Include spoken emojis in transcript. |
| profanity_filter | bool | False | Filter profanity from transcript. |
| enable_word_time_offsets | bool | False | Include timing information for each word. |
| enable_word_confidence | bool | False | Include confidence scores for each word. |
| enable_interim_results | bool | True | Stream partial recognition results. |
| enable_voice_activity_events | bool | False | Detect voice activity in audio. |
Usage
Basic Setup
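A minimal sketch of a basic setup, assuming the service is importable from `pipecat.services.google.stt` (the path used by recent Pipecat releases; older releases may differ). The wrapper function `build_basic_stt` is hypothetical and exists only to keep the example self-contained.

```python
import os


def build_basic_stt():
    """Create a GoogleSTTService that authenticates with a key file."""
    # Imported inside the function so this snippet loads even where the
    # optional Google dependencies are not installed. The import path is
    # an assumption based on recent Pipecat releases.
    from pipecat.services.google.stt import GoogleSTTService

    return GoogleSTTService(
        # If the variable is unset this passes None, and the service is
        # expected to fall back to Application Default Credentials.
        credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
        location="global",
    )
```

The returned service would then be placed in a pipeline after the transport input, like any other Pipecat STT service.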
With Credentials JSON String
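When a key file cannot be mounted (for example, when secrets arrive as environment variables), the credentials can be passed as a JSON string instead. The sketch below assumes the same import path as recent Pipecat releases; the environment variable name `GOOGLE_CREDENTIALS_JSON` and both helper names are hypothetical.

```python
import json
import os


def load_credentials_json(env_var: str = "GOOGLE_CREDENTIALS_JSON") -> str:
    """Read a service account JSON document from an environment variable.

    The variable name is hypothetical; use whatever your secret store sets.
    """
    raw = os.environ[env_var]
    json.loads(raw)  # Fail early if the value is not valid JSON.
    return raw


def build_stt_from_json(credentials_json: str):
    """Create a GoogleSTTService from a credentials JSON string."""
    # Assumed import path; imported lazily so the snippet loads without
    # the optional Google dependencies.
    from pipecat.services.google.stt import GoogleSTTService

    return GoogleSTTService(credentials=credentials_json)
```

A typical call would be `build_stt_from_json(load_credentials_json())`.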
With Custom Parameters
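Custom recognition settings are passed through `InputParams`. This sketch uses only fields listed in the InputParams table; the import paths are assumptions based on recent Pipecat releases, and `CUSTOM_OPTIONS` and `build_custom_stt` are hypothetical names.

```python
# Plain keyword options, using field names from the InputParams table.
CUSTOM_OPTIONS = {
    "model": "latest_long",
    "enable_automatic_punctuation": True,
    "enable_word_confidence": True,
    "profanity_filter": True,
}


def build_custom_stt(credentials_path: str):
    """Create a GoogleSTTService with non-default recognition settings."""
    # Assumed import paths; imported lazily so the snippet loads without
    # the optional Google dependencies.
    from pipecat.services.google.stt import GoogleSTTService
    from pipecat.transcriptions.language import Language

    params = GoogleSTTService.InputParams(
        languages=[Language.EN_GB],  # First (and here only) language is primary.
        **CUSTOM_OPTIONS,
    )
    return GoogleSTTService(credentials_path=credentials_path, params=params)
```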
Updating Options at Runtime
Google STT supports dynamic option updates via the `update_options` method.
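A minimal sketch of a runtime update, assuming `update_options` is an async method whose keyword names mirror the InputParams fields (an assumption; check the API reference for the exact signature). The helper `switch_recognition_language` is hypothetical.

```python
async def switch_recognition_language(stt, languages, model=None):
    """Change recognition languages (and optionally the model) on a live service."""
    options = {"languages": languages}
    if model is not None:
        options["model"] = model
    # Keyword names are assumed to match the InputParams fields.
    await stt.update_options(**options)
```

Inside a running Pipecat app this would be awaited with `Language` enum members, for example `await switch_recognition_language(stt, [Language.FR])`.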
Notes
- Streaming time limit: Google Cloud STT has a 5-minute streaming limit per connection. The service automatically handles stream reconnection at 4 minutes to provide seamless transcription without interruption.
- Multi-language support: Pass a list of `Language` values to `languages` for multi-language recognition. The first language is the primary language.
- Regional endpoints: Use the `location` parameter to route requests through regional endpoints (e.g., "us-central1", "europe-west1") for data residency requirements. The default "global" endpoint works for most use cases.
- Stream abort on inactivity: If no audio is sent for ~10 seconds (e.g., when audio frames are blocked by an `STTMuteFilter`), Google automatically closes the stream. The service recovers by automatically reconnecting.
- Authentication priority: The service checks for credentials in this order: `credentials` (JSON string), `credentials_path` (file), then Application Default Credentials.
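Two of the behaviours in these notes can be sketched in a few lines: the authentication priority and the reconnect-before-limit timing. This illustrates the described logic and is not Pipecat's actual implementation; all names are hypothetical.

```python
from typing import Optional, Tuple

STREAM_LIMIT_SECONDS = 5 * 60     # Google's per-connection streaming limit
RECONNECT_AFTER_SECONDS = 4 * 60  # Reconnect point, safely below the limit


def pick_auth_source(
    credentials: Optional[str] = None,
    credentials_path: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return which credential source wins, following the documented order."""
    if credentials:
        return ("credentials", credentials)
    if credentials_path:
        return ("credentials_path", credentials_path)
    return ("application_default", None)


def should_reconnect(stream_started_at: float, now: float) -> bool:
    """True once the current stream has been open for the reconnect interval."""
    return now - stream_started_at >= RECONNECT_AFTER_SECONDS
```

Reconnecting at 4 minutes leaves a one-minute margin before the 5-minute limit, so a stream is replaced before Google would close it.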
Event Handlers
Google STT supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to Google Cloud Speech-to-Text |
| on_disconnected | Disconnected from Google Cloud Speech-to-Text |
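A minimal sketch of registering handlers for these two events, assuming the service exposes Pipecat's usual `event_handler` decorator and passes the service instance as the handler's only argument. The helper `register_connection_handlers` and its `log` parameter are hypothetical.

```python
def register_connection_handlers(stt, log=print):
    """Attach logging handlers for the connection events listed above."""

    @stt.event_handler("on_connected")
    async def on_connected(service):
        log("Connected to Google Cloud Speech-to-Text")

    @stt.event_handler("on_disconnected")
    async def on_disconnected(service):
        log("Disconnected from Google Cloud Speech-to-Text")

    return on_connected, on_disconnected
```

Passing a custom `log` callable (for example, a logger method) keeps the handlers easy to test and to redirect.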