Overview

AWSTranscribeSTTService provides real-time speech recognition over Amazon Transcribe’s WebSocket streaming API. It supports interim results, multiple languages, and configurable audio processing parameters.

Installation

To use AWS Transcribe services, install the required dependency:
pip install "pipecat-ai[aws]"

Prerequisites

AWS Account Setup

Before using AWS Transcribe STT services, you need:
  1. AWS Account: Sign up at AWS Console
  2. IAM User: Create an IAM user with Amazon Transcribe permissions
  3. Credentials: Set up AWS access keys and region configuration

Required Environment Variables

  • AWS_ACCESS_KEY_ID: Your AWS access key ID
  • AWS_SECRET_ACCESS_KEY: Your AWS secret access key
  • AWS_SESSION_TOKEN: Session token (if using temporary credentials)
  • AWS_REGION: AWS region (defaults to “us-east-1”)
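
A quick pre-flight check can catch missing credentials before the service is constructed. The sketch below is illustrative only: `REQUIRED_VARS` and `missing_aws_env_vars` are hypothetical names, not part of Pipecat, and it treats only the two key variables as mandatory because the session token and region are optional per the list above.

```python
import os

# Hypothetical helper, not part of Pipecat. Only the two key variables are
# treated as mandatory; AWS_SESSION_TOKEN and AWS_REGION are optional.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Missing AWS environment variables: " + ", ".join(missing))
```

Passing a plain dict as `env` makes the check easy to exercise in tests without touching the real environment.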

Configuration

  • api_key (str, default: None): AWS secret access key. If None, uses AWS_SECRET_ACCESS_KEY environment variable.
  • aws_access_key_id (str, default: None): AWS access key ID. If None, uses AWS_ACCESS_KEY_ID environment variable.
  • aws_session_token (str, default: None): AWS session token for temporary credentials. If None, uses AWS_SESSION_TOKEN environment variable.
  • region (str, default: None): AWS region for the service. If None, uses AWS_REGION environment variable (defaults to "us-east-1").
  • sample_rate (int, default: 16000): Audio sample rate in Hz. Must be 8000 or 16000.
  • language (Language, default: Language.EN): Language for transcription. Supports a wide range of languages including English, Spanish, French, German, and many more. See AWS Transcribe supported languages.
  • ttfs_p99_latency (float, default: AWS_TRANSCRIBE_TTFS_P99): P99 latency from speech end to final transcript in seconds. Override for your deployment.
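
Several of the parameters above fall back to an environment variable when left as None. The resolution order can be pictured roughly as follows. This is a minimal sketch of the documented behavior, not the service's actual code, and `resolve_setting` is a hypothetical helper name.

```python
import os


def resolve_setting(explicit, env_name, default=None, env=None):
    """Prefer the explicit argument, then the environment variable, then the default.

    Hypothetical helper illustrating the documented fallback order.
    """
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit
    return env.get(env_name, default)


# An explicit region wins over whatever AWS_REGION holds.
region = resolve_setting("eu-west-1", "AWS_REGION", default="us-east-1")
print(region)
```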

Usage

Basic Setup

import os

from pipecat.services.aws import AWSTranscribeSTTService

# Credentials and region are read from the environment.
stt = AWSTranscribeSTTService(
    api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION", "us-east-1"),
)

With Custom Language and Sample Rate

import os

from pipecat.services.aws import AWSTranscribeSTTService
from pipecat.transcriptions.language import Language

# Spanish transcription of 8 kHz audio in the eu-west-1 region.
stt = AWSTranscribeSTTService(
    api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region="eu-west-1",
    language=Language.ES,
    sample_rate=8000,
)

Notes

  • Supported sample rates: AWS Transcribe only supports 8000 Hz and 16000 Hz. If a different rate is provided, the service automatically falls back to 16000 Hz with a warning.
  • Pre-signed URL authentication: The service uses pre-signed URLs for WebSocket authentication rather than passing credentials directly, following AWS best practices.
  • Partial results stabilization: Enabled by default with "high" stability, which reduces changes to interim transcripts at the cost of slightly higher latency.
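
The sample-rate fallback described in the first note can be sketched as below. This is an illustration of the documented behavior under stated assumptions, not the service's source: `normalize_sample_rate` and the two constants are hypothetical names, and the warning text is invented for the example.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical constants mirroring the rates named in the notes above.
SUPPORTED_SAMPLE_RATES = (8000, 16000)
FALLBACK_SAMPLE_RATE = 16000


def normalize_sample_rate(requested):
    """Return the requested rate if supported, else warn and fall back to 16000 Hz."""
    if requested in SUPPORTED_SAMPLE_RATES:
        return requested
    logger.warning(
        "Unsupported sample rate %s Hz; falling back to %s Hz",
        requested,
        FALLBACK_SAMPLE_RATE,
    )
    return FALLBACK_SAMPLE_RATE
```

Under this sketch, a pipeline producing 44100 Hz audio would be treated as 16000 Hz, so resampling upstream is the safer choice when the source rate differs.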

Event Handlers

AWS Transcribe STT supports the standard service connection events:

Event              Description
on_connected       Connected to AWS Transcribe WebSocket
on_disconnected    Disconnected from AWS Transcribe WebSocket

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to AWS Transcribe")