
Overview

AWSPollyTTSService provides high-quality text-to-speech synthesis through Amazon Polly with support for standard, neural, and generative engines. The service offers extensive language support, SSML features, and voice customization options including prosody controls for pitch, rate, and volume.

Installation

To use AWS Polly services, install the required dependencies:
pip install "pipecat-ai[aws]"

Prerequisites

AWS Account Setup

Before using AWS Polly TTS services, you need:
  1. AWS Account: Sign up at AWS Console
  2. IAM User: Create an IAM user with Polly permissions
  3. Access Keys: Generate access key ID and secret access key
  4. Voice Selection: Choose from available voices in the voice list

Required Environment Variables

  • AWS_ACCESS_KEY_ID: Your AWS access key ID
  • AWS_SECRET_ACCESS_KEY: Your AWS secret access key
  • AWS_SESSION_TOKEN: Session token (if using temporary credentials)
  • AWS_REGION: AWS region (defaults to “us-east-1”)
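The variables above can be gathered and validated before the service is constructed. The following is a minimal sketch, not part of the library: `load_aws_settings` and `REQUIRED_VARS` are hypothetical names, and treating only the two key variables as required is an assumption based on the list above.

```python
import os

# Assumption: these two variables are always needed; the session token
# and region are optional, as described in the list above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_aws_settings(env=None):
    """Collect AWS credential settings from an environment mapping."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "api_key": env["AWS_SECRET_ACCESS_KEY"],
        # Only needed for temporary credentials, so it may be absent.
        "aws_session_token": env.get("AWS_SESSION_TOKEN"),
        # Falls back to the documented default region.
        "region": env.get("AWS_REGION", "us-east-1"),
    }
```

The returned keys are chosen to mirror the constructor arguments documented under Configuration, so the dictionary could plausibly be unpacked into `AWSPollyTTSService(...)` alongside a `voice_id`.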

Configuration

AWSPollyTTSService

  • api_key (str, default: None): AWS secret access key. If None, uses the AWS_SECRET_ACCESS_KEY environment variable.
  • aws_access_key_id (str, default: None): AWS access key ID. If None, uses the AWS_ACCESS_KEY_ID environment variable.
  • aws_session_token (str, default: None): AWS session token for temporary credentials.
  • region (str, default: None): AWS region for Polly service. Defaults to us-east-1 if not set via environment variable.
  • voice_id (str, default: "Joanna"): Voice ID to use for synthesis.
  • sample_rate (int, default: None): Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate. AWS Polly internally synthesizes at 16kHz and resamples to the target rate.
  • params (InputParams, default: None): Runtime-configurable voice and generation settings. See InputParams below.

InputParams

Voice and generation settings that can be set at initialization via the params constructor argument, or changed at runtime via UpdateSettingsFrame.
| Parameter | Type | Default | Description |
|---|---|---|---|
| engine | str | None | TTS engine to use ("standard", "neural", "generative", etc.). |
| language | Language | Language.EN | Language for synthesis. |
| pitch | str | None | Voice pitch adjustment (for standard engine only, e.g. "+10%"). |
| rate | str | None | Speech rate adjustment (e.g. "slow", "fast", "120%"). |
| volume | str | None | Voice volume adjustment (e.g. "loud", "soft", "+6dB"). |
| lexicon_names | list[str] | None | List of pronunciation lexicon names to apply. |
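The pitch, rate, and volume values correspond to attributes of the SSML `<prosody>` element that Polly accepts. The sketch below shows one way such settings could be turned into SSML. It is an illustration only, not the service's actual implementation, and `build_ssml` is a hypothetical helper. As an assumption of this sketch, pitch is dropped unless the engine is explicitly "standard".

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, engine=None, pitch=None, rate=None, volume=None):
    """Wrap text in <speak>, adding a <prosody> element when settings are given."""
    attrs = []
    # Pitch is documented as standard-engine only, so skip it otherwise.
    if pitch and engine == "standard":
        attrs.append(("pitch", pitch))
    if rate:
        attrs.append(("rate", rate))
    if volume:
        attrs.append(("volume", volume))

    # Escape &, <, and > so user text cannot break the SSML document.
    body = escape(text)
    if attrs:
        attr_text = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs)
        body = f"<prosody {attr_text}>{body}</prosody>"
    return f"<speak>{body}</speak>"


print(build_ssml("Hello there", engine="neural", pitch="+10%", rate="110%"))
```

With the neural engine the pitch value is left out and only the rate attribute is emitted, which matches the restriction described in the table.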

Usage

Basic Setup

from pipecat.services.aws import AWSPollyTTSService

tts = AWSPollyTTSService(
    voice_id="Joanna",
)

With Voice Customization

import os

from pipecat.services.aws import AWSPollyTTSService
from pipecat.transcriptions.language import Language

tts = AWSPollyTTSService(
    api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region="us-east-1",
    voice_id="Matthew",
    params=AWSPollyTTSService.InputParams(
        engine="neural",
        language=Language.EN_US,
        rate="110%",
        volume="loud",
    ),
)

Notes

  • Engine selection: AWS Polly supports "standard", "neural", and "generative" engines. Not all voices support all engines. Check the AWS voice list for compatibility.
  • Pitch control: The pitch parameter only works with the "standard" engine. Neural and generative engines ignore it.
  • Audio resampling: Polly synthesizes PCM at 16kHz internally. The service automatically resamples to match your pipeline’s sample rate.
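To make the resampling note concrete, here is a rough sketch of converting 16 kHz PCM to another rate with linear interpolation. This is not the resampler the service uses (a production resampler would normally apply proper filtering). `resample_pcm16` is a hypothetical helper, and mono 16-bit samples in the machine's native byte order are assumed.

```python
import array


def resample_pcm16(pcm, src_rate=16000, dst_rate=24000):
    """Resample mono 16-bit PCM bytes by linear interpolation.

    Assumes the bytes are in native byte order (little-endian on most
    machines, which matches 16-bit little-endian PCM).
    """
    samples = array.array("h")
    samples.frombytes(pcm)
    if src_rate == dst_rate or len(samples) < 2:
        return pcm

    out_len = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = array.array("h")
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring source samples.
        value = samples[left] * (1 - frac) + samples[right] * frac
        out.append(int(round(value)))
    return out.tobytes()
```

For example, passing one second of 16 kHz audio (32,000 bytes) with `dst_rate=24000` yields 48,000 bytes, that is, 24,000 samples.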