Overview
Hume provides expressive text-to-speech synthesis using its Octave models, which adapt pronunciation, pitch, speed, and emotional style based on context. HumeTTSService offers real-time streaming with word-level timestamps, custom voice support, and advanced synthesis controls including acting instructions, speed adjustment, and trailing silence configuration.
Hume TTS API Reference
Pipecat’s API methods for Hume TTS integration
Example Implementation
Complete example with word timestamps and interruption handling
Hume Documentation
Official Hume TTS API documentation and features
Voice Library
Browse and manage available voices
Installation
To use Hume services, install the required dependencies.
Prerequisites
Hume Account Setup
Before using Hume TTS services, you need:
- Hume Account: Sign up at Hume AI
- API Key: Generate an API key from your account dashboard
- Voice Selection: Choose voice IDs from the voice library or create custom voices
Required Environment Variables
HUME_API_KEY: Your Hume API key for authentication
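As a minimal sketch of reading that variable at startup, the helper below fails early with a clear message when the key is unset. The name `load_hume_api_key` is hypothetical and not part of Pipecat or the Hume SDK.

```python
import os
from typing import Mapping


def load_hume_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Hume API key, or raise if HUME_API_KEY is missing or blank."""
    key = env.get("HUME_API_KEY", "").strip()
    if not key:
        raise RuntimeError("HUME_API_KEY is not set; export it before starting the bot")
    return key
```

Checking the key once at startup surfaces a missing credential before any audio pipeline is built.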
Configuration
HumeTTSService
- api_key: Hume API key. If omitted, reads the HUME_API_KEY environment variable.
- voice_id: ID of the voice to use. Only voice IDs are supported; voice names are not.
- sample_rate: Output sample rate for PCM frames. Hume TTS streams at 48kHz.
- params: Runtime-configurable synthesis controls. See InputParams below.
InputParams
Synthesis parameters that can be set at initialization via the params constructor argument, or changed at runtime via UpdateSettingsFrame.
| Parameter | Type | Default | Description |
|---|---|---|---|
| description | str | None | Natural-language acting directions (up to 100 characters). |
| speed | float | None | Speaking-rate multiplier (0.5-2.0). |
| trailing_silence | float | None | Seconds of silence to append at the end (0-5). |
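The documented limits can be checked before values reach the service. The sketch below is a stand-alone illustration: `HumeParamsSketch` is a hypothetical stand-in for the three fields in the table, not the Pipecat class, and it assumes the stated ranges are inclusive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HumeParamsSketch:
    """Hypothetical mirror of the three documented InputParams fields."""

    description: Optional[str] = None
    speed: Optional[float] = None
    trailing_silence: Optional[float] = None

    def problems(self) -> List[str]:
        """Return one message per field that falls outside its documented limit."""
        found: List[str] = []
        if self.description is not None and len(self.description) > 100:
            found.append("description exceeds 100 characters")
        # Ranges are assumed inclusive at both ends.
        if self.speed is not None and not 0.5 <= self.speed <= 2.0:
            found.append("speed must be between 0.5 and 2.0")
        if self.trailing_silence is not None and not 0 <= self.trailing_silence <= 5:
            found.append("trailing_silence must be between 0 and 5 seconds")
        return found
```

Fields left as None are skipped, matching the table's defaults.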
Usage
Basic Setup
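A minimal sketch of constructing the service, assuming the class is importable from `pipecat.services.hume.tts` and accepts the `api_key` and `voice_id` arguments described under Configuration. The wrapper function `build_hume_tts` is hypothetical.

```python
import os


def build_hume_tts(voice_id: str):
    """Create a HumeTTSService for the given voice ID (voice names are not accepted)."""
    # Deferred import: Pipecat with Hume support must be installed before calling.
    from pipecat.services.hume.tts import HumeTTSService  # assumed module path

    return HumeTTSService(
        # May be omitted; the service falls back to the HUME_API_KEY variable.
        api_key=os.getenv("HUME_API_KEY"),
        voice_id=voice_id,
    )
```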
With Acting Directions
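The sketch below passes acting directions through the `params` constructor argument. It assumes `InputParams` is exposed as a nested class on `HumeTTSService` and that the import path is `pipecat.services.hume.tts`; the function name and the direction text are illustrative only.

```python
# Acting directions are limited to 100 characters.
ACTING_DIRECTION = "Warm and reassuring, like a patient teacher"


def build_expressive_hume_tts(voice_id: str):
    """Create a HumeTTSService with acting directions, speed, and trailing silence."""
    # Deferred import: Pipecat with Hume support must be installed before calling.
    from pipecat.services.hume.tts import HumeTTSService  # assumed module path

    params = HumeTTSService.InputParams(  # assumed to be a nested class
        description=ACTING_DIRECTION,
        speed=1.1,             # within the documented 0.5-2.0 range
        trailing_silence=0.5,  # seconds, within the documented 0-5 range
    )
    return HumeTTSService(voice_id=voice_id, params=params)
```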
Updating Settings at Runtime
Voice and synthesis parameters can be changed mid-conversation using UpdateSettingsFrame:
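A hedged sketch of a runtime update follows. It assumes the concrete frame class is `TTSUpdateSettingsFrame` in `pipecat.frames.frames`, that it takes a `settings` mapping, and that the keys mirror the InputParams field names. `task` stands for an already running pipeline task with an awaitable `queue_frame` method.

```python
# Keys are assumed to mirror the InputParams field names.
NEW_SETTINGS = {"speed": 1.2, "description": "Excited and upbeat"}


async def apply_new_settings(task) -> None:
    """Queue a settings update on a running pipeline task."""
    # Deferred import: Pipecat must be installed before calling.
    from pipecat.frames.frames import TTSUpdateSettingsFrame  # assumed frame class

    await task.queue_frame(TTSUpdateSettingsFrame(settings=NEW_SETTINGS))
```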
Notes
- Fixed sample rate: Hume TTS streams at 48kHz. Setting a different sample_rate will produce a warning.
- Word timestamps: The service provides word-level timestamps for synchronized text display. Timestamps are tracked cumulatively across utterances within a turn.
- Description versions: When description is provided, the service uses Hume API version "1". Without a description, it uses the newer version "2".
- Audio buffering: Audio is buffered internally until a minimum chunk size is reached before being pushed as frames, reducing audio glitches.
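The version selection and buffering behavior in these notes can be illustrated with a stand-alone sketch. This is not the service's implementation: `api_version_for`, `ChunkBuffer`, and the `MIN_CHUNK_BYTES` threshold are hypothetical, and the real minimum chunk size is not stated here.

```python
from typing import Optional

MIN_CHUNK_BYTES = 4800  # hypothetical threshold, chosen only for illustration


def api_version_for(description: Optional[str]) -> str:
    """Pick version "1" when a description is given, otherwise version "2"."""
    # An empty string is treated the same as no description (an assumption).
    return "1" if description else "2"


class ChunkBuffer:
    """Accumulate audio bytes and release them once a minimum size is reached."""

    def __init__(self, min_bytes: int = MIN_CHUNK_BYTES) -> None:
        self._min_bytes = min_bytes
        self._pending = bytearray()

    def add(self, audio: bytes) -> Optional[bytes]:
        """Append audio; return the buffered chunk if large enough, else None."""
        self._pending.extend(audio)
        if len(self._pending) < self._min_bytes:
            return None
        return self.flush()

    def flush(self) -> Optional[bytes]:
        """Return whatever is pending (for example at end of stream), or None."""
        if not self._pending:
            return None
        chunk = bytes(self._pending)
        self._pending.clear()
        return chunk
```

Emitting fewer, larger chunks trades a little latency for smoother playback, which is the stated purpose of the buffering.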