Overview
MiniMaxTTSService provides high-quality text-to-speech synthesis using MiniMax’s T2A (Text-to-Audio) API with streaming capabilities, emotional voice control, and support for multiple languages. The service offers various models optimized for different use cases, from low-latency to high-definition audio quality.
- MiniMax TTS API Reference: Pipecat’s API methods for MiniMax TTS integration
- Example Implementation: Complete example with emotional voice settings
- MiniMax Documentation: Official MiniMax T2A API documentation
- MiniMax Platform: Access voice models and API credentials
Installation
To use MiniMax services, no additional dependencies are required beyond the base installation.
Prerequisites
MiniMax Account Setup
Before using MiniMax TTS services, you need:
- MiniMax Account: Sign up at MiniMax Platform
- API Credentials: Get your API key and Group ID from the platform
- Voice Selection: Choose from available voice models and emotional settings
Required Environment Variables
- MINIMAX_API_KEY: Your MiniMax API key for authentication
- MINIMAX_GROUP_ID: Your MiniMax group ID
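As a small sketch of how these two variables might be checked at startup, the helper below reads both values and fails with a message naming whichever one is missing. The function name `load_minimax_credentials` is hypothetical and not part of Pipecat.

```python
import os


def load_minimax_credentials(env=None):
    """Return (api_key, group_id) read from the environment.

    Hypothetical helper for illustration; not part of Pipecat.
    """
    env = os.environ if env is None else env
    names = ("MINIMAX_API_KEY", "MINIMAX_GROUP_ID")
    # Treat unset and empty values the same way.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["MINIMAX_API_KEY"], env["MINIMAX_GROUP_ID"]
```

Failing early this way keeps a misconfigured deployment from surfacing later as an opaque authentication error.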
Configuration
MiniMaxHttpTTSService
- MiniMax API key for authentication.
- MiniMax Group ID to identify project.
- Voice identifier for synthesis.
- TTS model name. Options include speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo.
- API base URL. Use https://api.minimaxi.chat/v1/t2a_v2 for mainland China or https://api-uw.minimax.io/v1/t2a_v2 for western United States.
- An aiohttp session for HTTP requests.
- Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
- Runtime-configurable voice and generation settings. See InputParams below.
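The model names and regional endpoints listed above can be kept in a small lookup so that a typo is caught before any request is made. This is only a sketch: the region keys and the `pick_model` helper are hypothetical labels chosen for illustration, while the model names and URLs are copied from this section.

```python
# Model names as listed in this section.
KNOWN_MODELS = (
    "speech-2.6-hd",
    "speech-2.6-turbo",
    "speech-02-hd",
    "speech-02-turbo",
    "speech-01-hd",
    "speech-01-turbo",
)

# Region keys are hypothetical labels; the URLs come from this section.
BASE_URLS = {
    "mainland-china": "https://api.minimaxi.chat/v1/t2a_v2",
    "us-west": "https://api-uw.minimax.io/v1/t2a_v2",
}


def pick_model(generation="2.6", low_latency=True):
    """Build a model name from a generation and a latency preference."""
    variant = "turbo" if low_latency else "hd"
    name = f"speech-{generation}-{variant}"
    if name not in KNOWN_MODELS:
        raise ValueError(f"Unknown MiniMax model: {name}")
    return name
```

For example, `pick_model("02", low_latency=False)` yields `speech-02-hd`, and an unlisted generation raises `ValueError`.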
InputParams
Voice and generation settings that can be set at initialization via the params constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | Language | Language.EN | Language for TTS generation. Supports 40+ languages. Filipino, Tamil, and Persian require speech-2.6-* models. |
| speed | float | 1.0 | Speech speed (0.5 to 2.0). |
| volume | float | 1.0 | Speech volume (0 to 10). |
| pitch | int | 0 | Pitch adjustment (-12 to 12). |
| emotion | str | None | Emotional tone: "happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm", or "fluent". |
| english_normalization | bool | None | Deprecated. Use text_normalization instead. |
| text_normalization | bool | None | Enable text normalization (Chinese/English). |
| latex_read | bool | None | Enable LaTeX formula reading. |
| exclude_aggregated_audio | bool | None | Whether to exclude aggregated audio in final chunk. |
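The ranges in the table can be checked before settings are handed to the service. The sketch below is a standalone validator, not Pipecat code: `check_voice_settings` is a hypothetical name, and treating the documented bounds as inclusive is an assumption.

```python
ALLOWED_EMOTIONS = {
    "happy", "sad", "angry", "fearful",
    "disgusted", "surprised", "calm", "fluent",
}


def check_voice_settings(speed=1.0, volume=1.0, pitch=0, emotion=None):
    """Return a list of human-readable problems; empty means valid.

    Bounds are taken from the table and assumed to be inclusive.
    """
    problems = []
    if not 0.5 <= speed <= 2.0:
        problems.append(f"speed {speed} is outside 0.5 to 2.0")
    if not 0 <= volume <= 10:
        problems.append(f"volume {volume} is outside 0 to 10")
    if not -12 <= pitch <= 12:
        problems.append(f"pitch {pitch} is outside -12 to 12")
    if emotion is not None and emotion not in ALLOWED_EMOTIONS:
        problems.append(f"emotion {emotion!r} is not a listed value")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every invalid setting at once.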
Usage
Basic Setup
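A minimal sketch of a basic setup, under stated assumptions: the import path `pipecat.services.minimax.tts` and the keyword names `api_key`, `group_id`, and `aiohttp_session` are inferred from the Configuration section and should be checked against the API reference for your Pipecat version. The helper names are hypothetical.

```python
import os


def minimax_tts_kwargs(session):
    """Collect constructor keyword arguments for the TTS service.

    The keyword names are assumptions inferred from the Configuration section.
    """
    return {
        "api_key": os.environ["MINIMAX_API_KEY"],
        "group_id": os.environ["MINIMAX_GROUP_ID"],
        "aiohttp_session": session,  # an aiohttp.ClientSession owned by the caller
    }


def build_tts(session):
    # Imported lazily so this sketch loads without Pipecat installed.
    # The module path is an assumption; verify it for your Pipecat version.
    from pipecat.services.minimax.tts import MiniMaxHttpTTSService

    return MiniMaxHttpTTSService(**minimax_tts_kwargs(session))
```

The session would typically be created with `async with aiohttp.ClientSession() as session:` and kept open for as long as the service is in use.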
With Voice Customization
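A sketch of passing voice settings through the params constructor argument. The values are illustrative and stay inside the ranges in the InputParams table. As assumptions, the import path, the nested `InputParams` class, and the keyword names `api_key`, `group_id`, `model`, and `aiohttp_session` are inferred from this page and should be verified against the API reference. No voice identifier is passed, so the service default would apply.

```python
# Illustrative values; each stays inside the documented range.
VOICE_SETTINGS = {
    "speed": 1.1,        # 0.5 to 2.0
    "volume": 1.0,       # 0 to 10
    "pitch": 2,          # -12 to 12
    "emotion": "happy",  # one of the listed emotional tones
}


def build_custom_tts(session, api_key, group_id):
    # Imported lazily so this sketch loads without Pipecat installed.
    # Module path and keyword names are assumptions; verify before use.
    from pipecat.services.minimax.tts import MiniMaxHttpTTSService

    params = MiniMaxHttpTTSService.InputParams(**VOICE_SETTINGS)
    return MiniMaxHttpTTSService(
        api_key=api_key,
        group_id=group_id,
        model="speech-2.6-turbo",  # turbo variant for lower latency
        aiohttp_session=session,
        params=params,
    )
```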
Notes
- HTTP-based streaming: MiniMax uses an HTTP streaming API, not WebSocket. Audio data is returned in hex-encoded PCM chunks.
- Emotional voice control: The emotion parameter lets you adjust the emotional tone of the voice without changing the voice model itself.
- Model selection: The speech-2.6-* models are the latest and support additional languages (Filipino, Tamil, Persian). Use turbo variants for lower latency or hd variants for higher quality.
- The Python class is named MiniMaxHttpTTSService, not MiniMaxTTSService.
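To illustrate the hex-encoded PCM chunks mentioned in the first note, the sketch below turns one hex string into raw bytes and then into sample values. The service handles this internally. The helper names are hypothetical, and the signed 16-bit little-endian sample layout is an assumption made for the example.

```python
import struct


def decode_pcm_chunk(hex_audio):
    """Convert one hex-encoded chunk into raw PCM bytes."""
    return bytes.fromhex(hex_audio)


def pcm16_samples(pcm):
    """Interpret raw bytes as signed 16-bit little-endian samples.

    The sample layout is an assumption for illustration.
    """
    count = len(pcm) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack(f"<{count}h", pcm[: count * 2]))
```

For example, the hex string `0100ff7f` decodes to four bytes, which correspond to the two samples 1 and 32767.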