Overview

MiniMaxHttpTTSService provides text-to-speech synthesis using MiniMax’s T2A (Text-to-Audio) API, with streaming output, emotional voice control, and support for multiple languages. The available models trade latency against audio quality: turbo variants favor lower latency, while hd variants favor higher quality.

Installation

To use MiniMax services, no additional dependencies are required beyond the base installation:
pip install "pipecat-ai"

Prerequisites

MiniMax Account Setup

Before using MiniMax TTS services, you need:
  1. MiniMax Account: Sign up at MiniMax Platform
  2. API Credentials: Get your API key and Group ID from the platform
  3. Voice Selection: Choose from available voice models and emotional settings

Required Environment Variables

  • MINIMAX_API_KEY: Your MiniMax API key for authentication
  • MINIMAX_GROUP_ID: Your MiniMax group ID
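
A minimal sketch of reading the two variables above and failing early when either is missing. The helper name load_minimax_credentials is hypothetical and is not part of Pipecat or any MiniMax library.

```python
import os


def load_minimax_credentials(env=None):
    """Return (api_key, group_id), raising if either variable is unset or empty.

    Hypothetical helper for illustration only; not part of Pipecat.
    """
    env = os.environ if env is None else env
    required = ("MINIMAX_API_KEY", "MINIMAX_GROUP_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["MINIMAX_API_KEY"], env["MINIMAX_GROUP_ID"]
```

Checking up front turns a missing credential into a clear startup error rather than a failed request later on.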

Configuration

MiniMaxHttpTTSService

api_key
str
required
MiniMax API key for authentication.
group_id
str
required
MiniMax Group ID that identifies your project.
voice_id
str
default:"Calm_Woman"
Voice identifier for synthesis.
model
str
default:"speech-02-turbo"
TTS model name. Options include speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo.
base_url
str
default:"https://api.minimax.io/v1/t2a_v2"
API base URL. Use https://api.minimaxi.chat/v1/t2a_v2 for mainland China or https://api-uw.minimax.io/v1/t2a_v2 for western United States.
aiohttp_session
aiohttp.ClientSession
required
An aiohttp session for HTTP requests.
sample_rate
int
default:"None"
Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
params
InputParams
default:"None"
Runtime-configurable voice and generation settings. See InputParams below.
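
The regional endpoints listed under base_url can be kept in a small lookup, as sketched here. The URLs are copied from the description above; the region labels ("global", "china", "us-west") and the function base_url_for are hypothetical names chosen for this example.

```python
# Endpoints as documented for the base_url parameter. The dictionary keys are
# hypothetical labels for this sketch, not identifiers used by MiniMax.
MINIMAX_BASE_URLS = {
    "global": "https://api.minimax.io/v1/t2a_v2",
    "china": "https://api.minimaxi.chat/v1/t2a_v2",
    "us-west": "https://api-uw.minimax.io/v1/t2a_v2",
}


def base_url_for(region="global"):
    """Return the T2A endpoint for a region label, or raise ValueError."""
    try:
        return MINIMAX_BASE_URLS[region]
    except KeyError:
        known = ", ".join(sorted(MINIMAX_BASE_URLS))
        raise ValueError(f"Unknown region {region!r}; expected one of: {known}") from None
```

The result can then be passed to the constructor as base_url=base_url_for("china").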

InputParams

Voice and generation settings that can be set at initialization via the params constructor argument.
Parameter | Type | Default | Description
--- | --- | --- | ---
language | Language | Language.EN | Language for TTS generation. Supports 40+ languages. Filipino, Tamil, and Persian require speech-2.6-* models.
speed | float | 1.0 | Speech speed (0.5 to 2.0).
volume | float | 1.0 | Speech volume (0 to 10).
pitch | int | 0 | Pitch adjustment (-12 to 12).
emotion | str | None | Emotional tone: "happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm", or "fluent".
english_normalization | bool | None | Deprecated. Use text_normalization instead.
text_normalization | bool | None | Enable text normalization (Chinese/English).
latex_read | bool | None | Enable LaTeX formula reading.
exclude_aggregated_audio | bool | None | Whether to exclude aggregated audio in the final chunk.
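
The documented ranges can be checked before building InputParams. The sketch below is a hypothetical validator, not part of Pipecat. It assumes the range endpoints are inclusive, which the table does not state, and takes the emotion names from the table.

```python
# Emotion names as listed in the InputParams table.
VALID_EMOTIONS = {
    "happy", "sad", "angry", "fearful",
    "disgusted", "surprised", "calm", "fluent",
}


def validate_voice_settings(speed=1.0, volume=1.0, pitch=0, emotion=None):
    """Return a list of problems; an empty list means the settings look valid.

    Hypothetical helper. Range endpoints are assumed to be inclusive.
    """
    errors = []
    if not 0.5 <= speed <= 2.0:
        errors.append(f"speed {speed} is outside 0.5 to 2.0")
    if not 0 <= volume <= 10:
        errors.append(f"volume {volume} is outside 0 to 10")
    if not -12 <= pitch <= 12:
        errors.append(f"pitch {pitch} is outside -12 to 12")
    if emotion is not None and emotion not in VALID_EMOTIONS:
        errors.append(f"emotion {emotion!r} is not a documented value")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every invalid setting at once.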

Usage

Basic Setup

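import os

import aiohttp
from pipecat.services.minimax import MiniMaxHttpTTSService


async def main():
    # Keep the session open for as long as the TTS service is in use.
    async with aiohttp.ClientSession() as session:
        tts = MiniMaxHttpTTSService(
            api_key=os.getenv("MINIMAX_API_KEY"),
            group_id=os.getenv("MINIMAX_GROUP_ID"),
            aiohttp_session=session,
        )
        # Add tts to your pipeline and run it here.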

With Voice Customization

import os

import aiohttp
from pipecat.services.minimax import MiniMaxHttpTTSService
from pipecat.transcriptions.language import Language


async def main():
    # Keep the session open for as long as the TTS service is in use.
    async with aiohttp.ClientSession() as session:
        tts = MiniMaxHttpTTSService(
            api_key=os.getenv("MINIMAX_API_KEY"),
            group_id=os.getenv("MINIMAX_GROUP_ID"),
            voice_id="Calm_Woman",
            model="speech-02-hd",
            aiohttp_session=session,
            # Voice and generation settings from the InputParams table.
            params=MiniMaxHttpTTSService.InputParams(
                language=Language.ZH,
                speed=1.2,
                emotion="happy",
            ),
        )
        # Add tts to your pipeline and run it here.

Notes

  • HTTP-based streaming: MiniMax uses an HTTP streaming API, not WebSocket. Audio data is returned as hex-encoded PCM chunks.
  • Emotional voice control: The emotion parameter lets you adjust the emotional tone of the voice without changing the voice model itself.
  • Model selection: The speech-2.6-* models are the latest and support additional languages (Filipino, Tamil, Persian). Use turbo variants for lower latency or hd variants for higher quality.
  • The Python class is named MiniMaxHttpTTSService, not MiniMaxTTSService.
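
To illustrate the hex-encoded PCM chunks mentioned in the notes, the sketch below decodes a hex string to raw bytes and unpacks it into samples. It assumes 16-bit signed little-endian mono PCM, which this page does not specify, and both function names are hypothetical. The service performs this decoding internally, so the code is illustrative only.

```python
import struct


def decode_hex_audio(hex_chunk):
    """Convert a hex-encoded audio chunk into raw bytes."""
    return bytes.fromhex(hex_chunk)


def pcm16_samples(pcm):
    """Unpack raw bytes into a tuple of signed 16-bit samples.

    Assumes little-endian mono PCM; a trailing odd byte is ignored.
    """
    count = len(pcm) // 2
    return struct.unpack("<%dh" % count, pcm[: count * 2])
```

For example, decode_hex_audio("0100ff7f") yields four bytes, which pcm16_samples unpacks into two samples.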