Overview

SarvamTTSService provides text-to-speech synthesis specialized for Indian languages and voices. The service offers extensive voice customization options including pitch, pace, and loudness control, with support for multiple Indian languages and preprocessing for mixed-language content. The bulbul:v3-beta model adds temperature control and 25 new speaker voices.

Installation

To use Sarvam AI services, no additional dependencies are required beyond the base installation:
pip install "pipecat-ai"

Prerequisites

Sarvam AI Account Setup

Before using Sarvam AI TTS services, you need:
  1. Sarvam AI Account: Sign up at Sarvam AI Console
  2. API Key: Generate an API key from your account dashboard
  3. Language Selection: Choose from available Indian language voices

Required Environment Variables

  • SARVAM_API_KEY: Your Sarvam AI API key for authentication
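
A minimal sketch of reading this variable and failing early when it is missing. The helper name `load_sarvam_api_key` is hypothetical and not part of Pipecat; it only wraps the standard library.

```python
import os


def load_sarvam_api_key(env_var: str = "SARVAM_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset or blank."""
    key = os.getenv(env_var, "").strip()
    if not key:
        # Fail before any service is constructed, so the error is easy to trace.
        raise RuntimeError(f"{env_var} is not set; export it before starting the bot")
    return key
```

The returned value can then be passed as `api_key` to either service.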

Configuration

Sarvam offers two service implementations: SarvamTTSService (WebSocket) for real-time streaming and SarvamHttpTTSService (HTTP) for simpler batch synthesis.

SarvamTTSService

api_key
str
required
Sarvam AI API subscription key.
model
str
default:"bulbul:v2"
TTS model to use. Options: bulbul:v2, bulbul:v3-beta, bulbul:v3.
voice_id
str
default:"None"
Speaker voice ID. If None, uses the model-appropriate default (anushka for v2, shubh for v3).
url
str
default:"wss://api.sarvam.ai/text-to-speech/ws"
WebSocket URL for the TTS backend.
aggregate_sentences
bool
default:"True"
Buffer text until sentence boundaries before sending.
sample_rate
int
default:"None"
Audio sample rate in Hz (8000, 16000, 22050, 24000). If None, uses model-specific default (22050 for v2, 24000 for v3).
params
InputParams
default:"None"
Runtime-configurable voice and generation settings. See InputParams (WebSocket) below.
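
The model-dependent defaults for voice_id and sample_rate described above can be illustrated with a small sketch. This is not the library's code: `resolve_defaults` is a hypothetical helper, and treating every model name that starts with `bulbul:v3` as a v3 model is an assumption.

```python
from typing import Optional, Tuple

# (default speaker, default sample rate in Hz), as listed in the parameters above.
V2_DEFAULTS = ("anushka", 22050)
V3_DEFAULTS = ("shubh", 24000)


def resolve_defaults(
    model: str = "bulbul:v2",
    voice_id: Optional[str] = None,
    sample_rate: Optional[int] = None,
) -> Tuple[str, int]:
    """Fill in voice and sample rate when the caller leaves them as None."""
    # Assumption: "bulbul:v3" and "bulbul:v3-beta" share the same defaults.
    is_v3 = model.startswith("bulbul:v3")
    default_voice, default_rate = V3_DEFAULTS if is_v3 else V2_DEFAULTS
    return (voice_id or default_voice, sample_rate or default_rate)
```

Explicit values always win; only None falls back to the model default.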

SarvamHttpTTSService

api_key
str
required
Sarvam AI API subscription key.
aiohttp_session
aiohttp.ClientSession
required
An aiohttp session for HTTP requests.
model
str
default:"bulbul:v2"
TTS model to use. Options: bulbul:v2, bulbul:v3-beta, bulbul:v3.
voice_id
str
default:"None"
Speaker voice ID. If None, uses the model-appropriate default.
base_url
str
default:"https://api.sarvam.ai"
Sarvam AI API base URL.
sample_rate
int
default:"None"
Audio sample rate in Hz (8000, 16000, 22050, 24000). If None, uses model-specific default.
params
InputParams
default:"None"
Runtime-configurable voice and generation settings. See InputParams (HTTP) below.

InputParams (WebSocket)

Parameter | Type | Default | Description
language | Language | Language.EN | Target language for synthesis. Supports Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu.
pitch | float | 0.0 | Voice pitch adjustment (-0.75 to 0.75). Only for bulbul:v2.
pace | float | 1.0 | Speech pace multiplier. v2: 0.3-3.0, v3: 0.5-2.0.
loudness | float | 1.0 | Volume multiplier (0.3 to 3.0). Only for bulbul:v2.
enable_preprocessing | bool | False | Enable text preprocessing. Always enabled for v3 models.
min_buffer_size | int | 50 | Minimum characters to buffer before TTS processing.
max_chunk_length | int | 150 | Maximum characters processed in a single chunk.
output_audio_codec | str | "linear16" | Audio codec: linear16, mulaw, alaw, opus, flac, aac, wav, mp3.
output_audio_bitrate | str | "128k" | Audio bitrate: 32k, 64k, 96k, 128k, 192k.
temperature | float | 0.6 | Output randomness for v3 models (0.01-1.0). Ignored for v2.

InputParams (HTTP)

Parameter | Type | Default | Description
language | Language | Language.EN | Target language for synthesis.
pitch | float | 0.0 | Voice pitch adjustment (-0.75 to 0.75). Only for bulbul:v2.
pace | float | 1.0 | Speech pace multiplier. v2: 0.3-3.0, v3: 0.5-2.0.
loudness | float | 1.0 | Volume multiplier (0.3 to 3.0). Only for bulbul:v2.
enable_preprocessing | bool | False | Enable text preprocessing. Always enabled for v3 models.
temperature | float | 0.6 | Output randomness for v3 models (0.01-1.0). Ignored for v2.
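
Several parameters apply to only one model family. As a rough sketch of that rule, the hypothetical helper `split_supported` below separates the settings a model uses from the ones it ignores. It is plain Python for illustration, not Pipecat's implementation, and matching v3 models by the `bulbul:v3` prefix is an assumption.

```python
from typing import Any, Dict, List, Tuple

# Per the tables above: pitch and loudness are v2-only, temperature is v3-only.
V2_ONLY = {"pitch", "loudness"}
V3_ONLY = {"temperature"}


def split_supported(model: str, params: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (params the model uses, sorted names of params it ignores)."""
    is_v3 = model.startswith("bulbul:v3")
    unsupported = V2_ONLY if is_v3 else V3_ONLY
    kept = {name: value for name, value in params.items() if name not in unsupported}
    ignored = sorted(name for name in params if name in unsupported)
    return kept, ignored
```

A check like this can be useful in tests or config validation, so that a pitch value set for a v3 model is noticed before it is silently dropped.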

Usage

Basic Setup (WebSocket)

import os

from pipecat.services.sarvam import SarvamTTSService
from pipecat.transcriptions.language import Language

# Uses the default model (bulbul:v2) with a Hindi voice.
tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    voice_id="anushka",
    params=SarvamTTSService.InputParams(
        language=Language.HI,
    ),
)

With v3 Model and Temperature Control

import os

from pipecat.services.sarvam import SarvamTTSService
from pipecat.transcriptions.language import Language

# v3 models accept pace (0.5-2.0) and temperature; pitch and loudness are not supported.
tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    voice_id="aditya",
    model="bulbul:v3-beta",
    params=SarvamTTSService.InputParams(
        language=Language.HI,
        pace=1.2,
        temperature=0.8,
    ),
)

HTTP Service

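import os

import aiohttp
from pipecat.services.sarvam import SarvamHttpTTSService
from pipecat.transcriptions.language import Language


async def main():
    # "async with" must run inside a coroutine; keep the session open
    # for as long as the service is in use.
    async with aiohttp.ClientSession() as session:
        tts = SarvamHttpTTSService(
            api_key=os.getenv("SARVAM_API_KEY"),
            aiohttp_session=session,
            voice_id="anushka",
            params=SarvamHttpTTSService.InputParams(
                language=Language.HI,
                pitch=0.1,
                pace=1.2,
                loudness=1.5,
            ),
        )
        # Build and run the pipeline here, before the session closes.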

Notes

  • Model differences: bulbul:v2 supports pitch and loudness control; bulbul:v3-beta and bulbul:v3 add temperature control but do not support pitch or loudness. Setting unsupported parameters for a model will log a warning.
  • Default speakers vary by model: v2 defaults to anushka; v3 models default to shubh.
  • Default sample rates vary by model: v2 defaults to 22050 Hz; v3 models default to 24000 Hz.
  • Indian language focus: Sarvam AI specializes in Indian languages, supporting Bengali, English (India), Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.
  • Pace ranges differ: bulbul:v2 supports pace from 0.3 to 3.0, while v3 models support 0.5 to 2.0. Values outside the range are clamped automatically.
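
The pace clamping described in the last note can be sketched in a few lines. `clamp_pace` is a hypothetical helper written for illustration, not the service's internal code, and it assumes any model name starting with `bulbul:v3` uses the v3 range.

```python
def clamp_pace(model: str, pace: float) -> float:
    """Clamp pace into the range documented for the given model."""
    # v3 models: 0.5 to 2.0; bulbul:v2: 0.3 to 3.0.
    low, high = (0.5, 2.0) if model.startswith("bulbul:v3") else (0.3, 3.0)
    return max(low, min(high, pace))
```

For example, a pace of 5.0 would become 3.0 on bulbul:v2 and 2.0 on a v3 model, while 1.2 is left unchanged on both.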

Event Handlers

Sarvam WebSocket TTS supports the standard service connection events:
Event | Description
on_connected | Connected to Sarvam WebSocket
on_disconnected | Disconnected from Sarvam WebSocket
on_connection_error | WebSocket connection error occurred

@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Sarvam")