Overview
UltravoxRealtimeLLMService provides real-time conversational AI capabilities using Ultravox’s Realtime API. It supports both text and audio modalities with voice transcription, streaming responses, and tool usage for creating interactive AI experiences.
- Ultravox Realtime API Reference: Pipecat’s API methods for Ultravox Realtime integration
- Example Implementation: Complete Ultravox Realtime conversation example
- Ultravox Documentation: Official Ultravox API documentation
- Ultravox Console: Access Ultravox models and manage API keys
Installation
To use Ultravox Realtime services, install the required dependencies:
Prerequisites
Ultravox Account Setup
Before using Ultravox Realtime services, you need:
- Ultravox Account: Sign up at Ultravox Console
- API Key: Generate an Ultravox API key from your account dashboard
- Model Access: Ensure access to Ultravox Realtime models
- Usage Limits: Configure appropriate usage limits and billing
Required Environment Variables
- ULTRAVOX_API_KEY: Your Ultravox API key for authentication
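A minimal sketch of reading the key at startup, assuming it is exported in the process environment. The helper name `load_ultravox_api_key` is hypothetical and not part of Pipecat or Ultravox; it only fails early with a clear message when the variable is missing.

```python
import os
from typing import Mapping, Optional


def load_ultravox_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return ULTRAVOX_API_KEY, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of Pipecat or Ultravox.
    Pass a mapping to override os.environ (useful in tests).
    """
    source = os.environ if env is None else env
    key = source.get("ULTRAVOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ULTRAVOX_API_KEY is not set")
    return key
```

Calling `load_ultravox_api_key()` once before building the pipeline keeps a missing key from surfacing later as an authentication failure.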
Key Features
- Audio-Native Model: Ultravox is an audio-native model for natural voice interactions
- Real-time Streaming: Low-latency audio processing and streaming responses
- Multiple Input Modes: Support for Agent, One-Shot, and Join URL input parameters
- Voice Transcription: Built-in transcription with streaming output
- Function Calling: Support for tool integration and API calling
- Configurable Duration: Set maximum call duration limits
Configuration
UltravoxRealtimeLLMService
Configuration parameters for connecting to Ultravox. One of three input parameter types must be provided. See Input Parameter Types below.
- one_shot_selected_tools: Tools to use with a one-shot call. May only be set when using OneShotInputParams.
Input Parameter Types
Ultravox supports three different ways to create or join a call:
AgentInputParams
Use a pre-configured Ultravox Agent to handle calls consistently.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | Ultravox API key for authentication. |
| agent_id | UUID | required | The ID of the Ultravox agent. Create and edit agents in the Ultravox Console. |
| template_context | Dict[str, Any] | {} | Context variables for agent template instantiation. |
| metadata | Dict[str, str] | {} | Metadata to attach to the call. |
| max_duration | timedelta | None | Maximum call duration (10s to 1h). None uses the agent’s default. |
| extra | Dict[str, Any] | {} | Extra parameters for the agent call creation request. |
OneShotInputParams
Create a one-off call with inline configuration.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | Ultravox API key for authentication. |
| system_prompt | str | None | System prompt to guide the model’s behavior. |
| temperature | float | 0.0 | Sampling temperature for response generation (0.0-1.0). |
| model | str | None | Model identifier to use (e.g., "fixie-ai/ultravox"). |
| voice | UUID | None | Voice identifier for speech generation. |
| metadata | Dict[str, str] | {} | Metadata to attach to the call. |
| max_duration | timedelta | 1 hour | Maximum call duration (10s to 1h). |
| extra | Dict[str, Any] | {} | Extra parameters for the call creation request. |
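The table gives ranges for temperature (0.0-1.0) and max_duration (10s to 1h). Whether the service rejects out-of-range values itself is not stated here, so a local pre-check is one way to surface mistakes before a call is created. The helper below is a hypothetical sketch, not part of Pipecat:

```python
from datetime import timedelta

# Bounds taken from the parameter table above.
MIN_CALL_DURATION = timedelta(seconds=10)
MAX_CALL_DURATION = timedelta(hours=1)


def check_one_shot_ranges(temperature: float, max_duration: timedelta) -> None:
    """Raise ValueError if either value falls outside its documented range.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature must be within 0.0-1.0, got {temperature}")
    if not MIN_CALL_DURATION <= max_duration <= MAX_CALL_DURATION:
        raise ValueError(f"max_duration must be within 10s-1h, got {max_duration}")


# Passes silently: both values are inside the documented ranges.
check_one_shot_ranges(temperature=0.3, max_duration=timedelta(minutes=5))
```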
JoinUrlInputParams
Join an existing Ultravox call using a join URL.
| Parameter | Type | Default | Description |
|---|---|---|---|
| join_url | str | required | The join URL for the existing Ultravox Realtime call. |
Usage
Basic Setup with Agent
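As a hedged sketch, the values for an agent-based call can be collected as keyword arguments that mirror the AgentInputParams table. The agent ID, template variable, and metadata below are placeholders, and the final step (constructing AgentInputParams and handing it to UltravoxRealtimeLLMService) is left as a comment because it assumes the class accepts the table's names as keyword arguments.

```python
import os
import uuid
from datetime import timedelta

# Placeholder agent ID for illustration; use an ID from the Ultravox Console.
AGENT_ID = uuid.UUID("00000000-0000-0000-0000-000000000000")

agent_kwargs = {
    "api_key": os.environ.get("ULTRAVOX_API_KEY", ""),
    "agent_id": AGENT_ID,
    "template_context": {"customer_name": "Ada"},  # hypothetical template variable
    "metadata": {"source": "pipecat-demo"},        # hypothetical metadata entry
    "max_duration": timedelta(minutes=15),         # within the documented 10s-1h range
}

# Assumption: AgentInputParams takes the table's names as keyword arguments,
# and the result is what the service receives as its input parameters:
# params = AgentInputParams(**agent_kwargs)
```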
One-Shot Call
One-Shot with Tools
Join Existing Call
Switching Output Medium at Runtime
Notes
- Audio-native model: Ultravox processes audio directly rather than relying on a separate STT step. Voice transcriptions are provided for reference but may not always align with the model’s understanding of user input.
- Server-side context management: Ultravox handles conversation context server-side. The LLM context in Pipecat is only used for passing function call results back to the service.
- Audio sample rate: The service uses a 48kHz sample rate. Input audio at different sample rates is automatically resampled.
- Output medium: The service supports both "voice" and "text" output modes, switchable at runtime using LLMUpdateSettingsFrame.
- Call duration limits: When using AgentInputParams or OneShotInputParams, you can set a maximum call duration between 10 seconds and 1 hour.
- Tools with agents: When using AgentInputParams, tools are configured on the agent itself. Use one_shot_selected_tools only with OneShotInputParams.
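The tools rule in the last note can be expressed as a small guard. This is a sketch under stated assumptions: `check_tool_selection` and the `params_kind` labels are hypothetical stand-ins for the three input parameter types, not Pipecat APIs.

```python
from typing import Any, Optional, Sequence


def check_tool_selection(
    params_kind: str,
    one_shot_selected_tools: Optional[Sequence[Any]],
) -> None:
    """Reject a tool selection for anything other than a one-shot call.

    params_kind is a hypothetical label ("agent", "one_shot", or "join_url")
    standing in for the input parameter type in use.
    """
    if one_shot_selected_tools and params_kind != "one_shot":
        raise ValueError(
            "one_shot_selected_tools may only be set with OneShotInputParams; "
            "for agents, configure tools on the agent itself"
        )


# Allowed: tools together with a one-shot call.
check_tool_selection("one_shot", ["get_weather"])
# Allowed: an agent call with no tool selection.
check_tool_selection("agent", None)
```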