Overview

UltravoxRealtimeLLMService provides real-time conversational AI using Ultravox’s Realtime API. It supports both text and audio modalities, along with voice transcription, streaming responses, and tool use.

Installation

To use Ultravox Realtime services, install the required dependencies:
pip install "pipecat-ai[ultravox]"

Prerequisites

Ultravox Account Setup

Before using Ultravox Realtime services, you need:
  1. Ultravox Account: Sign up at the Ultravox Console
  2. API Key: Generate an Ultravox API key from your account dashboard
  3. Model Access: Ensure access to Ultravox Realtime models
  4. Usage Limits: Configure appropriate usage limits and billing

Required Environment Variables

  • ULTRAVOX_API_KEY: Your Ultravox API key for authentication
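
A missing key is easier to diagnose at startup than after a failed connection. The sketch below reads ULTRAVOX_API_KEY and fails fast if it is absent; the helper name load_ultravox_api_key is hypothetical and not part of Pipecat.

```python
import os


def load_ultravox_api_key(env=os.environ):
    """Return the Ultravox API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; `env` is injectable to ease testing.
    """
    api_key = env.get("ULTRAVOX_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ULTRAVOX_API_KEY is not set")
    return api_key
```

The returned value can then be passed as api_key to AgentInputParams or OneShotInputParams in place of a bare os.getenv call.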

Key Features

  • Audio-Native Model: Ultravox is an audio-native model for natural voice interactions
  • Real-time Streaming: Low-latency audio processing and streaming responses
  • Multiple Input Modes: Support for Agent, One-Shot, and Join URL input parameters
  • Voice Transcription: Built-in transcription with streaming output
  • Function Calling: Support for tool integration and API calling
  • Configurable Duration: Set maximum call duration limits

Configuration

UltravoxRealtimeLLMService

  • params (AgentInputParams | OneShotInputParams | JoinUrlInputParams, required): Configuration parameters for connecting to Ultravox. One of three input parameter types must be provided. See Input Parameter Types below.
  • one_shot_selected_tools (ToolsSchema, default: None): Tools to use with a one-shot call. May only be set when using OneShotInputParams.

Input Parameter Types

Ultravox supports three different ways to create or join a call:

AgentInputParams

Use a pre-configured Ultravox Agent to handle calls consistently.
Parameter | Type | Default | Description
api_key | str | required | Ultravox API key for authentication.
agent_id | UUID | required | The ID of the Ultravox agent. Create and edit agents in the Ultravox Console.
template_context | Dict[str, Any] | {} | Context variables for agent template instantiation.
metadata | Dict[str, str] | {} | Metadata to attach to the call.
max_duration | timedelta | None | Maximum call duration (10s to 1h). None uses the agent’s default.
extra | Dict[str, Any] | {} | Extra parameters for the agent call creation request.
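
Because metadata is typed Dict[str, str], values such as integers or booleans need to be converted before they are attached to a call. A minimal sketch, assuming plain str() conversion is acceptable for your values; to_call_metadata is a hypothetical helper, not a Pipecat or Ultravox function.

```python
def to_call_metadata(values):
    """Build a Dict[str, str] from arbitrary key/value pairs.

    Hypothetical helper: keys and values are converted with str(),
    and entries whose value is None are dropped.
    """
    return {str(key): str(value) for key, value in values.items() if value is not None}
```

The result can be passed as the metadata argument of AgentInputParams or OneShotInputParams.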

OneShotInputParams

Create a one-off call with inline configuration.
Parameter | Type | Default | Description
api_key | str | required | Ultravox API key for authentication.
system_prompt | str | None | System prompt to guide the model’s behavior.
temperature | float | 0.0 | Sampling temperature for response generation (0.0-1.0).
model | str | None | Model identifier to use (e.g., "fixie-ai/ultravox").
voice | UUID | None | Voice identifier for speech generation.
metadata | Dict[str, str] | {} | Metadata to attach to the call.
max_duration | timedelta | 1 hour | Maximum call duration (10s to 1h).
extra | Dict[str, Any] | {} | Extra parameters for the call creation request.
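
The documented ranges for temperature (0.0 to 1.0) and max_duration (10 seconds to 1 hour) can be checked before a call is created. The following is a sketch of such a pre-flight check, not a description of what the service itself validates; check_one_shot_settings is a hypothetical name.

```python
from datetime import timedelta

MIN_DURATION = timedelta(seconds=10)
MAX_DURATION = timedelta(hours=1)


def check_one_shot_settings(temperature=0.0, max_duration=MAX_DURATION):
    """Raise ValueError if a value falls outside the documented ranges.

    Hypothetical helper; defaults mirror the documented defaults.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature must be in [0.0, 1.0], got {temperature}")
    if not MIN_DURATION <= max_duration <= MAX_DURATION:
        raise ValueError(f"max_duration must be between 10s and 1h, got {max_duration}")
    return {"temperature": temperature, "max_duration": max_duration}
```

The returned dictionary can be unpacked into OneShotInputParams alongside api_key and the other fields.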

JoinUrlInputParams

Join an existing Ultravox call using a join URL.
Parameter | Type | Default | Description
join_url | str | required | The join URL for the existing Ultravox Realtime call.

Usage

Basic Setup with Agent

import os
import uuid
from pipecat.services.ultravox import UltravoxRealtimeLLMService, AgentInputParams

llm = UltravoxRealtimeLLMService(
    params=AgentInputParams(
        api_key=os.getenv("ULTRAVOX_API_KEY"),
        # Placeholder in valid UUID format; replace with your agent's ID
        agent_id=uuid.UUID("00000000-0000-0000-0000-000000000000"),
    ),
)

One-Shot Call

import os
from pipecat.services.ultravox import UltravoxRealtimeLLMService, OneShotInputParams

llm = UltravoxRealtimeLLMService(
    params=OneShotInputParams(
        api_key=os.getenv("ULTRAVOX_API_KEY"),
        system_prompt="You are a helpful assistant.",
        temperature=0.3,
        model="fixie-ai/ultravox",
    ),
)

One-Shot with Tools

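import os
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.services.ultravox import UltravoxRealtimeLLMService, OneShotInputParams

# Describe the tool the model is allowed to call
weather_function = FunctionSchema(
    name="get_weather",
    description="Get the current weather for a location",
    properties={"location": {"type": "string", "description": "City name"}},
    required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])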

llm = UltravoxRealtimeLLMService(
    params=OneShotInputParams(
        api_key=os.getenv("ULTRAVOX_API_KEY"),
        system_prompt="You are a helpful assistant that can check the weather.",
    ),
    one_shot_selected_tools=tools,  # ToolsSchema instance
)

@llm.function("get_weather")
async def get_weather(function_name, tool_call_id, args, llm, context, result_callback):
    location = args.get("location", "unknown")
    await result_callback({"temperature": 72, "condition": "sunny", "location": location})

Join Existing Call

from pipecat.services.ultravox import UltravoxRealtimeLLMService, JoinUrlInputParams

llm = UltravoxRealtimeLLMService(
    params=JoinUrlInputParams(
        join_url="wss://your-ultravox-join-url",
    ),
)

Switching Output Medium at Runtime

from pipecat.frames.frames import LLMUpdateSettingsFrame

# `task` is the running PipelineTask for your pipeline
# Switch to text-only output
await task.queue_frame(
    LLMUpdateSettingsFrame(
        settings={"output_medium": "text"}
    )
)

# Switch back to voice output
await task.queue_frame(
    LLMUpdateSettingsFrame(
        settings={"output_medium": "voice"}
    )
)

Notes

  • Audio-native model: Ultravox processes audio directly rather than relying on a separate STT step. Voice transcriptions are provided for reference but may not always align with the model’s understanding of user input.
  • Server-side context management: Ultravox handles conversation context server-side. The LLM context in Pipecat is used only to pass function call results back to the service.
  • Audio sample rate: The service uses a 48kHz sample rate. Input audio at different sample rates is automatically resampled.
  • Output medium: The service supports both "voice" and "text" output modes, switchable at runtime using LLMUpdateSettingsFrame.
  • Call duration limits: When using AgentInputParams or OneShotInputParams, you can set a maximum call duration between 10 seconds and 1 hour.
  • Tools with agents: When using AgentInputParams, tools are configured on the agent itself. Use one_shot_selected_tools only with OneShotInputParams.
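
To make the 48kHz sample rate concrete, the sketch below works out frame sizes and the resampling ratio for a given input rate. The 16-bit mono PCM format and the 20 ms frame length used in the usage note are assumptions for illustration; the source does not state them, and the helper names are hypothetical.

```python
ULTRAVOX_SAMPLE_RATE = 48_000  # Hz, as stated in the notes above


def samples_per_frame(frame_ms, sample_rate=ULTRAVOX_SAMPLE_RATE):
    """Number of samples in one frame of the given length in milliseconds."""
    return sample_rate * frame_ms // 1000


def frame_bytes(frame_ms, sample_rate=ULTRAVOX_SAMPLE_RATE, sample_width=2, channels=1):
    """Size in bytes of one PCM frame (assumes 16-bit mono by default)."""
    return samples_per_frame(frame_ms, sample_rate) * sample_width * channels


def resample_ratio(input_rate, target_rate=ULTRAVOX_SAMPLE_RATE):
    """Output samples produced per input sample when resampling."""
    return target_rate / input_rate
```

Under these assumptions, a 20 ms frame at 48kHz holds 960 samples (1920 bytes), and 16kHz input is upsampled by a factor of 3.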