Overview

VonageFrameSerializer enables integration with the Vonage Video API Audio Connector WebSocket protocol, allowing Pipecat applications to process real-time audio streams from active Vonage video sessions.

Installation

The VonageFrameSerializer does not require any additional dependencies beyond the core Pipecat library:
pip install "pipecat-ai"

Prerequisites

Vonage Video API Account Setup

Before using VonageFrameSerializer, you need:
  1. Vonage (TokBox) Account: Sign up at Vonage Video API Console
  2. Vonage Video API Project: Create a project to obtain Project API Key and Project Secret
  3. Existing Vonage Video Session: A Vonage session must already exist. Sessions can be created using TokBox Playground or Vonage Video API SDKs

Required Environment Variables

  • VONAGE_API_KEY: Your Vonage Video API project key
  • VONAGE_API_SECRET: Your Vonage Video API project secret
  • VONAGE_SESSION_ID: The existing routed session ID
  • WS_URI: Public WebSocket endpoint URI of the server application running Pipecat (e.g. via ngrok)
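As a minimal sketch, the variables listed above could be validated at startup so the application fails fast when one is missing. The helper name `load_vonage_settings` is hypothetical and is not part of Pipecat or any Vonage SDK.

```python
import os

# Names taken from the list of required environment variables above.
REQUIRED_VARS = ("VONAGE_API_KEY", "VONAGE_API_SECRET", "VONAGE_SESSION_ID", "WS_URI")


def load_vonage_settings(environ=None):
    """Return the required settings as a dict, or raise if any are missing.

    Hypothetical helper for illustration; `environ` defaults to os.environ
    and can be replaced with a plain dict in tests.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_vonage_settings()` once before building the pipeline keeps configuration errors separate from connection errors.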

Required Configuration

  • WebSocket Endpoint (/ws): A WebSocket server application (e.g. FastAPI) running Pipecat that accepts raw PCM audio frames.
  • Audio Connector /connect Request: Triggers Vonage to open a WebSocket connection to your server and begin streaming audio from the active session.
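The /connect request can be sketched as a JSON body that names the session and the WebSocket URI Vonage should dial. The endpoint template, the field names (`sessionId`, `token`, `websocket.uri`, `websocket.audioRate`) and the need for a session token are assumptions based on the Vonage Video REST API and should be checked against the current Vonage reference. Authentication headers are deliberately omitted, and `build_connect_request` is a hypothetical helper.

```python
import json

# Assumed endpoint template; verify against the Vonage Video API reference.
CONNECT_URL_TEMPLATE = "https://api.opentok.com/v2/project/{api_key}/connect"


def build_connect_request(api_key, session_id, token, ws_uri, audio_rate=16000):
    """Build the URL and JSON body for an Audio Connector /connect call.

    Hypothetical helper: it only assembles the request and sends nothing.
    The field names below are assumptions, not confirmed by this page.
    """
    url = CONNECT_URL_TEMPLATE.format(api_key=api_key)
    body = {
        "sessionId": session_id,
        "token": token,
        "websocket": {
            "uri": ws_uri,             # e.g. the WS_URI environment variable
            "audioRate": audio_rate,   # should match vonage_sample_rate
        },
    }
    return url, json.dumps(body)
```

Keeping `audio_rate` equal to the serializer's `vonage_sample_rate` avoids a mismatch between what Vonage sends and what the serializer expects.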

Key Features

  • Bidirectional Audio: Convert between Pipecat and Vonage Audio Connector formats
  • Real-Time AI Pipelines: Stream live audio into Pipecat and process it through any real-time pipeline configuration supported by the framework
  • Session Control Events: Handle Vonage Audio Connector JSON events
  • Linear PCM Audio: Handle raw 16-bit linear PCM audio streams used by the Vonage Video API Audio Connector
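To make the linear PCM format concrete, the sketch below computes how many bytes a chunk of raw 16-bit audio occupies. The 20 ms frame duration is only an illustrative assumption, not a documented property of the Audio Connector, and `pcm_frame_bytes` is a hypothetical helper.

```python
def pcm_frame_bytes(sample_rate_hz, frame_ms, channels=1, bytes_per_sample=2):
    """Size in bytes of one frame of raw linear PCM audio.

    16-bit PCM uses 2 bytes per sample, so the size is
    samples-per-frame * channels * 2.
    """
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * channels * bytes_per_sample


# Illustrative 20 ms frames at two of the common sample rates.
size_16k = pcm_frame_bytes(16000, 20)  # 320 samples * 2 bytes = 640
size_8k = pcm_frame_bytes(8000, 20)    # 160 samples * 2 bytes = 320
```

Because the audio is not compressed, doubling the sample rate doubles the bandwidth needed on the WebSocket connection.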

Configuration

params (InputParams, default: None)
Configuration parameters for audio settings. See InputParams below.

InputParams

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| vonage_sample_rate | int | 16000 | Sample rate used by Vonage (Hz). Common values: 8000, 16000, 24000. |
| sample_rate | int | None | Optional override for pipeline input sample rate. When None, uses the pipeline’s configured rate. |
| ignore_rtvi_messages | bool | True | Whether to ignore RTVI protocol messages during serialization. |

Usage

Basic Setup

from pipecat.serializers.vonage import VonageFrameSerializer
from pipecat.transports.network.websocket_server import (
    WebSocketServerParams,
    WebSocketServerTransport,
)

# Default parameters: Vonage audio at 16000 Hz
serializer = VonageFrameSerializer()

transport = WebSocketServerTransport(
    params=WebSocketServerParams(
        audio_out_enabled=True,
        add_wav_header=False,  # Vonage expects raw PCM, not WAV
        serializer=serializer,
    )
)

With Custom Sample Rate

serializer = VonageFrameSerializer(
    params=VonageFrameSerializer.InputParams(
        vonage_sample_rate=8000,
    ),
)

Notes

  • Linear PCM audio: Unlike Twilio and Plivo, Vonage uses raw 16-bit linear PCM audio (not mu-law encoded). Audio data is sent as binary WebSocket messages rather than base64-encoded JSON.
  • No auto hang-up: The Vonage serializer does not include automatic call termination. Session lifecycle is managed through the Vonage Video API.
  • Event handling: The serializer handles Vonage-specific WebSocket events including websocket:connected, websocket:cleared, websocket:notify, and websocket:dtmf.
  • DTMF support: Touch-tone digit events are converted to InputDTMFFrame objects.
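The split between binary audio and JSON events described in these notes can be sketched as a small classifier. This is not the serializer's actual implementation. `classify_message` is a hypothetical helper, and the assumption that event names such as websocket:connected arrive under an `"event"` key is not confirmed by this page.

```python
import json


def classify_message(message):
    """Classify one incoming WebSocket message.

    Returns a (kind, value) tuple where kind is "audio", "event" or "unknown".
    Binary messages are treated as raw PCM audio; text messages are parsed
    as JSON control events.
    """
    if isinstance(message, (bytes, bytearray)):
        return ("audio", bytes(message))
    try:
        payload = json.loads(message)
    except (TypeError, json.JSONDecodeError):
        return ("unknown", message)
    # Assumption: the event name is carried in an "event" field.
    if isinstance(payload, dict) and "event" in payload:
        return ("event", payload)
    return ("unknown", message)
```

In a Pipecat application this routing is done by VonageFrameSerializer itself; the sketch only illustrates why audio needs no base64 decoding step.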