Overview
GeminiLiveLLMService enables natural, real-time conversations with Google’s Gemini model. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with multimodal capabilities including audio, video, and text processing.
Gemini Live API Reference
Pipecat’s API methods for Gemini Live integration
Example Implementation
Complete Gemini Live function calling example
Gemini Documentation
Official Google Gemini Live API documentation
Gemini Live Model Card
Available Gemini Live models
Installation
To use Gemini Live services, install the required dependencies:

Prerequisites
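A typical installation, assuming the Gemini Live dependencies ship in Pipecat's `google` extra (verify the extra name against the Pipecat installation docs):

```shell
# Install Pipecat with its Google extra (assumed to include the Gemini client)
pip install "pipecat-ai[google]"
```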
Google AI Setup
Before using Gemini Live services, you need:

- Google Account: Set up at Google AI Studio
- API Key: Generate a Gemini API key from AI Studio
- Model Access: Ensure access to Gemini Live models
- Multimodal Configuration: Set up audio, video, and text modalities
Required Environment Variables
GOOGLE_API_KEY: Your Google Gemini API key for authentication
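The key can be exported in the shell before starting your bot process, for example:

```shell
# Make the Gemini API key available to the bot process
export GOOGLE_API_KEY="your-api-key-here"
```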
Key Features
- Multimodal Processing: Handle audio, video, and text inputs simultaneously
- Real-time Streaming: Low-latency audio and video processing
- Voice Activity Detection: Automatic speech detection and turn management
- Function Calling: Advanced tool integration and API calling capabilities
- Context Management: Intelligent conversation history and system instruction handling
Configuration
GeminiLiveLLMService
Google AI API key for authentication.
Gemini model identifier to use.
TTS voice identifier for audio responses.
System prompt for the model. Can also be provided via the LLM context.
Tools/functions available to the model. Can also be provided via the LLM context.
Runtime-configurable generation and session settings. See InputParams below.
Whether to start with audio input paused.
Whether to start with video input paused.
Whether to generate a response when context is first set. Set to False to wait for user input before the model responds.
HTTP options for the Google API client. Use this to set the API version (e.g. HttpOptions(api_version="v1alpha")) or other request options.
Base URL for the Gemini File API.
InputParams
Generation and session settings that can be set at initialization via the params constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| frequency_penalty | float | None | Frequency penalty for generation (0.0-2.0). |
| max_tokens | int | 4096 | Maximum tokens to generate. |
| presence_penalty | float | None | Presence penalty for generation (0.0-2.0). |
| temperature | float | None | Sampling temperature (0.0-2.0). |
| top_k | int | None | Top-k sampling parameter. |
| top_p | float | None | Top-p (nucleus) sampling parameter (0.0-1.0). |
| modalities | GeminiModalities | AUDIO | Response modality: GeminiModalities.AUDIO or GeminiModalities.TEXT. |
| language | Language | EN_US | Language for generation and transcription. |
| media_resolution | GeminiMediaResolution | UNSPECIFIED | Media resolution for video input: UNSPECIFIED, LOW (64 tokens), MEDIUM (256 tokens), or HIGH (256 tokens with zoom). |
| vad | GeminiVADParams | None | Voice activity detection parameters. See GeminiVADParams below. |
| context_window_compression | ContextWindowCompressionParams | None | Context window compression settings. |
| thinking | ThinkingConfig | None | Thinking/reasoning configuration. Requires a model that supports it. |
| enable_affective_dialog | bool | None | Enable affective dialog for expression and tone adaptation. Requires a supporting model and API version (e.g. v1alpha). |
| proactivity | ProactivityConfig | None | Proactivity settings for model behavior. Requires a supporting model and API version. |
| extra | Dict[str, Any] | {} | Additional parameters passed to the API. |
GeminiVADParams
Voice activity detection configuration passed via InputParams.vad:
| Parameter | Type | Default | Description |
|---|---|---|---|
| disabled | bool | None | Whether to disable server-side VAD entirely. |
| start_sensitivity | StartSensitivity | None | Sensitivity for speech start detection. |
| end_sensitivity | EndSensitivity | None | Sensitivity for speech end detection. |
| prefix_padding_ms | int | None | Padding before speech starts in milliseconds. |
| silence_duration_ms | int | None | Silence duration threshold in milliseconds to detect speech end. |
ContextWindowCompressionParams
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | bool | False | Whether context window compression is enabled. |
| trigger_tokens | int | None | Token count to trigger compression. None uses the default (80% of context window). |
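Putting the two tables above together, a sketch of configuring VAD and context window compression. The import path is an assumption based on Pipecat's module layout; verify it against the API reference linked above:

```python
import os

# Import path assumed; check the Pipecat API reference for your version.
from pipecat.services.google.gemini_live.llm import (
    ContextWindowCompressionParams,
    GeminiLiveLLMService,
    GeminiVADParams,
    InputParams,
)

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    params=InputParams(
        vad=GeminiVADParams(
            prefix_padding_ms=300,    # keep 300 ms of audio preceding detected speech
            silence_duration_ms=800,  # 800 ms of silence ends the user's turn
        ),
        context_window_compression=ContextWindowCompressionParams(
            enabled=True,  # trigger_tokens left as None -> default 80% threshold
        ),
    ),
)
```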
Usage
Basic Setup
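A minimal sketch of constructing the service; the import path and voice name are assumptions, so check the API reference and model card linked above:

```python
import os

# Import path assumed; see the Pipecat API reference for the exact module.
from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    voice_id="Puck",  # example voice; see the model card for available voices
    system_instruction="You are a friendly, concise voice assistant.",
)
```

The service is then placed in a pipeline between transport input and output like any other Pipecat LLM service.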
With Custom Parameters
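A hedged sketch of passing InputParams at construction time (import path assumed, as above):

```python
import os

from pipecat.services.google.gemini_live.llm import (  # path assumed
    GeminiLiveLLMService,
    InputParams,
)

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    params=InputParams(
        temperature=0.7,  # lower values yield more deterministic replies
        max_tokens=1024,
        top_p=0.9,
    ),
)
```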
Text-Only Mode
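Switching the response modality from the default AUDIO to TEXT might look like this (import path assumed):

```python
import os

from pipecat.services.google.gemini_live.llm import (  # path assumed
    GeminiLiveLLMService,
    GeminiModalities,
    InputParams,
)

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    params=InputParams(modalities=GeminiModalities.TEXT),  # text responses instead of audio
)
```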
With Thinking Enabled
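A sketch of enabling thinking via ThinkingConfig from the google-genai SDK; the Pipecat import path is an assumption, and the chosen model must support thinking (see the model card linked above):

```python
import os

from google.genai.types import ThinkingConfig  # from the google-genai SDK
from pipecat.services.google.gemini_live.llm import (  # path assumed
    GeminiLiveLLMService,
    InputParams,
)

llm = GeminiLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    # Select a thinking-capable Gemini Live model here.
    params=InputParams(
        thinking=ThinkingConfig(include_thoughts=True),
    ),
)
```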
Notes
- System instruction precedence: If a system instruction is provided both at init time and in the LLM context, the context-provided value takes precedence.
- Tools precedence: Similarly, tools provided in the context override tools provided at init time.
- Transcription aggregation: Gemini Live sends user transcriptions in small chunks. The service aggregates them into complete sentences using end-of-sentence detection with a 0.5-second timeout fallback.
- Session resumption: The service automatically handles session resumption on reconnection using session resumption handles.
- Connection resilience: The service will attempt up to 3 consecutive reconnections before treating a connection failure as fatal.
- Video frame rate: Video frames are throttled to a maximum of one per second.
- Affective dialog and proactivity: These features require both a supporting model and API version (v1alpha).