Overview

GoogleLLMService provides integration with Google’s Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google’s message format while maintaining compatibility with OpenAI-style contexts.

Installation

To use Google Gemini services, install the required dependencies:
pip install "pipecat-ai[google]"

Prerequisites

Google Gemini Setup

Before using Google Gemini LLM services, you need:
  1. Google Account: Sign up at Google AI Studio
  2. API Key: Generate a Gemini API key from AI Studio
  3. Model Selection: Choose from available Gemini models (Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)

Required Environment Variables

  • GOOGLE_API_KEY: Your Google Gemini API key for authentication

Configuration

api_key (str, required)
Google AI API key for authentication.

model (str, default: "gemini-2.5-flash")
Gemini model name to use (e.g., "gemini-2.5-flash", "gemini-2.5-pro").

params (InputParams, default: None)
Runtime-configurable model settings. See InputParams below.

system_instruction (str, default: None)
System instruction/prompt for the model. Sets the overall behavior and context.

tools (List[Dict[str, Any]], default: None)
List of available tools/functions for the model to call.

tool_config (Dict[str, Any], default: None)
Configuration for tool usage behavior.

http_options (HttpOptions, default: None)
HTTP options for the Google API client.
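As a hedged illustration, tool declarations passed via tools can follow Gemini's function_declarations format. The get_weather tool and the "ANY" calling mode below are illustrative examples, not part of the service itself:

```python
# Hypothetical tool declaration in Gemini's function_declarations format.
weather_tool = {
    "function_declarations": [
        {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                },
                "required": ["location"],
            },
        }
    ]
}

# tool_config controls when the model may call tools; "ANY" mode
# forces it to call one of the declared functions.
tool_config = {"function_calling_config": {"mode": "ANY"}}
```

These dictionaries would then be passed as the tools and tool_config constructor arguments.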

InputParams

Model inference settings that can be set at initialization via the params constructor argument, or changed at runtime via UpdateSettingsFrame.
max_tokens (int, default: 4096)
Maximum number of tokens to generate. Must be at least 1.

temperature (float, default: None)
Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values more creative.

top_k (int, default: None)
Top-k sampling parameter. Limits candidate tokens to the k most likely.

top_p (float, default: None)
Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output.

thinking (ThinkingConfig, default: None)
Thinking configuration. See ThinkingConfig below.

extra (dict, default: {})
Additional parameters passed directly to the API.
None values are omitted from the API request, letting the Gemini API use its own defaults. If thinking is not provided, Pipecat disables thinking for Gemini 2.5 Flash models (where possible) to reduce latency.
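Conceptually, the omission works like the following pure-Python sketch (illustrative only, not the service's actual code):

```python
# Settings left at None are stripped before the request is built,
# so the Gemini API applies its own defaults for the missing keys.
settings = {"temperature": None, "max_tokens": 4096, "top_p": 0.9, "top_k": None}
request_config = {k: v for k, v in settings.items() if v is not None}
print(request_config)  # {'max_tokens': 4096, 'top_p': 0.9}
```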

ThinkingConfig

Configuration for controlling the model’s internal thinking process. Gemini 2.5 and 3 series models support this feature.
thinking_budget (int, default: None)
Token budget for thinking (Gemini 2.5 series). Use -1 for dynamic, 0 to disable, or a specific count (e.g., 128-32768).

thinking_level (str, default: None)
Thinking level for Gemini 3 models: "low" or "high" for 3 Pro; "minimal", "low", "medium", or "high" for 3 Flash.

include_thoughts (bool, default: None)
Whether to include thought summaries in the response.
Gemini 2.5 series models use thinking_budget, while Gemini 3 models use thinking_level. Do not mix these parameters across model generations.

Usage

Basic Setup

import os

from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.5-flash",
)

With Custom Parameters

import os

from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.5-pro",
    system_instruction="You are a helpful assistant.",
    params=GoogleLLMService.InputParams(
        temperature=0.7,
        max_tokens=2048,
        top_p=0.9,
    ),
)

With Thinking Configuration

import os

from pipecat.services.google import GoogleLLMService

# Gemini 2.5 series (using thinking_budget)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.5-pro",
    params=GoogleLLMService.InputParams(
        max_tokens=8192,
        thinking=GoogleLLMService.ThinkingConfig(
            thinking_budget=4096,
            include_thoughts=True,
        ),
    ),
)

# Gemini 3 series (using thinking_level)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-3-flash",
    params=GoogleLLMService.InputParams(
        max_tokens=8192,
        thinking=GoogleLLMService.ThinkingConfig(
            thinking_level="high",
            include_thoughts=True,
        ),
    ),
)

Updating Settings at Runtime

Model settings can be changed mid-conversation by queueing an UpdateSettingsFrame on the pipeline task:
from pipecat.frames.frames import UpdateSettingsFrame

await task.queue_frame(
    UpdateSettingsFrame(
        settings={
            "llm": {
                "temperature": 0.3,
                "max_tokens": 1024,
            }
        }
    )
)

Notes

  • Thinking defaults: By default, Pipecat disables thinking for Gemini 2.5 Flash models to reduce latency. To enable it, explicitly pass a ThinkingConfig via params.
  • Multimodal support: Gemini models natively support image and audio inputs through Google’s Content/Part format. Images and audio are automatically converted from OpenAI-style contexts.
  • Grounding with Google Search: When grounding metadata is present in the response (e.g., from Google Search tool), the service emits LLMSearchResponseFrame with search results and source attributions.
  • Context format: The service automatically converts between OpenAI-style message formats and Google’s native Content/Part format, so you can use either.
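  Conceptually, the text-message part of that conversion looks like the sketch below. This is illustrative only; the real conversion inside GoogleLLMService also handles images, audio, and tool calls:

```python
def to_google_content(message: dict) -> dict:
    """Illustrative sketch of the OpenAI -> Google text-message mapping.

    OpenAI's "assistant" role becomes Google's "model" role, and the
    flat "content" string becomes a list of Part-style dicts.
    """
    role = "model" if message["role"] == "assistant" else "user"
    return {"role": role, "parts": [{"text": message["content"]}]}

print(to_google_content({"role": "assistant", "content": "Hello!"}))
# {'role': 'model', 'parts': [{'text': 'Hello!'}]}
```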

Event Handlers

GoogleLLMService supports the following event handlers, inherited from LLMService:
on_completion_timeout
Called when an LLM completion request times out (Google DeadlineExceeded).

on_function_calls_started
Called when function calls are received and execution is about to start.
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")