Overview
GoogleLLMService provides integration with Google’s Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google’s message format while maintaining compatibility with OpenAI-style contexts.
Gemini LLM API Reference
Pipecat’s API methods for Google Gemini integration
Example Implementation
Complete example with function calling
Gemini Documentation
Official Google Gemini API documentation and features
Google AI Studio
Access Gemini models and manage API keys
Installation
To use Google Gemini services, install the required dependencies.

Prerequisites
Google Gemini Setup
Before using Google Gemini LLM services, you need:

- Google Account: Sign up at Google AI Studio
- API Key: Generate a Gemini API key from AI Studio
- Model Selection: Choose from available Gemini models (Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)
Required Environment Variables
GOOGLE_API_KEY: Your Google Gemini API key for authentication
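Putting the prerequisites together, installation and credentials setup might look like the following sketch (the `google` extra name follows Pipecat's optional-dependency convention and should be checked against the current package docs):

```shell
# Install Pipecat with the Google optional dependencies
pip install "pipecat-ai[google]"

# Make the Gemini API key from AI Studio available to the service
export GOOGLE_API_KEY="your-api-key-from-ai-studio"
```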
Configuration
Google AI API key for authentication.
Gemini model name to use (e.g.,
"gemini-2.5-flash", "gemini-2.5-pro").

Runtime-configurable model settings. See InputParams below.
System instruction/prompt for the model. Sets the overall behavior and context.
List of available tools/functions for the model to call.
Configuration for tool usage behavior.
HTTP options for the Google API client.
InputParams
Model inference settings that can be set at initialization via the params constructor argument, or changed at runtime via UpdateSettingsFrame.
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_tokens | int | 4096 | Maximum number of tokens to generate. Must be at least 1. |
| temperature | float | None | Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values are more creative. |
| top_k | int | None | Top-k sampling parameter. Limits tokens to the top k most likely. |
| top_p | float | None | Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output. |
| thinking | ThinkingConfig | None | Thinking configuration. See ThinkingConfig below. |
| extra | dict | {} | Additional parameters passed directly to the API. |
None values are omitted from the API request, letting the Gemini API use its own defaults. If thinking is not provided, Pipecat disables thinking for Gemini 2.5 Flash models (where possible) to reduce latency.

ThinkingConfig
Configuration for controlling the model’s internal thinking process. Gemini 2.5 and 3 series models support this feature.

| Parameter | Type | Default | Description |
|---|---|---|---|
| thinking_budget | int | None | Token budget for thinking (Gemini 2.5 series). -1 for dynamic, 0 to disable, or a specific count (e.g., 128-32768). |
| thinking_level | str | None | Thinking level for Gemini 3 models. "low", "high" for 3 Pro; "minimal", "low", "medium", "high" for 3 Flash. |
| include_thoughts | bool | None | Whether to include thought summaries in the response. |
Gemini 2.5 series models use thinking_budget, while Gemini 3 models use thinking_level. Do not mix these parameters across model generations.

Usage
Basic Setup
With Custom Parameters
With Thinking Configuration
Updating Settings at Runtime
Model settings can be changed mid-conversation using UpdateSettingsFrame:
Notes
- Thinking defaults: By default, Pipecat disables thinking for Gemini 2.5 Flash models to reduce latency. To enable it, explicitly pass a ThinkingConfig via params.
- Multimodal support: Gemini models natively support image and audio inputs through Google’s Content/Part format. Images and audio are automatically converted from OpenAI-style contexts.
- Grounding with Google Search: When grounding metadata is present in the response (e.g., from the Google Search tool), the service emits LLMSearchResponseFrame with search results and source attributions.
- Context format: The service automatically converts between OpenAI-style message formats and Google’s native Content/Part format, so you can use either.
Event Handlers
GoogleLLMService supports the following event handlers, inherited from LLMService:
| Event | Description |
|---|---|
| on_completion_timeout | Called when an LLM completion request times out (Google DeadlineExceeded) |
| on_function_calls_started | Called when function calls are received and execution is about to start |