Overview

GoogleLLMService provides integration with Google’s Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google’s message format while maintaining compatibility with OpenAI-style contexts.

Installation

To use Google Gemini services, install the required dependencies:
pip install "pipecat-ai[google]"

Prerequisites

Google Gemini Setup

Before using Google Gemini LLM services, you need:
  1. Google Account: Sign up at Google AI Studio
  2. API Key: Generate a Gemini API key from AI Studio
  3. Model Selection: Choose from available Gemini models (Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)

Required Environment Variables

  • GOOGLE_API_KEY: Your Google Gemini API key for authentication

Configuration

api_key (str, required)
Google AI API key for authentication.

model (str, default: "gemini-2.5-flash")
Gemini model name to use (e.g., "gemini-2.5-flash", "gemini-2.5-pro").

params (InputParams, default: None)
Runtime-configurable model settings. See InputParams below.

system_instruction (str, default: None)
System instruction/prompt for the model. Sets the overall behavior and context.

tools (List[Dict[str, Any]], default: None)
List of available tools/functions for the model to call.

tool_config (Dict[str, Any], default: None)
Configuration for tool usage behavior.

http_options (HttpOptions, default: None)
HTTP options for the Google API client.
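As a hedged illustration, tool declarations passed via tools can follow Gemini's function_declarations format. The get_weather tool and the "ANY" calling mode below are illustrative examples, not part of the service itself:

```python
# Hypothetical tool declaration in Gemini's function_declarations format.
weather_tool = {
    "function_declarations": [
        {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                },
                "required": ["location"],
            },
        }
    ]
}

# tool_config controls when the model may call tools; "ANY" mode
# forces it to call one of the declared functions.
tool_config = {"function_calling_config": {"mode": "ANY"}}
```

These dictionaries would then be passed as the tools and tool_config constructor arguments.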

InputParams

Model inference settings that can be set at initialization via the params constructor argument, or changed at runtime via UpdateSettingsFrame.
max_tokens (int, default: 4096)
Maximum number of tokens to generate. Must be at least 1.

temperature (float, default: None)
Sampling temperature (0.0 to 2.0). Lower values are more focused, higher values more creative.

top_k (int, default: None)
Top-k sampling parameter. Limits candidate tokens to the k most likely.

top_p (float, default: None)
Top-p (nucleus) sampling (0.0 to 1.0). Controls diversity of output.

thinking (ThinkingConfig, default: None)
Thinking configuration. See ThinkingConfig below.

extra (dict, default: {})
Additional parameters passed directly to the API.
None values are omitted from the API request, letting the Gemini API use its own defaults. If thinking is not provided, Pipecat disables thinking for Gemini 2.5 Flash models (where possible) to reduce latency.
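Conceptually, the omission works like the following pure-Python sketch (illustrative only, not the service's actual code):

```python
# Settings left at None are stripped before the request is built,
# so the Gemini API applies its own defaults for the missing keys.
settings = {"temperature": None, "max_tokens": 4096, "top_p": 0.9, "top_k": None}
request_config = {k: v for k, v in settings.items() if v is not None}
print(request_config)  # {'max_tokens': 4096, 'top_p': 0.9}
```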

ThinkingConfig

Configuration for controlling the model’s internal thinking process. Gemini 2.5 and 3 series models support this feature.
thinking_budget (int, default: None)
Token budget for thinking (Gemini 2.5 series). Use -1 for dynamic, 0 to disable, or a specific count (e.g., 128-32768).

thinking_level (str, default: None)
Thinking level for Gemini 3 models: "low" or "high" for 3 Pro; "minimal", "low", "medium", or "high" for 3 Flash.

include_thoughts (bool, default: None)
Whether to include thought summaries in the response.
Gemini 2.5 series models use thinking_budget, while Gemini 3 models use thinking_level. Do not mix these parameters across model generations.

Usage

Basic Setup

import os

from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.5-flash",
)

With Custom Parameters

import os

from pipecat.services.google import GoogleLLMService

llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.5-pro",
    system_instruction="You are a helpful assistant.",
    params=GoogleLLMService.InputParams(
        temperature=0.7,
        max_tokens=2048,
        top_p=0.9,
    ),
)

With Thinking Configuration

import os

from pipecat.services.google import GoogleLLMService

# Gemini 2.5 series (using thinking_budget)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.5-pro",
    params=GoogleLLMService.InputParams(
        max_tokens=8192,
        thinking=GoogleLLMService.ThinkingConfig(
            thinking_budget=4096,
            include_thoughts=True,
        ),
    ),
)

# Gemini 3 series (using thinking_level)
llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-3-flash",
    params=GoogleLLMService.InputParams(
        max_tokens=8192,
        thinking=GoogleLLMService.ThinkingConfig(
            thinking_level="high",
            include_thoughts=True,
        ),
    ),
)

Updating Settings at Runtime

Model settings can be changed mid-conversation by queueing an UpdateSettingsFrame on the pipeline task:
from pipecat.frames.frames import UpdateSettingsFrame

await task.queue_frame(
    UpdateSettingsFrame(
        settings={
            "llm": {
                "temperature": 0.3,
                "max_tokens": 1024,
            }
        }
    )
)

Notes

  • Thinking defaults: By default, Pipecat disables thinking for Gemini 2.5 Flash models to reduce latency. To enable it, explicitly pass a ThinkingConfig via params.
  • Multimodal support: Gemini models natively support image and audio inputs through Google’s Content/Part format. Images and audio are automatically converted from OpenAI-style contexts.
  • Grounding with Google Search: When grounding metadata is present in the response (e.g., from Google Search tool), the service emits LLMSearchResponseFrame with search results and source attributions.
  • Context format: The service automatically converts between OpenAI-style message formats and Google’s native Content/Part format, so you can use either.
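  Conceptually, the text-message part of that conversion looks like the sketch below. This is illustrative only; the real conversion inside GoogleLLMService also handles images, audio, and tool calls:

```python
def to_google_content(message: dict) -> dict:
    """Illustrative sketch of the OpenAI -> Google text-message mapping.

    OpenAI's "assistant" role becomes Google's "model" role, and the
    flat "content" string becomes a list of Part-style dicts.
    """
    role = "model" if message["role"] == "assistant" else "user"
    return {"role": role, "parts": [{"text": message["content"]}]}

print(to_google_content({"role": "assistant", "content": "Hello!"}))
# {'role': 'model', 'parts': [{'text': 'Hello!'}]}
```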

Event Handlers

GoogleLLMService supports the following event handlers, inherited from LLMService:
on_completion_timeout
Called when an LLM completion request times out (Google DeadlineExceeded).

on_function_calls_started
Called when function calls are received and execution is about to start.
@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    print("LLM completion timed out")