Overview

HeyGenVideoService integrates with HeyGen LiveAvatar to create interactive AI-powered video avatars that respond naturally in real-time conversations. The service handles bidirectional audio/video streaming, avatar animations, voice activity detection, and conversation interruptions to deliver engaging conversational AI experiences with lifelike visual presence.

Installation

To use HeyGen services, install the required dependency:
pip install "pipecat-ai[heygen]"

Prerequisites

HeyGen Account Setup

Before using HeyGen video services, you need:
  1. HeyGen Account: Sign up at HeyGen Platform
  2. API Key: Generate an API key from your account dashboard
  3. Avatar Selection: Choose from available interactive avatars
  4. Streaming Setup: Configure real-time avatar streaming capabilities

Required Environment Variables

  • HEYGEN_LIVE_AVATAR_API_KEY: Your HeyGen LiveAvatar API key for authentication
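Reading the key once at startup lets the application fail fast with a clear message rather than erroring later during session creation. A minimal sketch; `get_heygen_api_key` is a hypothetical helper for illustration, not part of pipecat:

```python
import os


def get_heygen_api_key() -> str:
    """Return the HeyGen LiveAvatar API key from the environment.

    Raises immediately if the variable is unset, so misconfiguration
    surfaces at startup instead of mid-conversation.
    """
    api_key = os.getenv("HEYGEN_LIVE_AVATAR_API_KEY")
    if not api_key:
        raise RuntimeError(
            "HEYGEN_LIVE_AVATAR_API_KEY is not set; generate a key in "
            "your HeyGen dashboard and export it before starting."
        )
    return api_key
```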

Configuration

  • api_key (str, required): HeyGen API key for authentication.
  • session (aiohttp.ClientSession, required): HTTP client session for API requests.
  • session_request (LiveAvatarNewSessionRequest | NewSessionRequest, default: None): Configuration for the HeyGen session. When None, defaults to the "Shawn_Therapist_public" avatar.
  • service_type (ServiceType, default: None): Service type for the avatar session.

Usage

Basic Setup

import os

import aiohttp
from pipecat.services.heygen import HeyGenVideoService

async with aiohttp.ClientSession() as session:
    heygen = HeyGenVideoService(
        api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"),
        session=session,
    )

With Custom Session Request

from pipecat.services.heygen.api_liveavatar import LiveAvatarNewSessionRequest

heygen = HeyGenVideoService(
    api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"),
    session=session,
    session_request=LiveAvatarNewSessionRequest(
        avatar_id="your_avatar_id",
        version="v2",
    ),
)

Notes

  • Bidirectional streaming: The service manages both sending audio to HeyGen and receiving avatar video/audio back through WebRTC.
  • Interruption handling: When a user starts speaking, the service interrupts the avatar’s current speech, cancels ongoing audio tasks, and activates the avatar’s listening animation.
  • Metrics support: The service supports TTFB metrics tracking using TTSStartedFrame and BotStartedSpeakingFrame signals.