Overview

FalSTTService provides speech-to-text using Fal’s Wizper API. It uses Voice Activity Detection (VAD) to send only speech segments for transcription, which reduces API usage and improves response time.

Installation

To use Fal services, install the required dependency:
pip install "pipecat-ai[fal]"

Prerequisites

Fal Account Setup

Before using Fal STT services, you need:
  1. Fal Account: Sign up at Fal Platform
  2. API Key: Generate an API key from your account dashboard
  3. Model Access: Ensure access to the Wizper transcription model

Required Environment Variables

  • FAL_KEY: Your Fal API key for authentication
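Because a missing key usually only surfaces when the first request is made, it can help to check FAL_KEY at startup. The following is a minimal sketch; require_fal_key is a hypothetical helper, not part of pipecat or Fal.

```python
import os


def require_fal_key(env=None):
    """Return the Fal API key from FAL_KEY, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("FAL_KEY", "").strip()
    if not key:
        raise RuntimeError("FAL_KEY is not set; export it before starting the pipeline")
    return key
```

The returned value could then be passed as api_key when constructing the service.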

Configuration

FalSTTService

  • api_key (str, default: None): Fal API key. If not provided, uses FAL_KEY environment variable.
  • sample_rate (int, default: None): Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
  • params (InputParams, default: None): Configuration parameters for the Wizper API. See InputParams below.
  • ttfs_p99_latency (float, default: FAL_TTFS_P99): P99 latency from speech end to final transcript in seconds. Override for your deployment.
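
The ttfs_p99_latency value is meant to be overridden per deployment. One way to derive it, sketched here under assumptions, is to take the nearest-rank 99th percentile of latencies you have measured yourself. The p99_latency helper and the sample values are hypothetical and not part of pipecat.

```python
def p99_latency(samples):
    """Nearest-rank 99th percentile of latency samples, in seconds."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Nearest-rank method: rank = ceil(0.99 * n), computed with integer arithmetic
    # to avoid floating-point rounding surprises.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical measurements: seconds from end of speech to final transcript.
measured = [0.8, 0.9, 1.1, 1.0, 2.4, 0.7, 1.3]
ttfs_p99 = p99_latency(measured)
```

The result could then be supplied as ttfs_p99_latency=ttfs_p99 when constructing FalSTTService. With only a handful of samples the nearest-rank P99 is simply the slowest observation, so a larger sample gives a more meaningful figure.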

InputParams

Parameters passed via the params constructor argument.
Parameter    Type      Default       Description
language     Language  Language.EN   Language of the audio input.
task         str       "transcribe"  Task to perform: "transcribe" or "translate".
chunk_level  str       "segment"     Level of chunking for the response.
version      str       "3"           Version of the Wizper model to use.
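
To make the defaults concrete, the sketch below mirrors the documented fields in a plain dataclass and rejects task values other than the two listed. WizperParamsSketch is a hypothetical stand-in for illustration only. It is not FalSTTService.InputParams, and the string used for the language field is an assumption standing in for Language.EN.

```python
from dataclasses import dataclass

ALLOWED_TASKS = ("transcribe", "translate")


@dataclass(frozen=True)
class WizperParamsSketch:
    """Illustrative stand-in that mirrors the documented defaults."""

    language: str = "EN"          # stand-in for Language.EN
    task: str = "transcribe"
    chunk_level: str = "segment"
    version: str = "3"

    def __post_init__(self):
        # Catch typos such as "translation" before any request is made.
        if self.task not in ALLOWED_TASKS:
            raise ValueError(f"task must be one of {ALLOWED_TASKS}, got {self.task!r}")


# Overriding one field leaves the other documented defaults in place.
params = WizperParamsSketch(task="translate")
```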

Usage

Basic Setup

import os

from pipecat.services.fal import FalSTTService

# Read the API key from the FAL_KEY environment variable.
stt = FalSTTService(
    api_key=os.getenv("FAL_KEY"),
)

With Custom Parameters

import os

from pipecat.services.fal import FalSTTService
from pipecat.transcriptions.language import Language

# Transcribe Spanish audio with version "3" of the Wizper model.
stt = FalSTTService(
    api_key=os.getenv("FAL_KEY"),
    params=FalSTTService.InputParams(
        language=Language.ES,
        task="transcribe",
        version="3",
    ),
)

Translation Mode

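import os

from pipecat.services.fal import FalSTTService
from pipecat.transcriptions.language import Language

stt = FalSTTService(
    api_key=os.getenv("FAL_KEY"),
    params=FalSTTService.InputParams(
        language=Language.FR,  # Language of the input audio
        task="translate",  # Translates to English
    ),
)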

Notes

  • Segmented processing: FalSTTService inherits from SegmentedSTTService, which buffers audio during speech (detected by VAD) and sends complete segments for transcription. As a result, it does not provide interim results; it emits only a final transcription after each speech segment.
  • Translation support: Set task="translate" to translate audio into English, regardless of the input language.
  • Wizper versions: The version parameter selects the underlying Whisper model version. Version "3" is the default and recommended for best accuracy.
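
The segmented behavior described in the notes can be illustrated with a toy buffer: audio is accumulated while a VAD flag reports speech, and one complete segment is emitted when speech ends. This is a simplified sketch of the idea, not pipecat's SegmentedSTTService implementation. The segment_speech function and the (chunk, is_speech) frame shape are assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple


def segment_speech(frames: Iterable[Tuple[bytes, bool]]) -> Iterator[bytes]:
    """Yield one buffered audio segment each time a run of speech ends.

    `frames` is a sequence of (audio_chunk, is_speech) pairs, where
    is_speech is the VAD decision for that chunk.
    """
    buffer = bytearray()
    for chunk, is_speech in frames:
        if is_speech:
            # Keep accumulating while the speaker is talking.
            buffer.extend(chunk)
        elif buffer:
            # Speech just ended: hand over the whole segment at once.
            yield bytes(buffer)
            buffer.clear()
    if buffer:
        # Flush a segment that was still open when the input ended.
        yield bytes(buffer)


frames = [(b"a", False), (b"b", True), (b"c", True), (b"d", False), (b"e", True)]
segments = list(segment_speech(frames))
```

Nothing is emitted while speech is still in progress, which is why a service built this way produces only final transcriptions.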