Overview
WhisperSTTService provides offline speech recognition using OpenAI's Whisper models running locally. It supports multiple model sizes and hardware acceleration options, including CPU, CUDA, and Apple Silicon (MLX), for privacy-focused transcription without external API calls.
- Whisper STT API Reference: Pipecat's API methods for Whisper STT integration
- Standard Whisper Example: Complete example with standard Whisper
- Whisper Documentation: OpenAI's Whisper research paper and model details
- MLX Whisper Example: Apple Silicon optimized example
Installation
Choose your installation based on your hardware:

- Standard Whisper (CPU/CUDA)
- MLX Whisper (Apple Silicon)
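The commands below are a sketch of the two install paths; the extra names follow Pipecat's convention for optional dependencies, so confirm them against your Pipecat version:

```shell
# Standard Whisper (Faster Whisper on CPU or CUDA)
pip install "pipecat-ai[whisper]"

# MLX Whisper (Apple Silicon only)
pip install "pipecat-ai[mlx-whisper]"
```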
Prerequisites
Local Model Setup
Before using Whisper STT services, you need:

- Model Selection: Choose an appropriate Whisper model size (tiny, base, small, medium, large)
- Hardware Configuration: Set up CPU, CUDA, or Apple Silicon acceleration
- Storage Space: Ensure sufficient disk space for model downloads
Configuration Options
- Model Size: Balance between accuracy and performance based on your hardware
- Hardware Acceleration: Configure CUDA for NVIDIA GPUs or MLX for Apple Silicon
- Language Support: Whisper supports 99+ languages out of the box
Configuration
WhisperSTTService
Uses Faster Whisper for efficient local transcription on CPU or CUDA devices.

- Model: Whisper model to use. Can be a `Model` enum value or a string. Available models: `TINY`, `BASE`, `SMALL`, `MEDIUM`, `LARGE` (large-v3), `LARGE_V3_TURBO`, `DISTIL_LARGE_V2`, `DISTIL_MEDIUM_EN` (English-only).
- Device: Device for inference. Options: `"cpu"`, `"cuda"`, or `"auto"` (auto-detect).
- Compute type: Compute type for inference. Options include `"default"`, `"int8"`, `"int8_float16"`, `"float16"`, etc.
- No-speech threshold (`no_speech_prob`): Probability threshold for filtering out non-speech segments. Segments with a no-speech probability above this value are excluded.
- Language: Default language for transcription.
WhisperSTTServiceMLX
Optimized for Apple Silicon using MLX Whisper. Models are loaded on demand.

- Model: MLX Whisper model to use. Can be an `MLXModel` enum value or a string. Available models: `TINY`, `MEDIUM`, `LARGE_V3`, `LARGE_V3_TURBO`, `DISTIL_LARGE_V3`, `LARGE_V3_TURBO_Q4` (quantized).
- No-speech threshold (`no_speech_prob`): Probability threshold for filtering out non-speech segments.
- Language: Default language for transcription.
- Temperature: Sampling temperature. Lower values produce more deterministic results.
Usage
Basic Faster Whisper Setup
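A minimal construction sketch based on the parameters described above; the import path and the `no_speech_prob` value shown are assumptions, so verify them against your Pipecat version:

```python
# Sketch: module path may differ across Pipecat versions.
from pipecat.services.whisper.stt import Model, WhisperSTTService

# Runs Faster Whisper locally; "auto" selects CUDA when available,
# otherwise falls back to CPU.
stt = WhisperSTTService(
    model=Model.BASE,
    device="auto",
    no_speech_prob=0.4,  # illustrative threshold, not a documented default
)
```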
With CUDA Acceleration
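For NVIDIA GPUs, pin the device to `"cuda"` and pick a half-precision compute type; as above, the import path is an assumption:

```python
from pipecat.services.whisper.stt import Model, WhisperSTTService

stt = WhisperSTTService(
    model=Model.LARGE,       # large-v3
    device="cuda",
    compute_type="float16",  # half precision keeps VRAM usage down
)
```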
With Custom Language
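To set a default transcription language, pass a `Language` enum value (the import paths here are assumptions to verify against your Pipecat version):

```python
from pipecat.services.whisper.stt import Model, WhisperSTTService
from pipecat.transcriptions.language import Language

stt = WhisperSTTService(
    model=Model.MEDIUM,
    language=Language.FR,  # transcribe French by default
)
```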
MLX Whisper on Apple Silicon
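On Apple Silicon, the MLX variant takes the same shape; this sketch assumes `WhisperSTTServiceMLX` and `MLXModel` live in the same module as the standard service:

```python
from pipecat.services.whisper.stt import MLXModel, WhisperSTTServiceMLX

stt = WhisperSTTServiceMLX(
    model=MLXModel.LARGE_V3_TURBO_Q4,  # quantized: lower memory footprint
    temperature=0.0,                   # deterministic decoding
)
```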
Notes
- First run downloads: If the selected model hasn’t been downloaded previously, the first run will download it from the Hugging Face model hub. This may take significant time depending on model size.
- Segmented transcription: Both `WhisperSTTService` and `WhisperSTTServiceMLX` extend `SegmentedSTTService`, meaning they process complete audio segments after VAD detects that the user has stopped speaking.
- No-speech filtering: The `no_speech_prob` threshold helps filter out hallucinations. Increase it to be more permissive; decrease it to filter more aggressively.
- MLX quantization: The `LARGE_V3_TURBO_Q4` model provides reduced memory usage with minimal quality loss on Apple Silicon.
- Language support: Whisper supports 99+ languages. Use the `Language` enum for type-safe language selection.
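The no-speech filtering described above can be sketched in plain Python. The segment shape below (a dict with a `no_speech_prob` field) is an illustrative stand-in for Whisper's per-segment output, not Pipecat's internal representation:

```python
def filter_segments(segments, threshold=0.4):
    """Drop segments Whisper judges to be non-speech.

    A segment survives only if its no-speech probability is at or below
    the threshold: raising the threshold keeps more segments (more
    permissive), lowering it filters more aggressively.
    """
    return [s for s in segments if s["no_speech_prob"] <= threshold]


segments = [
    {"text": "Hello there.", "no_speech_prob": 0.05},
    {"text": "(breathing)", "no_speech_prob": 0.92},
]

# With threshold=0.4 only the first segment is kept; the likely
# non-speech "(breathing)" segment is excluded.
print(filter_segments(segments, threshold=0.4))
```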