Smallest AI + Pipecat: The Voice Stack You've Been Waiting to Build On.

Real-time voice AI just got a whole lot faster. Lightning TTS and Pulse STT now plug directly into Pipecat, and building a production-ready voice agent has never been this straightforward.

Harshita Jain


We've been building toward this for a while.

Voice AI is having a moment, but most of the tooling around it is still fragmented. You stitch together a transcription service here, a TTS provider there, wire up your own interruption handling, manage your own WebSocket connections, and by the time you've got something working end-to-end, you've written more infrastructure than product.

That's the problem Pipecat was built to solve. And it's why we went deep on this integration.

We're proud to announce that, as of this month, Smallest AI is natively supported in Pipecat. Lightning TTS and Pulse STT are first-class services in the framework, meaning you get low-latency, real-time voice in and voice out, fully interruptible, with a single pip install and a handful of lines of code.

No custom wrappers. No glue code. No reinventing the wheel.

What is Pipecat?

If you haven't come across it yet, Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents.

What makes it different from rolling your own async pipeline is its frame-based architecture. Every piece of data flowing through a Pipecat pipeline (audio chunks, transcriptions, LLM tokens, synthesized speech) is a typed frame. Services consume frames, process them, and emit new ones downstream. This gives you a clean, composable system where every component has a well-defined contract.
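
To make the frame model concrete, here's a toy sketch in plain Python. The class names below are hypothetical stand-ins for illustration, not Pipecat's actual frame or service types:

```python
from dataclasses import dataclass

# Hypothetical frame types, standing in for Pipecat's real ones.
@dataclass
class AudioFrame:
    pcm: bytes

@dataclass
class TranscriptionFrame:
    text: str

class ToySTT:
    """Consumes audio frames, emits transcription frames downstream."""
    def process(self, frame):
        if isinstance(frame, AudioFrame):
            # A real STT service would send audio to an engine here.
            yield TranscriptionFrame(text=f"<{len(frame.pcm)} bytes transcribed>")
        else:
            yield frame  # pass through frame types we don't handle

stt = ToySTT()
out = list(stt.process(AudioFrame(pcm=b"\x00" * 320)))
# out[0] is a TranscriptionFrame produced from the audio input
```

The point is the contract: a service never needs to know what sits upstream or downstream of it, only which frame types it consumes and emits.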

More importantly, Pipecat handles the genuinely hard parts of real-time voice: turn detection, interruption handling, VAD integration, transport management. The things that take weeks to get right when you're building from scratch, Pipecat has already figured out.

What We Built Together

We worked with the Pipecat team to ship two native services:

SmallestSTTService — powered by Pulse, our real-time speech-to-text engine, delivering transcriptions at 64ms TTFT over a persistent WebSocket connection.

SmallestTTSService — powered by Lightning, our streaming TTS engine, synthesizing and streaming audio back to the user with minimal latency over WebSocket.

Both services are first-class citizens in the Pipecat ecosystem. They follow the same patterns as every other service in the framework, which means they compose cleanly with everything else: Daily for transport, OpenAI for the LLM, Silero for VAD, and anything else you want to bring into your pipeline.

How It Works

Install

The smallest extra installs both services in one shot:

```bash
pip install "pipecat-ai[smallest]"
```

For a full voice agent with transport, LLM, and VAD:

```bash
pip install "pipecat-ai[smallest,daily,openai,silero,runner]"
```

Set up your environment

SMALLEST_API_KEY=...
DAILY_API_KEY=...
OPENAI_API_KEY=...

Wire up STT

Pulse connects to our real-time WebSocket endpoint and streams audio frames from the pipeline, returning transcriptions as fast as you speak:

```python
import os

from pipecat.services.smallest.stt import SmallestSTTService
from pipecat.transcriptions.language import Language

stt = SmallestSTTService(
    api_key=os.getenv("SMALLEST_API_KEY"),
    settings=SmallestSTTService.Settings(
        language=Language.EN,
    ),
)
```

Wire up TTS

Lightning streams synthesized audio back over WebSocket the moment tokens start arriving from the LLM:

```python
import os

from pipecat.services.smallest.tts import SmallestTTSService

tts = SmallestTTSService(
    api_key=os.getenv("SMALLEST_API_KEY"),
    settings=SmallestTTSService.Settings(
        voice="sophia",
    ),
)
```

That's the whole setup. Everything else (interruption handling, VAD, turn management) is already taken care of by Pipecat.


The Part Everyone Gets Wrong: Interruptions

Here's a scenario you've probably lived through.

You build a voice bot. The demo goes well — you ask it a question, it answers, everything flows nicely. Then someone on the team tries to cut it off mid-sentence. The bot keeps talking. They try again. Still talking. By the time it finally stops, it's already three sentences deep into an answer nobody wanted, and the whole thing feels like shouting at an elevator.

This isn't a niche edge case. It's the thing that separates a voice bot from a voice conversation. And it's genuinely difficult to get right — not because the concept is complicated, but because real-time audio is unforgiving. You're working in milliseconds. Any lag between the user starting to speak and the pipeline responding to that is felt immediately. There's no hiding it.

Pipecat handles this at the framework level. Silero VAD runs continuously alongside your pipeline, and the moment it detects voice activity, an interruption event propagates through the system. Audio stops. The LLM gets re-engaged. The user experience stays intact. You don't configure any of this — it's just how Pipecat works.
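
The shape of that interruption path can be sketched with plain asyncio. This is a toy model, not Pipecat's internals: playback runs as a cancellable task, and the moment the VAD fires, cancelling the task stops audio immediately:

```python
import asyncio

spoken = []

async def speak(sentences):
    # Toy TTS playback: each sentence takes ~0.2 s of "audio" time.
    for s in sentences:
        spoken.append(s)
        await asyncio.sleep(0.2)

async def main():
    playback = asyncio.create_task(speak(["one", "two", "three"]))
    await asyncio.sleep(0.3)   # user starts talking mid-second-sentence
    playback.cancel()          # VAD fired: propagate the interruption
    try:
        await playback
    except asyncio.CancelledError:
        pass                   # playback is dead; the floor is the user's
    return list(spoken)

result = asyncio.run(main())
# playback stops partway through the list instead of finishing all three
```

In a real pipeline the cancel signal also has to flush buffered audio and re-engage the LLM with the new user turn, which is exactly the bookkeeping Pipecat does for you.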


Why We Built This

In 2026, building voice agents is still way harder than it should be. 

Not the AI part; that's largely a solved problem. The hard part is everything around it. The milliseconds you lose at every integration boundary. The interruption logic that works in staging and breaks in production. The WebSocket connection that drops at 2am and takes your whole pipeline with it. The three days you spend debugging audio buffering issues that have nothing to do with what you're actually trying to build.

This integration exists because we got tired of watching good teams lose weeks to infrastructure that should be commodity. Pulse and Lightning are fast: 64ms TTFT isn't a marketing number, it's what real-time conversation actually requires. And Pipecat is the right framework to run them in, because it was built by people who understand that the hard problems in voice aren't the models, they're the plumbing.

Put them together and you get a stack where the latency is genuinely low, the interruptions genuinely work, and you can go from zero to a talking bot in an afternoon. That's the whole point.


A Complete Voice Agent, End to End

Here's what the full stack looks like:

  • Smallest AI Pulse STT — transcription at 64ms TTFT

  • OpenAI — LLM for the conversation

  • Smallest AI Lightning TTS — streaming audio synthesis

  • Silero VAD — voice activity detection and interruption handling

  • Daily — WebRTC transport for audio rooms
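
The ordering of that list matters: frames flow in one direction, with each service consuming what the previous one emits. A toy chain in plain Python (not Pipecat's actual `Pipeline` class; the stage functions are illustrative stand-ins) makes the shape concrete:

```python
# Toy one-directional pipeline. Each stage is a function frame -> frame,
# standing in for transport input -> STT -> LLM -> TTS -> transport output.
def stt(frame):
    return {"text": f"transcript of {frame['audio']}"}

def llm(frame):
    return {"reply": f"answer to '{frame['text']}'"}

def tts(frame):
    return {"audio_out": f"speech for: {frame['reply']}"}

def run_pipeline(stages, frame):
    for stage in stages:
        frame = stage(frame)   # output of one stage is input to the next
    return frame

result = run_pipeline([stt, llm, tts], {"audio": "mic-chunk"})
# result["audio_out"] carries the synthesized reply for the transport
```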

The full working example lives in the Pipecat repository:


```bash
git clone https://github.com/pipecat-ai/pipecat.git
cd pipecat/examples/voice
python voice-smallest.py -t daily
```

Open http://localhost:7860. The runner spins up a Daily room automatically. You're in.

Prefer to skip the web server for quick testing? Run direct mode:

```bash
python voice-smallest.py -d

The room URL prints to your terminal. Open it and go.


Get Started

Docs: docs.smallest.ai

GitHub: pipecat-ai/pipecat (voice-smallest.py)

Questions? Find us on Discord or open an issue in the Pipecat repo.

Smallest AI builds the fastest speech infrastructure on the internet. Lightning TTS and Pulse STT are available via API today.

