Smallest AI is now a native LiveKit plugin. Add Pulse STT and Lightning TTS to your LiveKit voice agent pipeline: ~64ms transcription, ~100ms synthesis, fully interruptible, one install.

Prithvi Bharadwaj

The livekit-plugins-smallestai package is now live. Pulse STT and Lightning TTS are first-class plugins in LiveKit Agents, bringing real-time speech-to-text and ultra-low-latency synthesis into your voice pipeline without custom adapters or integration overhead.
One package. Both services. Production-ready from day one.
What is LiveKit Agents?
LiveKit Agents is an open-source Python framework for building production-grade, real-time voice AI agents over WebRTC. If you've looked at the voice agent landscape and found most options either too opinionated or too low-level, LiveKit sits in a genuinely useful middle ground — it gives you the building blocks of a real-time voice pipeline without dictating what every piece has to be.
Its plugin architecture is what makes it worth building on. STT, TTS, LLM, VAD — each is a swappable component. You pick the services you want, wire them into an AgentSession, and the framework handles the rest. Turns, interruptions, audio transport over WebRTC — all taken care of.
What We Shipped
The livekit-plugins-smallestai package is live on PyPI. It gives you two services:
smallestai.STT — powered by Pulse, our real-time speech-to-text engine. Streams over WebSocket with ~64ms TTFT, supports 39 languages with automatic detection, word-level timestamps, and speaker diarization.
smallestai.TTS — powered by Lightning v3.1, our ultra-low-latency TTS engine. ~100ms latency, 80+ voices, and output in whatever format your pipeline needs.
Both plug directly into LiveKit's plugin system. They behave exactly like any other LiveKit plugin — which means every part of the ecosystem that works with LiveKit works with these too.
How It Works
Install
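One pip command covers the package itself; the Silero VAD plugin used later for interruption handling is included here on the assumption you'll want it:

```bash
pip install livekit-agents livekit-plugins-smallestai livekit-plugins-silero
```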
Set up your environment
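Credentials go in the environment. The LiveKit variables are the framework's standard ones; the Smallest AI variable name below is an assumption, so confirm it against the plugin docs:

```bash
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="your-livekit-api-key"
export LIVEKIT_API_SECRET="your-livekit-api-secret"
export SMALLEST_API_KEY="your-smallest-api-key"  # assumed variable name
```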
STT: Pulse, streaming over WebSocket
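A minimal sketch, assuming the plugin reads your API key from the environment and that the defaults are sensible (constructor options live in the plugin docs):

```python
from livekit.plugins import smallestai

# Pulse STT: streams audio over WebSocket and emits interim and final
# transcripts as standard LiveKit speech events, with word-level timestamps
# and confidence scores attached.
stt = smallestai.STT()
```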
Pulse connects to our WebSocket endpoint for real-time streaming, with full support for interim and final transcripts. Word-level timestamps and confidence scores come included.
TTS: Lightning, sentence-streamed
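A minimal sketch using LiveKit's StreamAdapter and sentence tokenizer; voice and format kwargs are left out here, so check the plugin docs for those:

```python
from livekit.agents import tokenize, tts
from livekit.plugins import smallestai

# Lightning synthesizes per request, so split LLM output at sentence
# boundaries and fire a synthesis call for each completed sentence.
lightning = smallestai.TTS()
streamed_tts = tts.StreamAdapter(
    tts=lightning,
    sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
)
```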
One thing worth knowing: Lightning synthesizes audio per request rather than token-by-token, so wrapping it in LiveKit's StreamAdapter with a SentenceTokenizer is the right move. The adapter splits LLM output at sentence boundaries and fires synthesis for each chunk — keeping first-audio latency low without waiting for the entire LLM response to complete before a single word plays.
A Full Agent in One File
Here's a complete, production-ready voice agent using Smallest AI for both STT and TTS:
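A sketch of the whole pipeline. The Smallest AI pieces come straight from the plugin; the LLM (OpenAI here, via livekit-plugins-openai and an OPENAI_API_KEY) and the instruction strings are placeholders to swap for your own:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession, tokenize, tts
from livekit.plugins import openai, silero, smallestai


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        vad=silero.VAD.load(),          # powers interruption handling
        stt=smallestai.STT(),           # Pulse, streaming over WebSocket
        llm=openai.LLM(),               # placeholder: any LiveKit LLM plugin works
        tts=tts.StreamAdapter(          # Lightning, sentence-streamed
            tts=smallestai.TTS(),
            sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
        ),
    )

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a friendly voice assistant."),
    )

    # Greet the user as soon as the session starts.
    await session.generate_reply(instructions="Greet the user and offer help.")


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```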
Run it:
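Assuming the agent file is saved as agent.py, dev mode connects it to your LiveKit project:

```bash
python agent.py dev
```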
Open the LiveKit Agents Playground, drop in your LiveKit credentials, and your agent greets you immediately on session start.
Interruption Handling
Here's a scenario you've probably lived through.
You build a voice bot. The demo goes well — you ask it a question, it answers, everything flows nicely. Then someone tries to cut it off mid-sentence. The bot keeps talking. They try again. Still going. By the time it finally stops, it's already three sentences deep into an answer nobody wanted, and the whole thing feels like shouting at an elevator.
This isn't a niche edge case. It's the thing that separates a voice bot from a voice conversation. And it's genuinely difficult to get right — not because the concept is complicated, but because real-time audio is unforgiving. Any lag between the user starting to speak and the pipeline responding is felt immediately. There's no hiding it.
LiveKit handles this at the framework level using Silero VAD — continuous voice activity detection that runs alongside your pipeline and fires an interruption event the moment it detects speech. Audio halts immediately. The agent re-engages. The conversation doesn't skip a beat.
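Wiring it up is just passing a VAD into the session; interruptions are on by default. A sketch, with the flag spelled out (parameter names per current LiveKit Agents, so double-check against your version):

```python
from livekit.agents import AgentSession
from livekit.plugins import silero, smallestai

# Silero VAD runs continuously alongside the pipeline. The moment it hears
# the user speak over the agent, the session halts playback and yields the turn.
session = AgentSession(
    vad=silero.VAD.load(),
    stt=smallestai.STT(),
    tts=smallestai.TTS(),
    allow_interruptions=True,  # the default, shown explicitly
)
```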
One detail worth keeping in mind: leave eou_timeout_ms at its default of 0. This disables server-side end-of-utterance detection on the Pulse side and lets LiveKit's own turn detection handle timing. Stack both and you're adding latency you don't need at the end of every turn.
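In code that means doing nothing, but spelled out for illustration:

```python
from livekit.plugins import smallestai

# eou_timeout_ms=0 (the default) keeps Pulse's server-side end-of-utterance
# detection off, so LiveKit's turn detection alone decides when a turn ends.
stt = smallestai.STT(eou_timeout_ms=0)  # equivalent to omitting it
```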
Why This Integration Exists
The models powering voice AI today are genuinely capable. The gap between what's possible and what teams actually ship isn't the AI — it's the infrastructure around it. Latency accumulates at every integration boundary. Interruption handling that works in staging breaks in production. WebSocket connections drop. Audio buffers. Weeks disappear into problems that have nothing to do with the product being built.
Pulse delivers transcriptions at 64ms TTFT. Lightning synthesizes and streams audio in ~100ms. Those aren't benchmark numbers — they're the latency thresholds that make voice conversation feel natural rather than mechanical. LiveKit provides the WebRTC transport and turn-detection infrastructure to run them reliably in production.
This integration exists to close that gap. The plumbing is handled. Ship the product.
Get Started
Docs → docs.smallest.ai
Full example → Smallest AI × LiveKit cookbook
PyPI → livekit-plugins-smallestai
Questions? Find us on Discord or open an issue in the cookbook repo.
Smallest AI builds the fastest speech infrastructure on the internet. Lightning TTS and Pulse STT are available via API today.


