Smallest AI is now a native LiveKit plugin. Add Pulse STT and Lightning TTS to your LiveKit voice agent pipeline: ~64ms transcription, ~100ms synthesis, fully interruptible, one install.

Prithvi Bharadwaj

The livekit-plugins-smallestai package is now live. Pulse STT and Lightning TTS are first-class plugins in LiveKit Agents, bringing real-time speech-to-text and ultra-low-latency synthesis into your voice pipeline without custom adapters or integration overhead.
One package. Both services. Production-ready from day one.
What is LiveKit Agents?
LiveKit Agents is an open-source Python framework for building production-grade, real-time voice AI agents over WebRTC. If you've looked at the voice agent landscape and found most options either too opinionated or too low-level, LiveKit sits in a genuinely useful middle ground — it gives you the building blocks of a real-time voice pipeline without dictating what every piece has to be.
Its plugin architecture is what makes it worth building on. STT, TTS, LLM, VAD — each is a swappable component. You pick the services you want, wire them into an AgentSession, and the framework handles the rest. Turns, interruptions, audio transport over WebRTC — all taken care of.
What We Shipped
The livekit-plugins-smallestai package is live on PyPI. It gives you two services:
smallestai.STT — powered by Pulse, our real-time speech-to-text engine. Streams over WebSocket with ~64ms TTFT, supports 39 languages with automatic detection, word-level timestamps, and speaker diarization.
smallestai.TTS — powered by Lightning v3.1, our ultra-low-latency TTS engine. ~100ms latency, 80+ voices, and output in whatever format your pipeline needs.
Both plug directly into LiveKit's plugin system. They behave exactly like any other LiveKit plugin — which means every part of the ecosystem that works with LiveKit works with these too.
How It Works
Install
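One pip command covers the package itself; the Silero VAD plugin used later for interruption handling is included here on the assumption you'll want it:

```bash
pip install livekit-agents livekit-plugins-smallestai livekit-plugins-silero
```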
Set up your environment
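Credentials go in the environment. The LiveKit variables are the framework's standard ones; the Smallest AI variable name below is an assumption, so confirm it against the plugin docs:

```bash
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="your-livekit-api-key"
export LIVEKIT_API_SECRET="your-livekit-api-secret"
export SMALLEST_API_KEY="your-smallest-api-key"  # assumed variable name
```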
STT: Pulse, streaming over WebSocket
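A minimal sketch, assuming the plugin reads your API key from the environment and that the defaults are sensible (constructor options live in the plugin docs):

```python
from livekit.plugins import smallestai

# Pulse STT: streams audio over WebSocket and emits interim and final
# transcripts as standard LiveKit speech events, with word-level timestamps
# and confidence scores attached.
stt = smallestai.STT()
```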
Pulse connects to our WebSocket endpoint for real-time streaming, with full support for interim and final transcripts. Word-level timestamps and confidence scores come included.
TTS: Lightning, sentence-streamed
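A minimal sketch using LiveKit's StreamAdapter and sentence tokenizer; voice and format kwargs are left out here, so check the plugin docs for those:

```python
from livekit.agents import tokenize, tts
from livekit.plugins import smallestai

# Lightning synthesizes per request, so split LLM output at sentence
# boundaries and fire a synthesis call for each completed sentence.
lightning = smallestai.TTS()
streamed_tts = tts.StreamAdapter(
    tts=lightning,
    sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
)
```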
One thing worth knowing: Lightning synthesizes audio per request rather than token-by-token, so wrapping it in LiveKit's StreamAdapter with a SentenceTokenizer is the right move. The adapter splits LLM output at sentence boundaries and fires synthesis for each chunk — keeping first-audio latency low without waiting for the entire LLM response to complete before a single word plays.
A Full Agent in One File
Here's a complete, production-ready voice agent using Smallest AI for both STT and TTS:
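A sketch of the whole pipeline. The Smallest AI pieces come straight from the plugin; the LLM (OpenAI here, via livekit-plugins-openai and an OPENAI_API_KEY) and the instruction strings are placeholders to swap for your own:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession, tokenize, tts
from livekit.plugins import openai, silero, smallestai


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        vad=silero.VAD.load(),          # powers interruption handling
        stt=smallestai.STT(),           # Pulse, streaming over WebSocket
        llm=openai.LLM(),               # placeholder: any LiveKit LLM plugin works
        tts=tts.StreamAdapter(          # Lightning, sentence-streamed
            tts=smallestai.TTS(),
            sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
        ),
    )

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a friendly voice assistant."),
    )

    # Greet the user as soon as the session starts.
    await session.generate_reply(instructions="Greet the user and offer help.")


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```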
Run it:
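Assuming the agent file is saved as agent.py, dev mode connects it to your LiveKit project:

```bash
python agent.py dev
```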
Open the LiveKit Agents Playground, drop in your LiveKit credentials, and your agent greets you immediately on session start.
Interruption Handling
Here's a scenario you've probably lived through.
You build a voice bot. The demo goes well — you ask it a question, it answers, everything flows nicely. Then someone tries to cut it off mid-sentence. The bot keeps talking. They try again. Still going. By the time it finally stops, it's already three sentences deep into an answer nobody wanted, and the whole thing feels like shouting at an elevator.
This isn't a niche edge case. It's the thing that separates a voice bot from a voice conversation. And it's genuinely difficult to get right — not because the concept is complicated, but because real-time audio is unforgiving. Any lag between the user starting to speak and the pipeline responding is felt immediately. There's no hiding it.
LiveKit handles this at the framework level using Silero VAD — continuous voice activity detection that runs alongside your pipeline and fires an interruption event the moment it detects speech. Audio halts immediately. The agent re-engages. The conversation doesn't skip a beat.
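Wiring it up is just passing a VAD into the session; interruptions are on by default. A sketch, with the flag spelled out (parameter names per current LiveKit Agents, so double-check against your version):

```python
from livekit.agents import AgentSession
from livekit.plugins import silero, smallestai

# Silero VAD runs continuously alongside the pipeline. The moment it hears
# the user speak over the agent, the session halts playback and yields the turn.
session = AgentSession(
    vad=silero.VAD.load(),
    stt=smallestai.STT(),
    tts=smallestai.TTS(),
    allow_interruptions=True,  # the default, shown explicitly
)
```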
One detail worth keeping in mind: leave eou_timeout_ms at its default of 0. This disables server-side end-of-utterance detection on the Pulse side and lets LiveKit's own turn detection handle timing. Stack both and you're adding latency you don't need at the end of every turn.
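In code that means doing nothing, but spelled out for illustration:

```python
from livekit.plugins import smallestai

# eou_timeout_ms=0 (the default) keeps Pulse's server-side end-of-utterance
# detection off, so LiveKit's turn detection alone decides when a turn ends.
stt = smallestai.STT(eou_timeout_ms=0)  # equivalent to omitting it
```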
Why This Integration Exists
The models powering voice AI today are genuinely capable. The gap between what's possible and what teams actually ship isn't the AI — it's the infrastructure around it. Latency accumulates at every integration boundary. Interruption handling that works in staging breaks in production. WebSocket connections drop. Audio buffers. Weeks disappear into problems that have nothing to do with the product being built.
Pulse delivers transcriptions at 64ms TTFT. Lightning synthesizes and streams audio in ~100ms. Those aren't benchmark numbers — they're the latency thresholds that make voice conversation feel natural rather than mechanical. LiveKit provides the WebRTC transport and turn-detection infrastructure to run them reliably in production.
This integration exists to close that gap. The plumbing is handled. Ship the product.
Get Started
Docs → docs.smallest.ai
Full example → Smallest AI × LiveKit cookbook
PyPI → livekit-plugins-smallestai
Questions? Find us on Discord or open an issue in the cookbook repo.
Smallest AI builds the fastest speech infrastructure on the internet. Lightning TTS and Pulse STT are available via API today.


