AGI under 10B parameters

Driving the future of small, efficient multi-modal models
OUR MODELS

Built to Scale

We outperform LLMs 100–1000× our size, with dramatically lower GPU usage and time-to-first-token as low as 45 ms. Powerful intelligence can also be efficient.

Synthesize

Yes, your delivery #AX344 has been shipped and shall arrive at your doorstep in 7 working days.


Lightning

Text to Speech Model Series

Lightning is one of the world's fastest text-to-speech models, with time to first byte as low as 100ms. It generates hyper-realistic audio in over 30 languages, with thousands of local accents and dialects supported.

Human-like Emotional Voices

30+ Languages Supported

Streaming support with 100ms TTFB

Voice Cloning Support

“I completely forgot to book dinner”


Electron

Small Language Model

LLMs memorize more information as they scale, and this behavior is often conflated with intelligence. Electron is an SLM that demonstrates how intelligence and memory can be decoupled, outperforming GPT-4.1 on multiple benchmarks with a TTFT of 45ms.

45ms TTFT

Less than 3B parameters

Specialized for conversational use-cases

Protected against NSFW content and prompt attacks


Pulse

Speech to Text Model Series

Pulse transcribes audio across 36 languages spanning Europe, South America, and Asia, with state-of-the-art streaming and batch accuracy. It supports code-switching and delivers one of the world's fastest real-time factors for high-volume production use cases.

38+ languages with code-switching

Streaming and batch support with 100ms TTFB

Emotion, speaker, and timestamp detection

Interruption handling

Hydra

Speech to Speech Model Series

Hydra is one of the world's first fully functional full-duplex multimodal models: it processes long context, performs extremely accurate tool calling, and replies in richly emotional, human-like voices. Hydra represents a major scientific leap in asynchronous thinking.

Multi-modal speech and text model

Tool Calling Support

Asynchronous thinking

Hyper-emotional dialogue


Atoms

A Self-Learning Multi-Modal AI Agentic Platform

Voice and text agents that sound human, respond like humans, and scale beyond humans.

Create

Create agents in three clicks with an intuitive UI that supports integrations with enterprise systems.

Test

Run fully simulated customer conversations from start to finish to see exactly how the agent will behave before going live.

Deploy

Set the agent live across every channel, including voice, email, chat, and social, for consistent conversations wherever customers reach out.

Analyze

Use AI-powered Insights to analyze and improve the agent's performance and deliver better customer experiences.


For Developers

Automate. Orchestrate. Dominate — with code.

Build with our Node and Python SDKs.

javascript


const options = {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <token>',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    voice_id: '<string>',
    text: '<string>',
    sample_rate: 8000,
    add_wav_header: true
  })
};

fetch('https://waves-api.smallest.ai/api/v1/lightning/get_speech', options)
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(err => console.error(err));
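The request body above can also be assembled with a small helper before sending; `buildSpeechPayload` and its defaults below are illustrative, not part of the official SDK:

```javascript
// Illustrative helper (not from the SDK) that builds the JSON body
// used by the Lightning get_speech request shown above.
function buildSpeechPayload({ voiceId, text, sampleRate = 8000, addWavHeader = true }) {
  if (!voiceId || !text) {
    throw new Error('voice_id and text are required');
  }
  return JSON.stringify({
    voice_id: voiceId,
    text,
    sample_rate: sampleRate,
    add_wav_header: addWavHeader
  });
}

// Example: pass the result as the `body` option of the fetch call above.
const body = buildSpeechPayload({ voiceId: '<string>', text: 'Hello from Lightning' });
console.log(body);
```

Centralizing the payload this way keeps defaults like the 8000 Hz sample rate in one place when the same request is issued from several call sites.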


Your data, secured with enterprise-grade security

Your data is protected by SOC 2 Type 2, HIPAA, and PCI compliance standards, both in the cloud and on-premises.

We comply with HIPAA to protect your health information.

SOC 2–aligned controls to ensure security, availability, and confidentiality.

GDPR-compliant data handling with strong privacy and data protection.

ISO-aligned security and risk-management practices.

Powering Specialized Machine Intelligence

Our models power 100+ use cases across industries.

Conversational AI

B2C

Notetakers

AI companions

AI celebrity clones

B2B

Collections

Lead qualification

Customer Support

Edge

Custom Chips

Specialized Hardware

Mobile Devices

Proven in production

Our agents can converse through speech and text with extremely high domain accuracies and ultra-low latencies, handling billions of conversations at enterprise scale.

1B+

calls run monthly

99.99%

uptime for enterprise clients

sub-400ms

average latency

<400ms average latency-to-response.

50% cost reduction

90% improvement in show-up rates

“Smallest AI provides the highest quality of speech agents for automating our highly complex payment contact centres.”

Harinder Thakar

CEO, Paytm Labs

It feels like the early days of AI again: one architecture, the transformer, dominates the industry to such an extent that questioning it and exploring alternatives is a risk taken by only a select few.

Today, the field of AI appears to have made massive progress, and yet most economically valuable tasks are still human-driven.

In such times, it is important to step back and ask: what would true AGI look like, and is the transformer architecture a partial answer, a complete answer, or no answer at all?

We believe that AI will evolve much like human intelligence: specialized, efficient, and continuously learning to stay relevant. While today's LLMs may have their place in society, they are not the right step toward passing the Turing test across all economically valuable tasks that require intelligence.

Intelligence will be achieved through small models that learn continuously and power specialized agents, enabled by domain-relevant tools and effectively infinite memory, which keep them grounded and up to date.

Latest Research

Explore a selection of our recent research on some of the most complex and interesting challenges in AI.

CLIPDraw++: Text-to-Sketch Synthesis with Simple Primitives

With the goal of understanding the visual concepts that CLIP associates with text prompts, we show that the latent space of CLIP can be visualized solely in terms of linear transformations on simple geometric primitives like straight lines and circles. Although existing approaches achieve this by sketch-synthesis-through-optimization, they do so on the space of higher order Bézier curves, which exhibit a wastefully large set of structures that they can evolve into, as most of them are non-essential for generating meaningful sketches. We present CLIPDraw++, an algorithm that provides significantly better visualizations for CLIP text embeddings, using only simple primitive shapes like straight lines and circles. This constrains the set of possible outputs to linear transformations on these primitives, thereby exhibiting an inherently simpler mathematical form. The synthesis process of CLIPDraw++ can be tracked end-to-end, with each visual concept being expressed exclusively in terms of primitives.

Read More


Artificial Special Intelligence: Beyond Scaling Laws Towards Structured Intelligence

Recent progress in artificial intelligence has been driven by empirical scaling laws linking performance improvements to increases in model parameters, data, and compute. These results have fueled the widespread belief that artificial general intelligence (AGI) will emerge primarily through continued scaling of large language models (LLMs). In this paper, we argue that this assumption conflates benchmark performance with intelligence and overlooks fundamental architectural limitations of current models. We propose Artificial Special Intelligence (ASI) as an alternative framework: intelligence arising from collections of small, specialized models that operate asynchronously, learn continuously, and interact with large-scale external memory. Drawing on evidence from machine learning, neuroscience, and cognitive science, we argue that intelligence is better characterized by structural properties, such as specialization, separation of compute and memory, and lifelong learning, than by parameter count alone.

Read More


SonoEdit: Null-Space Constrained Knowledge Editing for Pronunciation Correction in LLM-Based TTS

Neural text-to-speech systems systematically mispronounce low-resource proper nouns, particularly non-English names, brands, and geographic locations due to their underrepresentation in predominantly English training corpora. Existing solutions require expensive multilingual data collection or manual phonetic annotation, limiting TTS deployment in diverse linguistic contexts. We introduce SonoEdit, a model editing technique that surgically corrects pronunciation errors in pre-trained TTS models without retraining. Correcting such errors traditionally requires costly supervised finetuning or manual phoneme injection. In this work, we present a parsimonious alternative using Null-Space Pronunciation Editing, a single-shot parameter update that modifies the pronunciation of specific words while provably preserving the rest of the model's behavior. We first adapt Acoustic Causal Tracing to identify the specific Transformer layers governing text-to-pronunciation mapping. We then employ Null-Space Constrained Editing to compute a closed-form weight update that rectifies the target pronunciation while remaining mathematically orthogonal to the manifold of general speech, constructing a constrained update that drives the model's acoustic output toward a desired pronunciation exemplar while ensuring zero first-order change on a preserved corpus.
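The "zero first-order change" guarantee can be sketched in rough form (notation ours, not taken from the paper): if the hidden-state keys of the preserved corpus at the edited layer are collected as the columns of a matrix K_p, projecting any candidate update through the orthogonal projector onto the null space of K_p^T keeps the edit from moving preserved inputs.

```latex
% Orthogonal projector onto the null space of K_p^\top
% (equivalently, the orthogonal complement of \mathrm{col}(K_p)):
P = I - K_p \left( K_p^\top K_p \right)^{-1} K_p^\top
% Any candidate edit direction \Lambda, projected into that null space:
\Delta W = \Lambda P
% Consequence: zero first-order change on every preserved key
% k \in \mathrm{col}(K_p), since P k = 0:
\Delta W \, k = \Lambda P k = 0
```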

Third Conference on Parsimony and Learning (CPAL 2026).


Read More


Low Resource Indic Language Translation Shared Task

Abstract

We develop a robust translation model for four low-resource Indic languages: Khasi, Mizo, Manipuri, and Assamese. Our approach includes a comprehensive pipeline from data collection and preprocessing to training and evaluation, leveraging data from WMT task datasets, BPCC, PMIndia, and OpenLanguageData. To address the scarcity of bilingual data, we use back-translation techniques on monolingual datasets for Mizo and Khasi, significantly expanding our training corpus. We fine-tune the pre-trained NLLB 3.3B model for Assamese, Mizo, and Manipuri, achieving improved performance over the baseline. For Khasi, which is not supported by the NLLB model, we introduce special tokens and train the model on our Khasi corpus. Our training involves masked language modelling, followed by fine-tuning for English-to-Indic and Indic-to-English translations.

Read More

Want to help us build
the future of Voice AI?

Connect with us

Explore how Smallest.ai can transform your enterprise

1160 Battery Street East,
San Francisco, CA,
94111

