Thu Jun 05 2025 | 13 min read

Lightning V2 by Smallest.ai: The Decisive Leader in Enterprise TTS

Akshat Mandloi

Data Scientist | CTO

Key Highlights – Why Lightning V2 Wins:


Lightning V2 sets a new benchmark for real-time conversational Text-to-Speech, delivering an unbeatable combination of speed, quality, and cost.

  • World's Fastest: Achieves Time to First Byte (TTFB) under 100 ms in on-prem deployments.
  • Superior Voice Quality: Consistently receives the highest MOS scores among the compared providers (median WvMOS ~3.82), indicating more natural and reliable human-like speech, as evaluated by third-party MOS prediction models. (Details & Chart Below)
  • Unbeatable Cost: Dramatically more affordable at just $0.100 per 10,000 characters, roughly 10x cheaper than ElevenLabs ($0.990) and ~4x cheaper than Cartesia ($0.392). (Details & Chart Below)

Text-to-Speech (TTS) technology converts written text into lifelike spoken audio, powering modern digital interfaces like virtual assistants, automated customer support, audiobooks, accessibility tools, and in-car navigation systems. For enterprise-ready, real-time applications, four qualities are paramount: Latency, Reliability, Naturalness, and Cost.

This report compares the “Lightning V2” model by Smallest.ai against other leading TTS providers, focusing on these mission-critical metrics.

Benchmarking Text-to-Speech

  • Smallest.ai: A TTS platform focused on low-latency, high-quality voice synthesis across multiple languages.
  • ElevenLabs: A widely adopted TTS platform known for high-fidelity voice synthesis.
  • Cartesia: Builder of ultra-low-latency, high-fidelity TTS solutions for real-time, enterprise-grade applications.

Metrics for Benchmarking & Results

To effectively evaluate TTS capabilities, we focus on two key metrics: Latency and MOS (Mean Opinion Score).

  • Latency across regions: Measured as Time to First Byte (TTFB), the time from submitting a request until the first byte of generated audio is available. This is crucial for real-time conversational flow.
  • Reliability and Naturalness (MOS): Measured using WvMOS and UTMOS V2, third-party MOS score predictors. WvMOS rates audio quality on a scale of 0 to 5 (5 being human-level quality).
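
To make the TTFB definition concrete, here is a minimal sketch of how it could be timed around any streaming response. It is not the harness used for this report: `measure_ttfb`, `open_stream`, and `fake_tts_stream` are hypothetical names, and the fake stream stands in for a real provider call so the snippet runs without network access.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_ttfb(open_stream: Callable[[], Iterable[bytes]]) -> Tuple[float, bytes]:
    """Return (ttfb_ms, first_chunk) for one synthesis request.

    `open_stream` is a hypothetical callable that submits the request and
    returns an iterable of audio byte chunks (e.g. a streaming HTTP body).
    """
    start = time.perf_counter()  # clock starts when the request is submitted
    for chunk in open_stream():
        if chunk:  # ignore empty chunks; only real audio bytes count
            ttfb_ms = (time.perf_counter() - start) * 1000.0
            return ttfb_ms, chunk
    raise RuntimeError("stream ended before any audio bytes arrived")


def fake_tts_stream(delay_s: float = 0.05):
    """Stand-in for a real provider: waits, then yields two audio chunks."""
    time.sleep(delay_s)
    yield b"\x00\x01"
    yield b"\x02\x03"


if __name__ == "__main__":
    ttfb_ms, first = measure_ttfb(fake_tts_stream)
    print(f"TTFB: {ttfb_ms:.1f} ms, first chunk: {len(first)} bytes")
```

In a real measurement, `open_stream` would wrap the provider's own streaming client, and the call would be repeated many times per region before averaging.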

Latency Results

We compared the latency performance of three providers (Smallest.ai, ElevenLabs, and Cartesia) across diverse global regions. The results show the percentage of times each provider achieved the lowest latency. Smallest.ai emerges as the overall winner, consistently delivering the fastest response times.

These are the latencies a user would experience when using these providers in production. Each was measured using the example code from the provider's own documentation.
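
The "percentage of times each provider achieved the lowest latency" can be sketched as below. This is an illustrative computation, not the report's actual script: the function name `lowest_latency_share` and the tie-handling rule are assumptions, and the demo numbers are placeholders rather than measurements.

```python
from collections import Counter
from typing import Dict, List


def lowest_latency_share(trials: List[Dict[str, float]]) -> Dict[str, float]:
    """Percentage of trials in which each provider had the lowest TTFB.

    Each trial maps provider name -> TTFB in ms for the same request.
    Ties are credited to every tied provider (an assumption of this sketch).
    """
    if not trials:
        raise ValueError("need at least one trial")
    wins: Counter = Counter()
    providers = set()
    for trial in trials:
        providers.update(trial)
        best = min(trial.values())
        for name, ttfb in trial.items():
            if ttfb == best:
                wins[name] += 1
    return {name: 100.0 * wins[name] / len(trials) for name in sorted(providers)}


if __name__ == "__main__":
    # Placeholder numbers for illustration only; NOT measurements from this report.
    demo = [
        {"A": 100.0, "B": 120.0},
        {"A": 130.0, "B": 110.0},
        {"A": 90.0, "B": 95.0},
        {"A": 105.0, "B": 101.0},
    ]
    print(lowest_latency_share(demo))
```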

Overall Average Latency per Provider

  • Smallest.ai: 212.88ms
  • Cartesia: 219.76ms
  • ElevenLabs: 512.48ms
[Chart: Overall average latency per provider]

Smallest.ai has the lowest overall average latency at 212.88ms, followed by Cartesia at 219.76ms, and ElevenLabs significantly higher at 512.48ms.
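
For a sense of scale, the short sketch below turns the reported averages into gaps and ratios relative to the fastest provider. Only the three averages come from this report; the helper name `relative_to_fastest` is made up for illustration.

```python
from typing import Dict, Tuple

# Overall average TTFB in milliseconds, as reported above.
AVG_TTFB_MS = {"Smallest.ai": 212.88, "Cartesia": 219.76, "ElevenLabs": 512.48}


def relative_to_fastest(latencies: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Map provider -> (gap in ms, ratio) relative to the lowest average."""
    fastest = min(latencies.values())
    return {
        name: (round(ms - fastest, 2), round(ms / fastest, 2))
        for name, ms in latencies.items()
    }


if __name__ == "__main__":
    for name, (gap_ms, ratio) in relative_to_fastest(AVG_TTFB_MS).items():
        print(f"{name:<12} +{gap_ms:.2f} ms  {ratio:.2f}x")
```

On these figures, Cartesia trails by 6.88 ms (about 3%), while ElevenLabs trails by 299.60 ms (about 2.4x the Smallest.ai average).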

Average Latency per Provider Across Each Region

Smallest.ai consistently delivered low latency across Asia South, Asia South East, EU West, US East, and US West.

[Chart: Average latency per provider across each region]

Note: Smallest.ai is deployment-ready across all geographies and can be deployed closer to clients for ultra-low latency.

MOS (Mean Opinion Score) by WvMOS and UTMOS V2

Audio samples generated by each model on a diverse, industry-specific benchmark were scored using WvMOS and UTMOS V2, two industry-wide standards for measuring audio quality.

[Chart: MOS scores by provider]

| Provider | Average WvMOS | Average UTMOS-V2 | Overall Avg MOS |
|---|---|---|---|
| Smallest.ai | 3.804 | 3.349 | 3.577 |
| Cartesia | 3.768 | 3.371 | 3.570 |
| ElevenLabs | 3.682 | 3.248 | 3.465 |

  • Smallest.ai: Leads with the highest overall average MOS (approx. 3.577) and a tighter distribution, indicating consistently high quality.
  • Cartesia: Overall average approx. 3.570
  • ElevenLabs: Overall average approx. 3.465
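
The "Overall Avg MOS" figures appear to be the unweighted mean of the two predictor columns. The report does not state this explicitly, so treat the formula as an assumption; the sketch below simply checks that the published numbers are consistent with it. `Decimal` is used so that half-way values such as 3.5765 round predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-metric averages from the table above: (WvMOS, UTMOS-V2).
SCORES = {
    "Smallest.ai": (Decimal("3.804"), Decimal("3.349")),
    "Cartesia": (Decimal("3.768"), Decimal("3.371")),
    "ElevenLabs": (Decimal("3.682"), Decimal("3.248")),
}


def overall_avg(wv: Decimal, ut: Decimal) -> Decimal:
    """Unweighted mean of the two predictor scores, rounded half-up to 3 places."""
    return ((wv + ut) / 2).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


OVERALL = {name: overall_avg(wv, ut) for name, (wv, ut) in SCORES.items()}

if __name__ == "__main__":
    for name, value in OVERALL.items():
        print(f"{name:<12} {value}")
```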

Examples - Audio Samples (Listen for Yourself)


Healthcare:

1.  "Your prescription for Amoxicillin, order ID RX-4521-789, will be ready for pickup at the Main Street Pharmacy after 4 PM today."

Smallest.ai

ElevenLabs

Cartesia


2.  "Mr. Johnson, your insurance copay for the MRI scan at St. John Medical Center on June 1st is $75K, and payment can be made online or at the front desk."

Smallest.ai

ElevenLabs

Cartesia


Banking and Finance:

1.  "I noticed your monthly mortgage payment of $1,245.23 was successfully processed on June 1st through your auto-debit setup."

Smallest.ai

ElevenLabs

Cartesia

2.  "Hi Sarah, I see that your credit card ending in 3491 has a pending charge of $125.67 from 23/05/23—would you like me to review that with you?"

Smallest.ai

ElevenLabs

Cartesia


Insurance:

1.  "Your claim, associated with SSN 456-78-9012, for the water damage on May 12th is currently under review, and an adjuster will reach out to you by June 3rd"

Smallest.ai

ElevenLabs

Cartesia

  

2.  "Your travel insurance for the trip starting June 15th under policy number TRV-1122-908 covers trip cancellation, medical emergencies, and lost luggage."

Smallest.ai

ElevenLabs

Cartesia

   


Real Estate:

1.  "The closing date for your home purchase at 456 Pine Avenue is set for June 20th, and your loan number is 78945-ZXQ."

Smallest.ai

ElevenLabs

Cartesia

2.  "Mrs. Thomas, your property tax bill for 2025 is $3,456.78, and payment is due by June 30th; would you like to set up automatic payments?"

Smallest.ai

ElevenLabs

Cartesia

Cost

Cost is a critical factor for enterprise deployment at scale.

  • Smallest.ai: $0.100 per 10,000 characters
  • Cartesia: $0.392 per 10,000 characters
  • ElevenLabs: $0.990 per 10,000 characters
[Chart: Cost per 10,000 characters by provider]
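
To see what these rates mean at volume, here is a rough sketch that projects cost for a given character count and computes the price ratios. It assumes flat per-character pricing with no tiers, minimums, or free quotas, which real plans may not follow; the function names and the one-million-character workload are hypothetical.

```python
from decimal import Decimal

# List prices in USD per 10,000 characters, as quoted above.
PRICE_PER_10K = {
    "Smallest.ai": Decimal("0.100"),
    "Cartesia": Decimal("0.392"),
    "ElevenLabs": Decimal("0.990"),
}


def cost_usd(characters: int, provider: str) -> Decimal:
    """Cost of synthesizing `characters` characters at the quoted flat rate."""
    return (Decimal(characters) / Decimal(10_000)) * PRICE_PER_10K[provider]


def price_ratio(provider: str, baseline: str = "Smallest.ai") -> Decimal:
    """How many times more expensive `provider` is than `baseline`."""
    return PRICE_PER_10K[provider] / PRICE_PER_10K[baseline]


if __name__ == "__main__":
    monthly_chars = 1_000_000  # hypothetical workload, not from the report
    for name in PRICE_PER_10K:
        print(f"{name:<12} ${cost_usd(monthly_chars, name):.2f}  {price_ratio(name):.2f}x")
```

At one million characters this works out to $10.00, $39.20, and $99.00 respectively, i.e. ratios of 3.92x for Cartesia and 9.9x for ElevenLabs relative to Smallest.ai.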

Conclusion

The benchmark data demonstrates that Smallest.ai's Lightning V2 model offers a leading combination of low latency, high quality, and cost-effectiveness for enterprise TTS solutions, based on the metrics evaluated:

  • Latency Performance: With an overall average Time to First Byte (TTFB) of 212.88ms, Lightning V2 exhibited the fastest response times among the compared providers, a key factor for enabling real-time conversational applications.
  • Voice Quality: Lightning V2 achieved the highest median WvMOS score (approximately 3.82) in the comparison, indicating a high degree of perceived naturalness and clarity in its synthesized audio.
  • Cost-Effectiveness: Priced at $0.100 per 10,000 characters, Smallest.ai presents the most competitive pricing structure among the evaluated providers, offering potential for considerable operational cost savings.

Collectively, these results indicate that Smallest.ai's voice models provide a robust solution for enterprises implementing voice workflow automation. For businesses that need high-quality audio, fast delivery, and low operating cost together, Smallest.ai's performance on these metrics is a strong fit.