Thu Jun 05 2025 • 13 min Read
Lightning V2 by Smallest.ai: The Decisive Leader in Enterprise TTS
Akshat Mandloi
Data Scientist | CTO
Key Highlights – Why Lightning V2 Wins:
Lightning V2 sets a new benchmark for real-time conversational Text-to-Speech, delivering an unbeatable combination of speed, quality, and cost.
- World's Fastest: Achieves Time to First Byte (TTFB) latencies under 100 ms on-prem.
- Superior Voice Quality: Receives the highest MOS scores in this comparison (median ~3.82), indicating more natural and reliable human-like speech, as evaluated by third-party MOS prediction models. (Details & Chart Below)
- Unbeatable Cost: Dramatically more affordable at just $0.100 per 10,000 characters, roughly 10x cheaper than ElevenLabs ($0.990) and ~4x cheaper than Cartesia ($0.392). (Details & Chart Below)
Text-to-Speech (TTS) technology converts written text into lifelike spoken audio, powering modern digital interfaces like virtual assistants, automated customer support, audiobooks, accessibility tools, and in-car navigation systems. For enterprise-ready, real-time applications, four qualities are paramount: Latency, Reliability, Naturalness, and Cost.
This report compares the “Lightning V2” model by Smallest.ai against other leading TTS providers, focusing on these mission-critical metrics.
Benchmarking Text-to-Speech
- Smallest.ai: A TTS platform focused on low-latency, high-quality voice synthesis across multiple languages.
- ElevenLabs: A widely adopted TTS platform known for high-fidelity voice synthesis.
- Cartesia: Builder of ultra-low-latency, high-fidelity TTS solutions for real-time, enterprise-grade applications.
Metrics for Benchmarking & Results
To effectively evaluate TTS capabilities, we focus on two key metrics: Latency and MOS (Mean Opinion Score).
- Latency across regions: Measured as **Time to First Byte (TTFB)** – the time from submitting a request until the first byte of generated audio is available. This is crucial for real-time conversational flow.
- Reliability and Naturalness (MOS): Measured using WvMOS and UTMOS V2, third-party MOS score predictors that rate audio quality on a scale of 0 to 5 (5 being human-level quality).
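The TTFB definition above can be made concrete with a small timing harness. The following is a minimal sketch, not the benchmark code used for this report: `measure_ttfb`, `open_stream`, and `fake_tts_stream` are hypothetical names, and the fake stream stands in for a real provider call so the example runs without network access or credentials.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_ttfb(open_stream: Callable[[], Iterable[bytes]]) -> Tuple[Optional[float], int]:
    """Return (ttfb_ms, total_bytes) for one streaming synthesis request.

    `open_stream` is a hypothetical callable that submits the request and
    returns an iterable of audio chunks; it is not any provider's real API.
    """
    start = time.perf_counter()  # clock starts when the request is submitted
    ttfb_ms: Optional[float] = None
    total_bytes = 0
    for chunk in open_stream():
        # Record the elapsed time once, when the first non-empty chunk arrives.
        if chunk and ttfb_ms is None:
            ttfb_ms = (time.perf_counter() - start) * 1000.0
        total_bytes += len(chunk)
    return ttfb_ms, total_bytes


def fake_tts_stream() -> Iterator[bytes]:
    """Stand-in for a provider: waits 50 ms, then yields two audio chunks."""
    time.sleep(0.05)
    yield b"\x00" * 320
    yield b"\x00" * 320


ttfb_ms, total_bytes = measure_ttfb(fake_tts_stream)
print(f"TTFB: {ttfb_ms:.1f} ms, received {total_bytes} bytes")
```

In a real measurement, `open_stream` would wrap the provider's streaming client, and the request would be repeated many times per region so that averages are not dominated by a single slow or fast call.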
Latency Results
We compared the latency performance of three providers (Smallest.ai, ElevenLabs, and Cartesia) across diverse global regions. The results show the percentage of times each provider achieved the lowest latency. Smallest.ai emerges as the overall winner, consistently delivering the fastest response times.
These latencies reflect what a user would experience when using these providers in production. They were measured using the example code each provider publishes in its own documentation.
Overall Average Latency per Provider
- Smallest.ai: 212.88ms
- Cartesia: 219.76ms
- ElevenLabs: 512.48ms
Smallest.ai has the lowest overall average latency at 212.88ms, followed by Cartesia at 219.76ms, and ElevenLabs significantly higher at 512.48ms.
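Both summary statistics used in this section, the average latency per provider and the percentage of trials in which a provider was fastest, can be derived from the same per-trial measurements. The sketch below assumes each trial records one TTFB value per provider. The `summarize` helper, the provider labels `A`, `B`, `C`, and the toy numbers are illustrative placeholders, not benchmark data.

```python
from statistics import mean
from typing import Dict, List


def summarize(trials: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Per-provider mean TTFB (ms) and share of trials with the lowest TTFB (%)."""
    providers = list(trials[0].keys())
    wins = {p: 0 for p in providers}
    for trial in trials:
        fastest = min(trial, key=trial.get)  # provider with the lowest TTFB in this trial
        wins[fastest] += 1
    return {
        p: {
            "mean_ms": mean(t[p] for t in trials),
            "win_pct": 100.0 * wins[p] / len(trials),
        }
        for p in providers
    }


# Toy values chosen only to make the arithmetic easy to follow.
toy_trials = [
    {"A": 100.0, "B": 200.0, "C": 300.0},
    {"A": 300.0, "B": 200.0, "C": 400.0},
    {"A": 200.0, "B": 300.0, "C": 500.0},
    {"A": 200.0, "B": 500.0, "C": 400.0},
]

summary = summarize(toy_trials)
for provider, stats in summary.items():
    print(provider, stats)
```

The two statistics answer different questions: a provider can win most trials yet have a higher mean if it occasionally has very slow responses, so reporting both gives a fuller picture.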
Average Latency per Provider Across Each Region
Smallest.ai consistently delivered lower latency across Asia South, Asia South East, EU West, US East, and US West.
Note: Smallest.ai is deployment-ready across all geographies and can be deployed closer to clients for ultra-low latencies.
MOS (Mean Opinion Score) by WvMOS and UTMOS V2
Audio samples generated by each model on a diverse, industry-specific benchmark were scored using WvMOS and UTMOS V2, which are the industry-wide standards for measuring audio quality.
Provider | Average WvMOS | Average UTMOS V2 | Overall Avg MOS |
---|---|---|---|
Smallest | 3.804 | 3.349 | 3.577 |
Cartesia | 3.768 | 3.371 | 3.570 |
ElevenLabs | 3.682 | 3.248 | 3.465 |
- Smallest.ai: Leads with the highest overall average MOS score (approx. 3.577) and a tighter distribution, indicating consistently high quality.
- Cartesia: Overall average approx. 3.570
- ElevenLabs: Overall average approx. 3.465
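The "Overall Avg MOS" column is consistent with a simple unweighted mean of the two predictor scores. The sketch below reproduces that arithmetic from the table values, under the assumption that this is how the column was derived. The variable names are illustrative.

```python
# (average WvMOS, average UTMOS V2) per provider, copied from the table above.
scores = {
    "Smallest": (3.804, 3.349),
    "Cartesia": (3.768, 3.371),
    "ElevenLabs": (3.682, 3.248),
}

# Assumption: the overall score is the unweighted mean of the two predictors.
overall = {name: (wv + ut) / 2 for name, (wv, ut) in scores.items()}

# Providers ordered from highest to lowest overall score.
ranking = sorted(overall, key=overall.get, reverse=True)

for name in ranking:
    print(f"{name}: {overall[name]:.4f}")
```

Note that the two predictors do not agree on the ordering at the top: Smallest leads on WvMOS, while Cartesia is slightly ahead on UTMOS V2, and the overall averages of the two differ by less than 0.01.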
Examples - Audio Samples (Listen for Yourself)
Healthcare:
1. "Your prescription for Amoxicillin, order ID RX-4521-789, will be ready for pickup at the Main Street Pharmacy after 4 PM today."
Smallest.ai
ElevenLabs
Cartesia
2. "Mr. Johnson, your insurance copay for the MRI scan at St. John Medical Center on June 1st is $75K, and payment can be made online or at the front desk."
Smallest.ai
ElevenLabs
Cartesia
Banking and Finance:
1. "I noticed your monthly mortgage payment of $1,245.23 was successfully processed on June 1st through your auto-debit setup."
Smallest.ai
ElevenLabs
Cartesia
2. "Hi Sarah, I see that your credit card ending in 3491 has a pending charge of $125.67 from 23/05/23—would you like me to review that with you?"
Smallest.ai
ElevenLabs
Cartesia
Insurance:
1. "Your claim, associated with SSN 456-78-9012, for the water damage on May 12th is currently under review, and an adjuster will reach out to you by June 3rd"
Smallest.ai
ElevenLabs
Cartesia
2. "Your travel insurance for the trip starting June 15th under policy number TRV-1122-908 covers trip cancellation, medical emergencies, and lost luggage."
Smallest.ai
ElevenLabs
Cartesia
Real Estate:
1. "The closing date for your home purchase at 456 Pine Avenue is set for June 20th, and your loan number is 78945-ZXQ."
Smallest.ai
ElevenLabs
Cartesia
2. "Mrs. Thomas, your property tax bill for 2025 is $3,456.78, and payment is due by June 30th; would you like to set up automatic payments?"
Smallest.ai
ElevenLabs
Cartesia
Cost
Cost is a critical factor for enterprise deployment at scale.
- Smallest.ai: $0.100 per 10,000 characters
- Cartesia: $0.392 per 10,000 characters
- ElevenLabs: $0.990 per 10,000 characters
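To see what these list prices mean at scale, the per-10,000-character rates can be turned into price ratios and a projected bill. This is a rough sketch using only the prices listed above. The 50 million characters per month figure is a hypothetical workload, and the calculation ignores volume discounts, plan tiers, and any other pricing terms.

```python
# List prices in USD per 10,000 characters, as quoted above.
PRICE_PER_10K = {
    "Smallest.ai": 0.100,
    "Cartesia": 0.392,
    "ElevenLabs": 0.990,
}


def cost_usd(provider: str, characters: int) -> float:
    """Projected cost in USD for synthesizing `characters` characters."""
    return PRICE_PER_10K[provider] * characters / 10_000


def price_ratio(provider: str, baseline: str = "Smallest.ai") -> float:
    """How many times more expensive `provider` is than `baseline`."""
    return PRICE_PER_10K[provider] / PRICE_PER_10K[baseline]


monthly_characters = 50_000_000  # hypothetical workload, for illustration only

for name in PRICE_PER_10K:
    print(
        f"{name}: ${cost_usd(name, monthly_characters):,.2f} per month, "
        f"{price_ratio(name):.2f}x the Smallest.ai price"
    )
```

At these list prices the ratios work out to about 9.9x for ElevenLabs and about 3.9x for Cartesia relative to Smallest.ai, which is where the "roughly 10x" and "~4x" figures come from.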
Conclusion
The benchmark data demonstrates that Smallest.ai's Lightning V2 model offers a leading combination of low latency, high quality, and cost-effectiveness for enterprise TTS solutions, based on the metrics evaluated:
- Latency Performance: With an overall average Time to First Byte (TTFB) of 212.88ms, Lightning V2 exhibited the fastest response times among the compared providers, a key factor for enabling real-time conversational applications.
- Voice Quality: Lightning V2 achieved the highest median WvMOS score (approximately 3.82) in the comparison, indicating a high degree of perceived naturalness and clarity in its synthesized audio.
- Cost-Effectiveness: Priced at $0.100 per 10,000 characters, Smallest.ai presents the most competitive pricing structure among the evaluated providers, offering potential for considerable operational cost savings.
Collectively, these results indicate that Smallest.ai's voice models provide a robust solution for enterprises seeking to implement effective and efficient voice workflow automation. For businesses prioritizing a balance of high-quality audio output, rapid delivery, and economical operations, the performance metrics of Smallest.ai align strongly with these objectives.