Mon Dec 23 2024 • 13 min Read
Smallest AI vs Cartesia
Compare Smallest.ai vs Cartesia for TTS and Voice Cloning. Explore differences in voice quality, speed, emotional context, API features, and pricing.
Kaushal Choudhary
Senior Developer Advocate
In this article, we compare the two platforms on voice quality, pricing, latency, and overall performance to help you choose the best TTS for your use case.
Smallest.ai vs Cartesia, a quick overview
Feature | Smallest.ai | Cartesia |
---|---|---|
Languages Supported | 50+ | 13 |
Total Number of Voices | 100+ | 29 |
Voice Quality | Hyper-realistic and tone-matching. | High-quality and robust voices, with manual parameter settings overhead. |
Character Limits | 2,500 characters on the platform; unlimited request length via the SDK. | 500 characters for Sonic Turbo English |
Latency | sub-100ms for 10 seconds of audio + network time. | 3000ms for 10 seconds of audio + network time |
Price | Starts as low as $0.01 per minute for TTS and $0.045 for voice cloning. | Three pricing tiers, starting at $0.03 per minute. |
Voice Cloning | Instant and Professional Voice Cloning, with minimal latency. | Instant Voice Cloning with 30 seconds of audio. |
API | API access for all tier users. | API access for all plans. |
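The character limits in the table matter when you synthesize long passages, because text longer than a per-request limit has to be split before it is sent. Below is a minimal sketch of sentence-aware chunking. It assumes a simple punctuation-based splitter, and `split_text` and `max_chars` are illustrative names that do not come from either SDK.

```python
import re


def split_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Illustrative helper only; it is not part of the Smallest.ai or Cartesia SDKs.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, for example with `max_chars=500` for the Cartesia limit quoted above or `max_chars=2500` for the Smallest.ai platform limit.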
Comparing Text to Speech
We test both platforms on text types commonly found in books, websites, and similar sources. How a model handles them is a good indicator of how authentic and natural a TTS platform sounds.
Sarcasm or Irony
The sentence below tests whether the model can reflect subtleties such as sarcasm.
Oh great, another meeting that could have been an email!
Let's see how both TTS platforms perform.
Smallest.ai
Smallest AI doesn't require any parameters to be set explicitly unless you really need them. It automatically picks up the emotional context of the sentence and produces a natural, authentic-sounding voice.
Cartesia
In Cartesia, you have to manually set the Speed and Emotion parameters for the output to capture the context of the sentence.
Let's listen to Cartesia without the parameters set.
And here it is with the speed decreased, the voice changed, and the Positivity parameter set a little higher.
Supported languages
Smallest.ai currently supports 50+ languages, whereas Cartesia supports only 13 languages.
Size of voice library
Smallest.ai offers 100+ voices spanning many languages and dialects. Cartesia offers 29 voices across different languages and scenarios.
Latency
Smallest.ai uses its Lightning model, which achieved consistent sub-100ms latency (excluding network time) for 10 seconds of audio in Text-to-Speech (TTS) tasks. In our testing, Cartesia's API consistently took over 3000ms to generate the same duration of audio across all TTS tasks.
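If you want to check latency for your own workload, a small timing harness is enough. The sketch below is not the setup used for the numbers above. It times any callable that takes text and returns audio bytes, and `measure_latency_ms` and `fake_tts` are hypothetical names used only for illustration.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(
    synthesize: Callable[[str], bytes], text: str, runs: int = 5
) -> dict[str, float]:
    """Time repeated calls to a synthesis callable and summarise the latency in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples: list[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


def fake_tts(text: str) -> bytes:
    """Stand-in for a real TTS call (assumption); swap in your own client wrapper."""
    time.sleep(0.01)
    return b""
```

Calling `measure_latency_ms(fake_tts, "Hello", runs=3)` returns the minimum, median, and maximum wall-clock time. When you wrap a real API call, the timing is taken from the client side and therefore includes the network round trip, so it will not match figures that are quoted excluding network time.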
Comparing Voice Cloning
Both platforms provide Instant Voice Cloning and support 1 free voice clone on their free tier. The voice clone samples along with the reference audio are provided below.
Here is the audio that was used as a reference.
Let's listen to the Voice clone generated.
Smallest.ai
Cartesia
Cartesia provides two configuration options for Voice Cloning.
- Stable - The clone will be more robust to languages and dialects but less similar to the reference audio.
- High - The clone will sound more similar to the reference audio but be less robust.
Smallest.ai offers rapid and reliable voice cloning, and the generated audio keeps a natural flow. Cartesia's Stable and High modes trade voice similarity against robustness, and both deliver efficient voice cloning. Even so, Smallest.ai stands out for its speed, reliability, and compact size, which make it a cost-effective and stable option.
API Support
Both platforms provide production-grade APIs for businesses to integrate TTS and Voice Cloning services into their products.
Here is an example of each API in Python.
Smallest.ai
For programmatic generation and easy integration into apps/websites, smallest.ai provides easy-to-use API support in multiple languages. Find more examples/approaches on the official repo here.
pip install smallestai
from smallest import Smallest
client = Smallest(api_key="SMALLEST_API_KEY")
client.synthesize(
    text="Hello, this is a test for sync synthesis function.",
    voice="emily",
    speed=1.0,
    sample_rate=24000,
    save_as="smallest.wav",
)
Cartesia
Cartesia also provides an easily accessible API. Find the official docs here.
pip install cartesia
import os
from cartesia import Cartesia
# Fail early if the API key is missing from the environment.
if os.environ.get("CARTESIA_API_KEY") is None:
    raise ValueError("CARTESIA_API_KEY is not set")
client = Cartesia(api_key=os.environ.get("CARTESIA_API_KEY"))
# Request the synthesized audio as raw bytes.
data = client.tts.bytes(
    model_id="sonic-english",
    transcript="Hello, world! I'm generating audio on Cartesia.",
    voice_id="694f9389-aac1-45b6-b726-9d9369183238",  # Barbershop Man
    # You can find the supported `output_format`s at https://docs.cartesia.ai/api-reference/tts/bytes
    output_format={
        "container": "wav",
        "encoding": "pcm_f32le",
        "sample_rate": 44100,
    },
)
# Write the returned bytes to a WAV file.
with open("cartesia.wav", "wb") as f:
    f.write(data)
Pricing
Smallest.ai offers competitive pricing, starting as low as $0.01 per minute for text-to-speech (TTS) and $0.045 for instant voice cloning, which keeps it cost-effective even for large-scale businesses. Find the pricing here.
Cartesia provides several plans, including a free tier with 10,000 characters. Paid plans range from $5/month for 100,000 characters to $299/month for 8 million characters, with custom enterprise options and a startup grant that offers four months of the Scale Plan for free. Learn more here.
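To make the quoted figures easier to compare, the arithmetic can be sketched as below. It uses only the prices stated in this section, and the function names are illustrative. Note that the Smallest.ai rate is per minute of audio while the Cartesia plans are per character, so the two are not directly comparable without assuming a characters-per-minute rate, which this article does not provide.

```python
# Prices quoted in the pricing section of this article (USD).
SMALLEST_TTS_PER_MINUTE = 0.01
CARTESIA_PLANS = {
    # monthly price -> characters included
    5.0: 100_000,
    299.0: 8_000_000,
}


def cost_per_1k_chars(monthly_price_usd: float, included_chars: int) -> float:
    """Effective price per 1,000 characters when a plan's quota is fully used."""
    return monthly_price_usd / included_chars * 1000


def per_minute_cost(minutes: float, rate_per_minute: float = SMALLEST_TTS_PER_MINUTE) -> float:
    """Total cost for a given number of audio minutes at a per-minute rate."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    for price, chars in CARTESIA_PLANS.items():
        print(f"${price:.0f}/month plan: ${cost_per_1k_chars(price, chars):.4f} per 1,000 characters")
    print(f"1,000 minutes at the lowest quoted rate: ${per_minute_cost(1000):.2f}")
```

With these inputs, the $5 plan works out to about $0.05 per 1,000 characters and the $299 plan to about $0.037 per 1,000 characters, while 1,000 minutes at $0.01 per minute comes to $10.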
Conclusion
Smallest.ai excels with its ultra-fast processing, hyper-realistic voice synthesis, and affordable pricing, making it a standout choice for businesses requiring scalable TTS and voice cloning solutions. Its ability to intuitively capture emotional tones and provide nuanced outputs with minimal latency showcases advanced innovation.
In contrast, Cartesia, while offering flexibility and a competitive free tier, falls short in supported languages, voice library size, and processing speed. Its reliance on manual parameter adjustments limits usability for dynamic applications, leaving Smallest.ai as the more comprehensive and versatile option.