
Mon Dec 23 2024 | 13 min read

Smallest AI vs Cartesia

Compare Smallest.ai vs Cartesia for TTS and Voice Cloning. Explore differences in voice quality, speed, emotional context, API features, and pricing.


Kaushal Choudhary

Senior Developer Advocate


In this article, we compare Smallest.ai and Cartesia on voice quality, language and voice coverage, latency, voice cloning, API support, and pricing to help you choose the right TTS platform for your use case.

Smallest.ai vs Cartesia, a quick overview

| Feature | Smallest.ai | Cartesia |
| --- | --- | --- |
| Languages supported | 50+ | 13 |
| Total number of voices | 100+ | 29 |
| Voice quality | Hyper-realistic and tone-matching. | High-quality and robust voices, with manual parameter settings overhead. |
| Character limits | 2,500 characters on the platform and unlimited request length on the SDK. | 500 characters for Sonic Turbo English. |
| Latency | Sub-100 ms for 10 seconds of audio + network time. | 3,000 ms for 10 seconds of audio + network time. |
| Price | Starts from as low as $0.01 per minute for TTS and $0.045 for voice cloning. | Three pricing tiers, starting at $0.03 per minute. |
| Voice cloning | Instant and professional voice cloning, with minimal latency. | Instant voice cloning with 30 seconds of audio. |
| API | API access for all tier users. | API access for all plans. |
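Character limits matter in practice when the input is longer than a single request allows. As a minimal sketch, the helper below splits long text at sentence boundaries so that each chunk stays under a limit such as the 500-character figure listed in the overview. The `split_text` function is hypothetical and not part of either SDK; it assumes sentences end with ".", "!" or "?" and hard-splits any single sentence that is longer than the limit.

```python
import re


def split_text(text: str, limit: int = 500) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence boundaries."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split a single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio concatenated. Splitting at sentence boundaries is a design choice that tends to avoid audible cuts in the middle of a phrase.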

Comparing Text to Speech

We test both platforms on text types commonly found in books, websites, and similar sources. How a model handles them is a good indicator of how authentic and natural a TTS platform sounds.

Sarcasm or Irony

The sentence below tests whether the model can convey subtleties such as sarcasm.

Oh great, another meeting that could have been an email!

Let's see how both TTS platforms perform.

Smallest.ai

Smallest.ai does not require any parameters to be set explicitly unless you really need them. It picks up the emotional context of the sentence automatically and produces a natural, authentic-sounding voice.

Cartesia

In Cartesia, you have to set the Speed and Emotion parameters manually for the output to capture the context of the sentence.


Let's listen to Cartesia without the parameters set.

And here it is with the speed decreased, the voice changed, and the Positivity parameter set a little higher.

Supported languages

Smallest.ai currently supports 50+ languages, whereas Cartesia supports only 13 languages.

Size of voice library

Smallest.ai supports 100+ voices with rich languages and dialects. Cartesia supports 29 voices across different languages and scenarios.

Latency

Smallest.ai leverages the Lightning Model, achieving consistent sub-100ms latency for Text-to-Speech (TTS) tasks. In contrast, our testing of Cartesia's API revealed that it consistently required over 3000ms to generate the same duration of audio across all TTS tasks.
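Latency figures like these depend on network conditions, text length, and region, so it is worth measuring them in your own environment. The following is a minimal timing sketch, assuming you wrap each provider's call in a zero-argument function. The `measure_latency` helper and the stand-in `fake_synthesize` are hypothetical and not part of either SDK, and the sketch measures total wall-clock time per request rather than time to first audio byte.

```python
import statistics
import time
from typing import Callable


def measure_latency(synthesize: Callable[[], object], runs: int = 5) -> dict[str, float]:
    """Call `synthesize` several times and report wall-clock latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize()  # e.g. a lambda wrapping a real TTS request
        samples_ms.append((time.perf_counter() - start) * 1000)
    return {
        "min_ms": min(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
    }


def fake_synthesize() -> bytes:
    """Stand-in for a real TTS call so the sketch runs without network access."""
    time.sleep(0.01)
    return b""


if __name__ == "__main__":
    print(measure_latency(fake_synthesize, runs=3))
```

To compare the two platforms, pass a function that sends the same text to each API and use the same number of runs for both. Reporting the median rather than a single run reduces the effect of one-off network spikes.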

Comparing Voice Cloning

Both platforms provide Instant Voice Cloning and support 1 free voice clone on their free tier. The voice clone samples along with the reference audio are provided below.

Here is the audio that was used as a reference.

Let's listen to the Voice clone generated.

Smallest.ai

Cartesia

Cartesia provides two configuration options for voice cloning.

  • Stable - The clone will be more robust to languages and dialects but less similar to reference audio.
  • High - The clone will sound more similar to the reference audio but be less robust.

Smallest.ai offers fast and reliable voice cloning, and the generated audio keeps a natural flow and stays robust. Cartesia, on the other hand, provides two configuration modes, Stable and High, which trade voice similarity against robustness while still cloning voices efficiently. Overall, Smallest.ai stands out for its speed, reliability, and compact size, which make it a cost-effective and stable option.

API Support

Both platforms provide production-grade APIs that let businesses integrate TTS and voice cloning services into their products.

Here is an example of each API in Python.

Smallest.ai

For programmatic generation and easy integration into apps/websites, smallest.ai provides easy-to-use API support in multiple languages. Find more examples/approaches on the official repo here.

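# Install the SDK first: pip install smallestai
from smallest import Smallest

# Replace the placeholder with your own API key.
client = Smallest(api_key="SMALLEST_API_KEY")

# Synthesize speech synchronously and save it as a WAV file.
client.synthesize(
    text="Hello, this is a test for sync synthesis function.",
    voice="emily",
    speed=1.0,
    sample_rate=24000,
    save_as="smallest.wav",
)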

Cartesia

Cartesia also provides an easily accessible API. Find the official docs here.

# Install the SDK first: pip install cartesia
import os

from cartesia import Cartesia

# Read the API key from the environment instead of hard-coding it.
api_key = os.environ.get("CARTESIA_API_KEY")
if api_key is None:
    raise ValueError("CARTESIA_API_KEY is not set")

client = Cartesia(api_key=api_key)

# Generate the full audio clip as bytes.
data = client.tts.bytes(
    model_id="sonic-english",
    transcript="Hello, world! I'm generating audio on Cartesia.",
    voice_id="694f9389-aac1-45b6-b726-9d9369183238",  # Barbershop Man
    # You can find the supported `output_format`s at https://docs.cartesia.ai/api-reference/tts/bytes
    output_format={
        "container": "wav",
        "encoding": "pcm_f32le",
        "sample_rate": 44100,
    },
)

# Write the audio to disk.
with open("cartesia.wav", "wb") as f:
    f.write(data)

Pricing

Smallest.ai offers competitive pricing, starting as low as $0.01 per minute for text-to-speech (TTS) and $0.045 for instant voice cloning, which keeps it cost-effective even for large-scale businesses. Find the pricing here.

Cartesia offers several plans, including a free tier with 10,000 characters. Paid plans range from $5/month for 100,000 characters to $299/month for 8 million characters. There are also custom enterprise options and a startup grant that offers four months of the Scale Plan free. Learn more here.
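Because one set of prices is quoted per minute and the other per character, it can help to normalize both to a common unit before comparing. The following is a rough sketch that uses only the figures quoted above. The helper names are hypothetical, and the `chars_per_minute` speaking rate is an assumption chosen for illustration, not a figure from either vendor, so substitute a rate measured on your own content.

```python
def cost_per_1k_chars(monthly_price_usd: float, included_chars: int) -> float:
    """Effective price per 1,000 characters for a character-based plan."""
    if included_chars <= 0:
        raise ValueError("included_chars must be positive")
    return monthly_price_usd / included_chars * 1000


def per_minute_to_per_1k_chars(price_per_minute_usd: float, chars_per_minute: float) -> float:
    """Convert a per-minute price to a per-1,000-character price.

    `chars_per_minute` is an assumed speaking rate, not a vendor figure.
    """
    if chars_per_minute <= 0:
        raise ValueError("chars_per_minute must be positive")
    return price_per_minute_usd / chars_per_minute * 1000


# Cartesia plan figures quoted in this article.
entry_plan = cost_per_1k_chars(5, 100_000)    # 0.05 USD per 1,000 characters
top_plan = cost_per_1k_chars(299, 8_000_000)  # about 0.0374 USD per 1,000 characters

# Smallest.ai's quoted $0.01 per minute, converted with an ASSUMED rate of
# 1,000 characters per minute (illustrative only).
smallest_tts = per_minute_to_per_1k_chars(0.01, chars_per_minute=1000)
```

The result is sensitive to the assumed speaking rate, so treat the converted number as an estimate rather than a quoted price. The sketch also ignores free-tier allowances and overage charges.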

Conclusion

Smallest.ai excels with its ultra-fast processing, hyper-realistic voice synthesis, and affordable pricing, making it a standout choice for businesses that need scalable TTS and voice cloning. Its ability to capture emotional tone automatically and produce nuanced output with minimal latency sets it apart.

In contrast, Cartesia offers flexibility and a competitive free tier but falls short in the number of supported languages, the size of its voice library, and processing speed. Its reliance on manual parameter adjustments limits usability for dynamic applications, leaving Smallest.ai as the more comprehensive and versatile option.