Sept 12, 2024 • 3 min Read
IndicTTS: A Text-to-Speech Model for Indian Languages
Learn how to run Indic TTS on Google Colab for quick and seamless text-to-speech generation in Indian languages.
Kaushal Choudhary
Senior Developer Advocate
What is Indic TTS?
IndicTTS was created by IIT Madras, together with the Ministry of Electronics & Information Technology (MeitY) and 23 institutions, to build a Text-to-Speech system focused on Indian languages. The project aims to improve synthesis quality while keeping the footprint of the TTS model small, and to integrate the system with disability aids. It also aims to keep expanding TTS coverage to a broader range of Indian languages and dialects. It is partly used in Bhashini and hosted on ULCA. It is also used in IIIT Hyderabad Canvas and NPTEL.
We are going to use the AI4Bharat IndicTTS, which uses the FastPitch Text-to-Speech model. You can try out the demo here.
Notebook Walk-through
Prerequisites
- Python Knowledge: Basic Python skills are needed to modify scripts.
- Colab Familiarity: Understanding how to run cells and upload files.
Step 1: Cloning the Indic TTS Repository
!git clone https://github.com/AI4Bharat/Indic-TTS.git
Step 2: Installing Dependencies
!pip install jupyter librosa pandas pytorch_lightning scikit-learn seaborn soundfile tqdm TTS tensorboard wandb pyenchant
Then change into the inference folder of the cloned repository.
%cd Indic-TTS/inference
You will also need to unzip the language model archive once it has been downloaded (see the wget command in Step 3); here we use en+hi. Find other languages here.
!unzip en+hi.zip
Step 2.1
Pin the pip version. This is a safety measure: without it, the installations and imports often fail.
!pip install pip==23.2.1
Step 2.2
Edit the requirements-ml.txt file in the inference folder so that it contains the following.
# constraints for fairseq, works with TTS as well
numba==0.56.4
tts==0.14.3
numpy>=1.23.0
protobuf==3.19.4
TTS
# onnxruntime
# TTS @ git+https://github.com/ashwin-014/TTS@deployment
ai4bharat-transliteration
asteroid
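If editing the file through the Colab file browser is inconvenient, it can also be overwritten from a notebook cell. The following is a minimal sketch, assuming the notebook's working directory is the inference folder. The helper name `write_requirements` is hypothetical, and the contents simply mirror the listing above.

```python
from pathlib import Path

# Contents mirror the listing above; adjust the pins if your runtime differs.
REQUIREMENTS_ML = """\
# constraints for fairseq, works with TTS as well
numba==0.56.4
tts==0.14.3
numpy>=1.23.0
protobuf==3.19.4
TTS
# onnxruntime
# TTS @ git+https://github.com/ashwin-014/TTS@deployment
ai4bharat-transliteration
asteroid
"""


def write_requirements(path="requirements-ml.txt", text=REQUIREMENTS_ML):
    """Overwrite the requirements file and return its active (non-comment) entries."""
    Path(path).write_text(text, encoding="utf-8")
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
```

Calling `write_requirements()` in a cell replaces the file in the current directory and returns the list of packages that pip will actually see, which is a quick way to confirm the edit before installing.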
Step 2.3
Then run the following commands.
!pip install -r requirements-ml.txt
!pip install -r requirements-utils.txt
Step 3: Organizing Models and Weights
Here, the en+hi (English + Hindi) model is used. The en+hi archive is a 1.43 GB zip file containing the models and weights required for inference. You can choose other languages as well. After downloading and unzipping it, follow the steps below.
!wget https://github.com/AI4Bharat/Indic-TTS/releases/download/v1-checkpoints-release/en+hi.zip
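To switch languages, only the archive name at the end of the URL should need to change. The sketch below builds the download URL from a language code. It assumes that other archives in the same release follow the `<code>.zip` naming seen for `en+hi`, which is worth confirming on the releases page. The helper name `checkpoint_url` is hypothetical.

```python
# Release location taken from the wget command above.
RELEASE_BASE = (
    "https://github.com/AI4Bharat/Indic-TTS/releases/download/"
    "v1-checkpoints-release"
)


def checkpoint_url(lang_code):
    """Build the archive URL for a bare language code such as 'en+hi'.

    Assumption: every archive in the release is named '<code>.zip'.
    """
    code = lang_code.strip()
    if not code or "/" in code or code.endswith(".zip"):
        raise ValueError(
            f"expected a bare language code such as 'en+hi', got {lang_code!r}"
        )
    return f"{RELEASE_BASE}/{code}.zip"


print(checkpoint_url("en+hi"))
```

In Colab, the result can be stored in a variable, for example `url = checkpoint_url("en+hi")`, and then passed to the shell with `!wget {url}`.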
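Copy the extracted files into the folders below. The target folders are created first in case they do not exist yet.
!mkdir -p ./checkpoints ./models/fastpitch/v1/en
!cp -r ./en+hi/fastpitch ./checkpoints
!cp -r ./en+hi/hifigan ./checkpoints
Then copy the speakers file to the models folder.
!cp checkpoints/fastpitch/speakers.pth ./models/fastpitch/v1/en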
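Before loading the model, it can save time to confirm that the copy commands put everything where Step 4 expects it. This is a minimal sketch using only the standard library. The file list is taken from the paths passed to the Synthesizer in Step 4, and the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Paths referenced by the Synthesizer configuration in Step 4.
EXPECTED_FILES = [
    "checkpoints/fastpitch/best_model.pth",
    "checkpoints/fastpitch/config.json",
    "checkpoints/fastpitch/speakers.pth",
    "checkpoints/hifigan/best_model.pth",
    "checkpoints/hifigan/config.json",
]


def missing_files(root=".", expected=EXPECTED_FILES):
    """Return the expected files that are not present under `root`."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All checkpoint files are in place.")
```

An empty result means the folder layout matches the paths used in the next step. Any listed file points to a copy command that needs to be rerun.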
Step 4: Setting up the Model
Here, we set up the configuration for the model, speaker, and vocoder files.
import io

from TTS.utils.synthesizer import Synthesizer
from src.inference import TextToSpeechEngine

# Initialize the English + Hindi model
lang = "en+hi"
en_hi_model = Synthesizer(
    tts_checkpoint='checkpoints/fastpitch/best_model.pth',
    tts_config_path='checkpoints/fastpitch/config.json',
    tts_speakers_file='checkpoints/fastpitch/speakers.pth',
    # tts_speakers_file=None,
    tts_languages_file=None,
    vocoder_checkpoint='checkpoints/hifigan/best_model.pth',
    vocoder_config='checkpoints/hifigan/config.json',
    encoder_checkpoint="",
    encoder_config="",
    use_cuda=True,  # requires a GPU runtime
)

# Set up the TTS engine; the model is registered under the "hi" key
models = {
    "hi": en_hi_model,
}
engine = TextToSpeechEngine(models)
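The configuration above passes `use_cuda=True`, which assumes the Colab runtime has a GPU attached. A small check like the one below can pick the value automatically. This is only a sketch: `cuda_available` and `USE_CUDA` are hypothetical names, and it relies on PyTorch's `torch.cuda.is_available()`.

```python
def cuda_available():
    """Return True only if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is no GPU support to use.
        return False
    return bool(torch.cuda.is_available())


USE_CUDA = cuda_available()
print(f"use_cuda={USE_CUDA}")
```

Passing `use_cuda=USE_CUDA` to the Synthesizer lets the same notebook run on a CPU-only runtime, although inference will likely be slower there.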
Step 5: Inference and Saving the Audio
Now that everything is configured, we can run inference on our text. Here I am using Hinglish (Hindi + English). We save the generated audio to a .wav file, which can then be downloaded to the local computer.
from scipy.io.wavfile import write as scipy_wav_write

DEFAULT_SAMPLING_RATE = 16000

# Run inference with the model registered under "hi"
hindi_raw_audio = engine.infer_from_text(
    input_text="सलाम दुनिया",
    lang="hi",
    speaker_name="male"
)

# Encode the raw audio as WAV in memory, then write it to disk
byte_io = io.BytesIO()
scipy_wav_write(byte_io, DEFAULT_SAMPLING_RATE, hindi_raw_audio)

with open("hindi_audio.wav", "wb") as f:
    f.write(byte_io.read())
The generated audio will be in the specified regional language. You can find the full Colab Notebook here.
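If the array returned by the engine holds floating-point samples, scipy writes a float WAV file, which some audio players do not open. As a hedged alternative, the sketch below writes 16-bit PCM using only the standard library. It assumes mono samples in the range [-1.0, 1.0], and the helper name `write_pcm16_wav` is hypothetical; the actual dtype and range of the engine's output should be checked first.

```python
import struct
import wave


def write_pcm16_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Values outside the range are clipped. Returns the duration in seconds.
    """
    frames = bytearray()
    count = 0
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        # Scale to the signed 16-bit range and pack as little-endian.
        frames += struct.pack("<h", int(round(clipped * 32767)))
        count += 1

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))

    return count / sample_rate
```

With the variables from the cell above, a call would look like `write_pcm16_wav("hindi_audio_pcm16.wav", hindi_raw_audio, DEFAULT_SAMPLING_RATE)`. The returned duration is a quick sanity check that the clip is as long as expected.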
Conclusion
Running IndicTTS is straightforward, offering simple language selection and configuration setup. This minimizes the challenges of setting up and running models, facilitating smoother development. As a result, it accelerates research and development in Indian language-based Text-to-Speech (TTS) models. For real-time TTS, check out Waves, a system that supports both English and Hindi. You can try it now with free credits.
FAQs
1. Which Indian languages are supported by Indic TTS?
Supported languages include Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Odia, Rajasthani, Tamil, and Telugu.
2. Do I need to know Python to run Indic TTS?
Basic Python knowledge is needed to run and modify the notebook code. If you don't know Python, you can still try the demo here.
3. Can I run Indic TTS for free on Colab's free tier?
Yes, you can run it for free, but consider upgrading to Colab Pro for better performance.
4. What are the alternatives to Indic TTS that enable real-time text to speech?
Waves by smallest.ai is a real-time text-to-speech system with ultra-realistic voice and minimal latency.