Sept 12, 2024 • 5 min Read
Running Indic TTS on Colab: Fast and Easy Text-to-Speech
Learn how to run Indic TTS on Google Colab for fast and easy text-to-speech generation in Indian languages. Get a step-by-step guide to set up and run it smoothly.
Kaushal Choudhary
Technical Writer
What is Indic TTS?
Indic TTS, as the name suggests, is a non-autoregressive text-to-speech (TTS) model for Indic languages, including Odia, Punjabi, Tamil, Bangla, and Gujarati. Most TTS models target English only, or are multilingual models focused on widely spoken international languages, and give less attention to Indic languages. Indic TTS aims to improve TTS quality for regional languages and to encourage research and development on India-focused TTS systems. It can be used, for example, on national websites to read existing content aloud. While Indic TTS is excellent for generating regional language speech, solutions like Waves by smallest.ai offer real-time AI voices that are indistinguishable from humans, providing an alternative for developers looking to deliver high-quality voice synthesis.
Why is Indic TTS important?
- Accessibility: Enables people with visual impairments, or with literacy or language challenges, to access digital content in their native languages.
- Bridging Language Gaps: Connects businesses, educational institutions, and customer support services with a broader audience by enabling communication in regional languages.
Why Use Google Colab for Indic TTS?
Google Colab offers a cloud-based environment with free access to powerful resources like GPUs, making it an ideal platform for running computationally intensive TTS tasks without needing expensive hardware.
- Free access to GPU: Colab provides free GPU access, which speeds up inference and testing.
- Ease of setup: Setting up Indic TTS in Colab is straightforward, and the platform supports fast prototyping and deployment.
Setting Up Indic TTS on Google Colab
Prerequisites for Running Indic TTS
You can find the full code here. Before diving into the setup, ensure you meet these basic requirements:
- Basic Python knowledge: You’ll need to know Python to modify or run the scripts.
- Familiarity with Colab: Google Colab basics, including how to run cells and upload files.
- Understanding TTS models: It helps to know how TTS systems work, though this is not mandatory.
Google Colab Basics
If you're new to Google Colab, here's a quick overview:
- Creating a Notebook: Navigate to Google Colab, and click on "New Notebook."
- Navigating the Interface: Colab lets you run Python code in a cell-based structure. Each cell contains code, and cells are executed one at a time.
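Before installing anything, it can help to confirm that the notebook is attached to a GPU runtime (Runtime > Change runtime type). The following is a minimal sketch and not part of the Indic TTS repository: the helper name gpu_runtime_info is hypothetical, and it assumes that a GPU runtime exposes the nvidia-smi command-line tool.

```python
import shutil
import subprocess


def gpu_runtime_info():
    """Return nvidia-smi's GPU listing, or None if no GPU tooling is found."""
    # Assumption: GPU runtimes have the nvidia-smi tool on the PATH.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "-L"],  # -L lists the attached GPUs, one per line
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


info = gpu_runtime_info()
print(info if info else "No GPU detected; switch the runtime type to GPU.")
```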
Step 1: Cloning the Indic TTS Repository
To get started, clone the Indic TTS repository directly into your Colab environment:
!git clone https://github.com/AI4Bharat/Indic-TTS.git
This will pull the necessary files into Colab, enabling you to work on them without needing local storage.
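Re-running the clone cell in the same session fails because the folder already exists. A small sketch of a guard around the clone, assuming git is available in the runtime; the helper name clone_if_missing and the shallow clone (--depth 1) are illustrative choices, not something the repository prescribes.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/AI4Bharat/Indic-TTS.git"


def clone_if_missing(repo_url, dest):
    """Clone repo_url into dest unless dest already exists. Returns True if cloned."""
    dest = Path(dest)
    if dest.exists():
        return False  # already present, e.g. cloned earlier in this session
    # --depth 1 fetches only the latest commit, which keeps the download small.
    subprocess.run(["git", "clone", "--depth", "1", repo_url, str(dest)], check=True)
    return True
```

Calling clone_if_missing(REPO_URL, "Indic-TTS") in a cell then behaves the same no matter how many times the cell is run.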
Step 2: Installing Dependencies
Next, install the libraries Indic TTS needs, such as PyTorch Lightning, librosa, and the TTS package.
!pip install jupyter librosa pandas pytorch_lightning scikit-learn seaborn soundfile tqdm TTS tensorboard wandb pyenchant
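Alternatively, install from the repository's requirements file (run this from inside the cloned Indic-TTS folder):
!pip install -r requirements.txt
Then change to the inference folder (in Colab, use %cd rather than !cd so the change persists across cells) and follow these steps: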
- Change the pip version
!pip install pip==23.2.1
- Change the requirements-ml.txt file in the inference folder to this
# constraints for fairseq, works with TTS as well
numba==0.56.4
tts==0.14.3
numpy>=1.23.0
protobuf==3.19.4
TTS
# onnxruntime
# TTS @ git+https://github.com/ashwin-014/TTS@deployment
ai4bharat-transliteration
asteroid
- Run the following commands in different cells.
!pip install -r requirements-ml.txt
!pip install -r requirements-utils.txt
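After the installs finish, it is worth checking that the exact pins from requirements-ml.txt are what actually ended up in the environment. This is a sketch using only the standard library; the helper name check_pins is hypothetical, and only the three packages pinned with == in the file above are checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins taken from the requirements-ml.txt shown above.
PINS = {"numba": "0.56.4", "TTS": "0.14.3", "protobuf": "3.19.4"}


def check_pins(pins):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_pins(PINS).items():
    print(f"{name}: installed={installed} pinned={PINS[name]} ok={ok}")
```

If any line reports ok=False, restarting the runtime and re-running the install cells is usually the first thing to try.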
Using Indic TTS on Colab offers fast TTS processing for regional languages, but for projects requiring real-time, human-like voice synthesis, solutions like Waves can offer immediate results with minimal setup.
Step 3: Organizing Models and Weights
The pretrained models and weights are released as well; you can find them here. This guide uses the en+hi (English + Hindi) model, a 1.43 GB zip file containing the models and weights required for inference.
Download it with the command below (change the archive name according to your language preference), then follow the steps after it.
!wget https://github.com/AI4Bharat/Indic-TTS/releases/download/v1-checkpoints-release/en+hi.zip
- Create a checkpoints folder and a models folder in the inference folder of the Indic-TTS folder.
- Extract the zip file and put the content of the extracted folder into the checkpoints folder.
- In the models folder, create a fastpitch/v1/en folder and copy into it the speakers.pth file from the fastpitch folder inside the inference/checkpoints folder.
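The folder steps above can also be scripted, which avoids dragging files around in the Colab file browser. The sketch below is one possible way to do it with the standard library. The function name organize_checkpoints is hypothetical, and the handling of a single wrapping folder inside the archive (such as "en+hi") is an assumption about the zip layout that should be checked against the actual download.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def organize_checkpoints(zip_path, inference_dir):
    """Unpack the archive into checkpoints/ and copy speakers.pth into models/."""
    inference_dir = Path(inference_dir)
    checkpoints = inference_dir / "checkpoints"
    speakers_dir = inference_dir / "models" / "fastpitch" / "v1" / "en"
    checkpoints.mkdir(parents=True, exist_ok=True)
    speakers_dir.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        root = Path(tmp)
        entries = list(root.iterdir())
        # Assumption: the archive may wrap everything in one top-level folder
        # (for example "en+hi"); if so, use that folder's contents.
        if len(entries) == 1 and entries[0].is_dir():
            root = entries[0]
        for item in root.iterdir():
            target = checkpoints / item.name
            if item.is_dir():
                shutil.copytree(item, target, dirs_exist_ok=True)
            else:
                shutil.copy2(item, target)

    speakers = speakers_dir / "speakers.pth"
    shutil.copy2(checkpoints / "fastpitch" / "speakers.pth", speakers)
    return speakers
```

For example, organize_checkpoints("en+hi.zip", "Indic-TTS/inference") would be the call if the archive was downloaded next to the cloned repository.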
Step 4: Running the Indic TTS Model
Let's set up the inference code to run the model.
import io
from TTS.utils.synthesizer import Synthesizer
from src.inference import TextToSpeechEngine
# Initialize the English + Hindi model (paths are relative to the inference folder)
lang = "en+hi"
en_hi_model = Synthesizer(
    tts_checkpoint="checkpoints/fastpitch/best_model.pth",
    tts_config_path="checkpoints/fastpitch/config.json",
    tts_speakers_file="checkpoints/fastpitch/speakers.pth",
    # tts_speakers_file=None,
    tts_languages_file=None,
    vocoder_checkpoint="checkpoints/hifigan/best_model.pth",
    vocoder_config="checkpoints/hifigan/config.json",
    encoder_checkpoint="",
    encoder_config="",
    use_cuda=True,  # requires a GPU runtime
)
# Setup TTS Engine: map each language code to its loaded model
models = {
    "hi": en_hi_model,
}
engine = TextToSpeechEngine(models)
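If the Synthesizer call fails with a file-not-found error, the usual cause is a checkpoint file in the wrong place. A quick check, sketched under the assumption that it is run from the inference folder; the helper name missing_files is hypothetical, and the list simply repeats the paths passed to Synthesizer.

```python
from pathlib import Path

# Paths passed to Synthesizer, relative to the inference folder.
REQUIRED_FILES = [
    "checkpoints/fastpitch/best_model.pth",
    "checkpoints/fastpitch/config.json",
    "checkpoints/fastpitch/speakers.pth",
    "checkpoints/hifigan/best_model.pth",
    "checkpoints/hifigan/config.json",
]


def missing_files(base_dir, required=REQUIRED_FILES):
    """Return the required paths that do not exist under base_dir."""
    base = Path(base_dir)
    return [rel for rel in required if not (base / rel).is_file()]


missing = missing_files(".")
print("All checkpoint files found." if not missing else f"Missing: {missing}")
```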
Then run inference and save the result as a WAV file:
# Assumes the engine from the previous cell is already defined and loaded
# Hindi TTS inference
from scipy.io.wavfile import write as scipy_wav_write
DEFAULT_SAMPLING_RATE = 16000
hindi_raw_audio = engine.infer_from_text(
    input_text="सलाम दुनिया",
    lang="hi",
    speaker_name="male"
)
# Encode the raw audio as WAV in memory, then write it to disk
byte_io = io.BytesIO()
scipy_wav_write(byte_io, DEFAULT_SAMPLING_RATE, hindi_raw_audio)
byte_io.seek(0)  # rewind so read() returns the whole buffer
with open("hindi_audio.wav", "wb") as f:
    f.write(byte_io.read())
The generated audio will be in the specified regional language.
Step 5: Saving and Downloading Output Files
After generating the speech, the output is saved as a WAV file (hindi_audio.wav in the example above). In Colab's Files panel, right-click the file and choose Download to save it to your local computer.
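The download can also be triggered from code with the google.colab.files module, which is only importable inside a Colab runtime. A small sketch; the wrapper name download_in_colab is hypothetical and exists only so the same cell does not crash when run outside Colab.

```python
def download_in_colab(path):
    """Trigger a browser download inside Colab; return False elsewhere."""
    try:
        from google.colab import files  # only available in a Colab runtime
    except ImportError:
        return False
    files.download(path)  # opens the browser's download prompt for this file
    return True


if not download_in_colab("hindi_audio.wav"):
    print("Not running in Colab; copy the file manually instead.")
```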
Supported Languages and Use Cases for Indic TTS
Indic TTS supports a wide range of Indian languages, such as Hindi, Bengali, Tamil, Marathi, and Telugu. Choose the appropriate model based on the language you want to process.
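Switching languages mostly comes down to downloading a different archive from the same release. The sketch below builds the download URL from an archive name, following the pattern of the en+hi URL used in Step 3. The helper name checkpoint_url is hypothetical, and archive names other than en+hi are not confirmed here; check the releases page for the exact names.

```python
from urllib.parse import quote

RELEASE_BASE = (
    "https://github.com/AI4Bharat/Indic-TTS/releases/download/v1-checkpoints-release"
)


def checkpoint_url(archive_name):
    """Build the download URL for a release archive such as 'en+hi'."""
    # Keep '+' unescaped so the result matches the URL used in Step 3.
    return f"{RELEASE_BASE}/{quote(archive_name, safe='+')}.zip"


print(checkpoint_url("en+hi"))
```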
Use Cases of Indic TTS
Indic TTS has several real-world applications:
- Accessibility: Providing speech-based content for visually impaired users.
- Education: Delivering learning materials in regional languages.
- Customer Support: Enabling companies to support customers in their native tongue.
Troubleshooting Common Issues When Running Indic TTS
When running Indic TTS on Colab, you may encounter errors such as:
- Dependency issues: Ensure all necessary libraries are installed.
- Memory errors: Colab has memory limits; consider synthesizing shorter inputs, using smaller datasets, or upgrading to Colab Pro.
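One way to keep each inference call small is to split long text into sentence-sized chunks and synthesize them one at a time. This is a sketch, not something the Indic TTS code requires: the helper name split_into_chunks and the 200-character default are arbitrary choices, and the sentence enders handled are '.', '!', '?', and the Devanagari danda.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole.
    """
    # Split after '.', '!', '?', or the Devanagari danda, at following whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?।])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("नमस्ते। आप कैसे हैं?", max_chars=10))
```

Each chunk can then be passed to the engine separately and the resulting audio segments joined afterwards.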
Optimizing Performance
For faster TTS generation, consider upgrading to Colab Pro, which provides enhanced GPU access.
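To judge whether a change of runtime actually helps, it is useful to measure synthesis speed rather than guess. A common measure is the real-time factor: synthesis time divided by the duration of the generated audio. The helpers below are a sketch with hypothetical names, not part of Indic TTS.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(elapsed_seconds, num_samples, sampling_rate):
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    return elapsed_seconds / (num_samples / sampling_rate)


# Example: 2 seconds spent producing 4 seconds of 16 kHz audio.
print(real_time_factor(2.0, 64000, 16000))
```

In the notebook, timed_call could wrap engine.infer_from_text, with the number of samples taken from the length of the returned audio, assuming it is a one-dimensional array of samples.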
Best Practices for Running Indic TTS on Google Colab
Using Google Drive for Persistent Storage
Mounting Google Drive in Colab is essential for saving models and datasets persistently.
from google.colab import drive
drive.mount('/content/drive')
This allows you to save output files and pick up from where you left off.
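Once Drive is mounted, generated audio can be copied into it so it survives a runtime reset. A minimal sketch, assuming Drive is mounted at /content/drive as above; the function name save_copy and the indic_tts_outputs folder name are examples only.

```python
import shutil
from pathlib import Path

# Assumption: Drive is mounted at /content/drive; the folder name is an example.
DRIVE_OUTPUT_DIR = "/content/drive/MyDrive/indic_tts_outputs"


def save_copy(src, dest_dir=DRIVE_OUTPUT_DIR):
    """Copy src into dest_dir (created if needed) and return the new path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir / Path(src).name))
```

For example, save_copy("hindi_audio.wav") would place the file generated in Step 4 in that Drive folder.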
Managing Colab’s Free Tier Limitations
Colab’s free tier comes with limitations such as runtime limits and restricted GPU access. For extended sessions, upgrade to Colab Pro.
Advantages of Using Indic TTS on Colab
By utilizing Colab’s GPU, Indic TTS tasks are processed quickly, allowing for faster development and testing.
Google Colab is free to use, making it accessible to developers and researchers without the need for expensive hardware.
Conclusion: Getting Started with Indic TTS on Google Colab
Setting up and running Indic TTS on Google Colab is a somewhat involved process, but it provides a great learning experience. The model is state of the art (SOTA) on the regional languages it supports, opening a new frontier for Indic language development in TTS. While Indic TTS on Colab is great for batch processing, if you're looking for real-time, high-quality speech generation, try Waves by smallest.ai and experience the power of AI-driven voices that sound human.
FAQs: Running Indic TTS on Google Colab
1. Which Indian languages are supported by Indic TTS?
Most major Indian languages like Hindi, Tamil, and Bengali are supported.
2. Do I need to know Python to run Indic TTS?
Basic Python knowledge is required for modifying or running scripts.
3. Can I run Indic TTS for free on Colab's free tier?
Yes, you can run it for free, but consider upgrading to Colab Pro for better performance.
4. Can I use Waves instead of Indic TTS for real-time text-to-speech?
Yes, Waves by smallest.ai offers real-time text-to-speech generation with AI voices that are indistinguishable from humans. It's a great alternative for users looking for real-time results in both English and other languages.