Sept 12, 2024 | 5 min read

Running Indic TTS on Colab: Fast and Easy Text-to-Speech

Learn how to run Indic TTS on Google Colab for fast and easy text-to-speech generation in Indian languages. Get a step-by-step guide to set up and run it smoothly.


Kaushal Choudhary

Technical Writer


What is Indic TTS?

Indic TTS, as the name suggests, is a non-autoregressive text-to-speech model for Indic languages such as Odia, Punjabi, Tamil, Bangla, and Gujarati. Most TTS models support only English, or are multilingual models focused on widely spoken international languages, which leaves Indic languages underserved. Indic TTS aims to improve TTS quality for regional languages and thereby encourage research and development of India-focused TTS systems. For example, national websites could use it to read out their content in regional languages. While Indic TTS is excellent for generating regional-language speech, solutions like Waves by smallest.ai offer real-time AI voices that are indistinguishable from humans, providing an alternative for developers looking to deliver high-quality voice synthesis.

Why is Indic TTS important?

  • Accessibility: Enables people with visual impairments, limited literacy, or language barriers to access digital content in their native languages.
  • Bridging Language Gaps: Connects businesses, educational institutions, and customer support services with a broader audience by enabling communication in regional languages.

Why Use Google Colab for Indic TTS?

Google Colab offers a cloud-based environment with free access to powerful resources like GPUs, making it an ideal platform for running computationally intensive TTS tasks without needing expensive hardware.

  • Free access to GPU: Colab provides free GPU access, which speeds up inference and testing.
  • Ease of setup: Setting up Indic TTS in Colab is straightforward, and the platform supports fast prototyping and deployment.
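Colab only attaches a GPU if you select one under Runtime > Change runtime type, and the inference code in Step 4 sets use_cuda=True. A quick check, sketched here under the assumption that either PyTorch or the nvidia-smi tool is available (both normally are on Colab GPU runtimes), looks like this:

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if a CUDA GPU is visible to this runtime."""
    try:
        import torch  # preinstalled on Colab; may be absent elsewhere
        return torch.cuda.is_available()
    except ImportError:
        pass
    # Fallback: nvidia-smi ships with the NVIDIA driver on GPU runtimes
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU runtime:", gpu_available())
```

If this prints False on Colab, switch the runtime type before loading the model, or pass use_cuda=False to run on the CPU.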

Setting Up Indic TTS on Google Colab

Prerequisites for Running Indic TTS

You can find the full code here. Before diving into the setup, ensure you meet these basic requirements:

  • Basic Python knowledge: You’ll need to know Python to modify or run the scripts.
  • Familiarity with Colab: Google Colab basics, including how to run cells and upload files.
  • Understanding TTS models: It helps to know how TTS systems work, though this is not mandatory.

Google Colab Basics

If you're new to Google Colab, here's a quick overview:

  1. Creating a Notebook: Go to Google Colab and click "New Notebook."
  2. Navigating the Interface: Colab runs Python code in a cell-based structure. Each cell holds code (or text), and you execute the cells one at a time.

Step 1: Cloning the Indic TTS Repository

To get started, clone the Indic TTS repository directly into your Colab environment:

!git clone https://github.com/AI4Bharat/Indic-TTS.git

This will pull the necessary files into Colab, enabling you to work on them without needing local storage.

Step 2: Installing Dependencies

Next, install the libraries Indic TTS depends on, such as PyTorch Lightning, librosa, and the TTS package:

!pip install jupyter librosa pandas pytorch_lightning scikit-learn seaborn soundfile tqdm TTS tensorboard wandb pyenchant

Alternatively, move into the cloned repository and install everything from its requirements file:

%cd /content/Indic-TTS
!pip install -r requirements.txt

Then change into the inference folder (%cd /content/Indic-TTS/inference) and follow these steps:

  1. Pin pip to an older version:
     !pip install pip==23.2.1
  2. Replace the contents of requirements-ml.txt in the inference folder with the following:
     # constraints for fairseq, works with TTS as well
     numba==0.56.4
     tts==0.14.3
     numpy>=1.23.0
     protobuf==3.19.4

     TTS
     # onnxruntime
     # TTS @ git+https://github.com/ashwin-014/TTS@deployment

     ai4bharat-transliteration
     asteroid
  3. Run each of the following commands in its own cell:
     !pip install -r requirements-ml.txt
     !pip install -r requirements-utils.txt
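Editing files through Colab's file browser works, but you can also overwrite requirements-ml.txt from a code cell. The sketch below writes the same pins as the requirements-ml.txt listing above; the default path is an assumption that the repository was cloned into /content, Colab's default working directory.

```python
from pathlib import Path

# Same contents as the requirements-ml.txt listing above
REQUIREMENTS_ML = """\
# constraints for fairseq, works with TTS as well
numba==0.56.4
tts==0.14.3
numpy>=1.23.0
protobuf==3.19.4

TTS
# onnxruntime
# TTS @ git+https://github.com/ashwin-014/TTS@deployment

ai4bharat-transliteration
asteroid
"""


def write_requirements(inference_dir: str = "/content/Indic-TTS/inference") -> Path:
    """Overwrite requirements-ml.txt in the given folder and return its path."""
    # Assumption: the repo lives at /content/Indic-TTS (Colab's default clone location)
    path = Path(inference_dir) / "requirements-ml.txt"
    path.write_text(REQUIREMENTS_ML, encoding="utf-8")
    return path
```

After calling write_requirements(), run the two !pip install -r commands as before.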

Using Indic TTS on Colab offers fast TTS processing for regional languages, but for projects requiring real-time, human-like voice synthesis, solutions like Waves can offer immediate results with minimal setup.

Step 3: Organizing Models and Weights

The trained models and weights have also been released; you can find them here. This guide uses the en+hi (English + Hindi) model, a 1.43 GB zip file containing the models and weights required for inference.

Download the checkpoints with the command below. It fetches the English + Hindi archive; change the file name to match the language you need.

!wget https://github.com/AI4Bharat/Indic-TTS/releases/download/v1-checkpoints-release/en+hi.zip

After downloading, extract the archive (for example with !unzip en+hi.zip) and follow these steps:
  1. Create checkpoints and models folders inside the inference folder of the Indic-TTS repository.
  2. Move the contents of the extracted folder into the checkpoints folder.
  3. Inside the models folder, create a fastpitch/v1/en folder and place a copy of the speakers.pth file from inference/checkpoints/fastpitch in it.
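The folder steps above can also be scripted. This is a sketch, not part of the Indic-TTS repository: it assumes the zip either wraps everything in a single top-level folder (such as en+hi/) or has the model folders at its top level, and it then checks for the five files the Step 4 code loads.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Files the Step 4 inference code loads, relative to the inference folder
EXPECTED_FILES = [
    "checkpoints/fastpitch/best_model.pth",
    "checkpoints/fastpitch/config.json",
    "checkpoints/fastpitch/speakers.pth",
    "checkpoints/hifigan/best_model.pth",
    "checkpoints/hifigan/config.json",
]


def organize_checkpoints(zip_path: str, inference_dir: str) -> Path:
    """Extract a release zip into checkpoints/ and copy speakers.pth into models/."""
    inference = Path(inference_dir)
    checkpoints = inference / "checkpoints"
    models = inference / "models"
    checkpoints.mkdir(parents=True, exist_ok=True)
    models.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        entries = [p for p in Path(tmp).iterdir() if p.name != "__MACOSX"]
        # Assumption: a single top-level folder (e.g. en+hi/) wraps the model folders
        root = entries[0] if len(entries) == 1 and entries[0].is_dir() else Path(tmp)
        for item in list(root.iterdir()):
            if item.name == "__MACOSX":
                continue
            target = checkpoints / item.name
            if target.is_dir():
                shutil.rmtree(target)  # replace an older copy of the same folder
            elif target.exists():
                target.unlink()
            shutil.move(str(item), str(target))

    # models/fastpitch/v1/en/speakers.pth, following step 3 of the list above
    speaker_dir = models / "fastpitch" / "v1" / "en"
    speaker_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(checkpoints / "fastpitch" / "speakers.pth", speaker_dir / "speakers.pth")
    return checkpoints


def missing_checkpoint_files(inference_dir: str) -> list:
    """List expected checkpoint files that are not present yet."""
    base = Path(inference_dir)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]
```

In Colab you would call organize_checkpoints("en+hi.zip", "/content/Indic-TTS/inference") and then print missing_checkpoint_files(...) to confirm nothing is missing before loading the model.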

Step 4: Running the Indic TTS Model

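Let's set up the inference code to run the model. Run it from inside the inference folder (for example after %cd /content/Indic-TTS/inference), because both the src package and the relative checkpoints/ paths are resolved from there.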

import io

# Run from inside Indic-TTS/inference so that src/ and checkpoints/ resolve
from TTS.utils.synthesizer import Synthesizer
from src.inference import TextToSpeechEngine

# Initialize the English + Hindi (en+hi) model
en_hi_model = Synthesizer(
    tts_checkpoint="checkpoints/fastpitch/best_model.pth",
    tts_config_path="checkpoints/fastpitch/config.json",
    tts_speakers_file="checkpoints/fastpitch/speakers.pth",
    tts_languages_file=None,
    vocoder_checkpoint="checkpoints/hifigan/best_model.pth",
    vocoder_config="checkpoints/hifigan/config.json",
    encoder_checkpoint="",
    encoder_config="",
    use_cuda=True,  # needs a GPU runtime; set to False to run on the CPU
)

# Set up the TTS engine; the key is the language code later passed as lang
models = {
    "hi": en_hi_model,
}
engine = TextToSpeechEngine(models)

Then run inference and save the output as a WAV file:

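# Assumes the engine from the previous cell is already loaded
# Hindi TTS inference
from scipy.io.wavfile import write as scipy_wav_write

DEFAULT_SAMPLING_RATE = 16000

hindi_raw_audio = engine.infer_from_text(
    input_text="सलाम दुनिया",
    lang="hi",
    speaker_name="male",
)

# Encode the audio as WAV in memory, then write it to disk
byte_io = io.BytesIO()
scipy_wav_write(byte_io, DEFAULT_SAMPLING_RATE, hindi_raw_audio)

with open("hindi_audio.wav", "wb") as f:
    # getvalue() returns the whole buffer; read() would return nothing here
    # because the stream position is already at the end after writing
    f.write(byte_io.getvalue())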

The generated audio will be in the specified regional language.
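scipy writes a float array as a 32-bit floating-point WAV, which some players and Python's built-in wave module cannot open. If you need plain 16-bit PCM instead, here is a minimal stdlib-only sketch. It assumes the audio is a sequence of mono float samples in [-1.0, 1.0]; check the dtype and range of hindi_raw_audio before relying on it.

```python
import array
import sys
import wave


def write_pcm16_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip each sample to [-1, 1] and scale it to the int16 range
    pcm = array.array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return rate, wf.getnchannels(), frames / rate
```

For example, write_pcm16_wav("hindi_audio_pcm16.wav", hindi_raw_audio, DEFAULT_SAMPLING_RATE) followed by wav_info("hindi_audio_pcm16.wav") lets you confirm the sample rate and duration of the result.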

Step 5: Saving and Downloading Output Files

After generation, the speech is saved as a .wav file (hindi_audio.wav in the code above) in the current working directory. To download it to your computer, open Colab's Files panel, right-click the file, and choose Download.
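Instead of right-clicking, you can trigger the download from a cell with files.download from Colab's built-in google.colab module. The wrapper below is a small sketch that also runs outside Colab, where it only reports where the file is:

```python
from pathlib import Path


def download_output(path: str) -> bool:
    """Send a file to the browser when running in Colab; return False elsewhere."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No such output file: {file_path}")
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        print(f"Not running in Colab; the file is at {file_path.resolve()}")
        return False
    files.download(str(file_path))
    return True
```

In Colab, download_output("hindi_audio.wav") opens the browser's download prompt for the generated file.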

Supported Languages and Use Cases for Indic TTS

Indic TTS supports a wide range of Indian languages, such as Hindi, Bengali, Tamil, Marathi, and Telugu. Choose the model checkpoint that matches the language you want to synthesize.
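The checkpoint URL used in Step 3 ends in the language archive's name (en+hi.zip). The helper below builds download URLs on the assumption that other languages in the same v1-checkpoints-release follow that same <code>.zip naming; confirm the exact file names on the release page, since this guide only confirms en+hi.

```python
import urllib.request
from pathlib import Path

# Release used in this guide; other archive names are assumed to follow <code>.zip
RELEASE_BASE = "https://github.com/AI4Bharat/Indic-TTS/releases/download/v1-checkpoints-release"


def checkpoint_url(lang_code: str) -> str:
    """Build the download URL for one language archive, e.g. 'en+hi'."""
    return f"{RELEASE_BASE}/{lang_code}.zip"


def download_checkpoint(lang_code: str, dest_dir: str = ".") -> Path:
    """Download <lang_code>.zip into dest_dir, skipping it if the file already exists."""
    dest = Path(dest_dir) / f"{lang_code}.zip"
    if not dest.exists():
        urllib.request.urlretrieve(checkpoint_url(lang_code), str(dest))
    return dest
```

Skipping existing files matters on Colab, where re-running a cell would otherwise fetch a multi-gigabyte archive again.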

Use Cases of Indic TTS

Indic TTS has several real-world applications:

  • Accessibility: Providing speech-based content for visually impaired users.
  • Education: Delivering learning materials in regional languages.
  • Customer Support: Enabling companies to support customers in their native tongue.

Troubleshooting Common Issues When Running Indic TTS

When running Indic TTS on Colab, you may encounter errors such as:

  • Dependency issues: Ensure all necessary libraries are installed.
  • Memory errors: Colab has memory limits; consider using smaller datasets or upgrading to Colab Pro.
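For dependency issues, a quick way to see which libraries are still missing is to look each one up with importlib.util.find_spec, which locates a module without importing it. The pip-name to import-name mapping below is an assumption based on the packages installed in Step 2 (for example, scikit-learn is imported as sklearn and pyenchant as enchant). find_spec only checks that a module can be found; a package can still fail at import time if a system library it needs is missing.

```python
import importlib.util

# pip package name -> import name (they differ for some packages)
PACKAGES = {
    "librosa": "librosa",
    "pandas": "pandas",
    "pytorch_lightning": "pytorch_lightning",
    "scikit-learn": "sklearn",
    "soundfile": "soundfile",
    "TTS": "TTS",
    "pyenchant": "enchant",
    "scipy": "scipy",
}


def missing_packages(packages: dict) -> list:
    """Return the pip names whose top-level module cannot be located."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


print("Missing:", missing_packages(PACKAGES) or "none")
```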

Optimizing Performance

For faster TTS generation, consider upgrading to Colab Pro, which provides enhanced GPU access.
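When comparing runtimes (free tier vs. Colab Pro, or GPU vs. CPU), a common yardstick is the real-time factor: synthesis time divided by the duration of the generated audio, so values below 1.0 mean faster than real time. A minimal sketch that works with any synthesis callable:

```python
import time


def real_time_factor(synthesize, text, sample_rate=16000):
    """Return synthesis time divided by audio duration (below 1.0 is faster than real time).

    `synthesize` is any callable that takes text and returns a sequence of samples.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    duration = len(samples) / sample_rate
    if duration == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / duration
```

With the engine from Step 4 you could pass lambda text: engine.infer_from_text(input_text=text, lang="hi", speaker_name="male") together with the 16000 Hz rate used there. The first call may include one-off warm-up costs, so measure more than once.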

Best Practices for Running Indic TTS on Google Colab

Using Google Drive for Persistent Storage

Colab's local storage is wiped when the runtime is reset, so mounting Google Drive is the simplest way to keep models, datasets, and outputs between sessions.

from google.colab import drive
drive.mount('/content/drive')

This allows you to save output files and pick up from where you left off.
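Once Drive is mounted, files written under /content/drive/MyDrive persist after the runtime resets. A small sketch for copying generated audio there; the indic_tts_outputs folder name is hypothetical, so change it to suit your project:

```python
import shutil
from pathlib import Path

# Hypothetical output folder inside the mounted Drive
DRIVE_OUTPUT_DIR = "/content/drive/MyDrive/indic_tts_outputs"


def save_to_drive(files, drive_dir=DRIVE_OUTPUT_DIR):
    """Copy generated files (e.g. .wav outputs) into a Drive folder and return the copies."""
    dest = Path(drive_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file timestamps
    return [Path(shutil.copy2(f, dest / Path(f).name)) for f in files]
```

For example, save_to_drive(["hindi_audio.wav"]) keeps the Hindi sample from Step 4 available in later sessions.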

Managing Colab’s Free Tier Limitations

Colab’s free tier comes with limitations such as runtime limits and restricted GPU access. For extended sessions, upgrade to Colab Pro.

Advantages of Using Indic TTS on Colab

By utilizing Colab’s GPU, Indic TTS tasks are processed quickly, allowing for faster development and testing.

Google Colab is free to use, making it accessible to developers and researchers without the need for expensive hardware.

Conclusion: Getting Started with Indic TTS on Google Colab

Setting up and running Indic TTS on Google Colab is a somewhat involved process, but it provides a great learning experience. The model is state of the art for the regional languages it supports, opening a new frontier for Indic language development in TTS. While Indic TTS on Colab is great for batch processing, if you're looking for real-time, high-quality speech generation, try Waves by smallest.ai and experience the power of AI-driven voices that sound human.

FAQs: Running Indic TTS on Google Colab

1. Which Indian languages are supported by Indic TTS?

Most major Indian languages like Hindi, Tamil, and Bengali are supported.

2. Do I need to know Python to run Indic TTS?

Basic Python knowledge is required for modifying or running scripts.

3. Can I run Indic TTS for free on Colab's free tier?

Yes, you can run it for free, but consider upgrading to Colab Pro for better performance.

4. Can I use Waves instead of Indic TTS for real-time text-to-speech?

Yes, Waves by smallest.ai offers real-time text-to-speech generation with AI voices that are indistinguishable from humans. It's a great alternative for users who need real-time results in English and other languages.