
Sept 12, 2024 · 3 min read

IndicTTS: A Text-to-Speech Model for Indian Languages

Learn how to run Indic TTS on Google Colab for quick and seamless text-to-speech generation in Indian languages.


Kaushal Choudhary

Senior Developer Advocate


What is Indic TTS?

To develop a Text-to-Speech system focused on Indian languages, IIT Madras, together with the Ministry of Electronics & Information Technology (MeitY) and 23 partner institutions, created IndicTTS. The project aims to improve synthesis quality while keeping the TTS models' footprint small and integrating them with disability aids, and it continues to expand coverage to a broader range of Indian languages and dialects. It is partly used in Bhashini and hosted on UCLA, and it is also used in IIIT Hyderabad's Canvas and NPTEL.

We are going to use AI4Bharat's IndicTTS, which ships a FastPitch text-to-speech model. You can try out the demo here.

Notebook Walk-through

Prerequisites

  • Python Knowledge: Basic Python skills are needed to modify scripts.
  • Colab Familiarity: Understanding how to run cells and upload files.
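Since the inference code in Step 4 sets use_cuda=True, a GPU runtime is assumed. Below is a minimal sanity check (torch comes preinstalled on the default Colab image):

# Check that the Colab runtime has a CUDA-capable GPU
# (Runtime -> Change runtime type -> GPU if this prints False).
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))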

Step 1: Cloning the Indic TTS Repository

!git clone https://github.com/AI4Bharat/Indic-TTS.git

Step 2: Installing Dependencies

!pip install jupyter librosa pandas pytorch_lightning scikit-learn seaborn soundfile tqdm TTS tensorboard wandb pyenchant

and then cd into the inference folder inside the cloned repository

%cd Indic-TTS/inference

You will also need to unzip the language model archive; here we do this for en+hi (downloaded with the wget command shown in Step 3). Find other languages here.

!unzip en+hi.zip

Step 2.1

Pin the pip version (the imports and installations often fail otherwise).

!pip install pip==23.2.1

Step 2.2

Edit the contents of the requirements-ml.txt file in the inference folder.

# constraints for fairseq, works with TTS as well
numba==0.56.4
tts==0.14.3
numpy>=1.23.0
protobuf==3.19.4

TTS
# onnxruntime
# TTS @ git+https://github.com/ashwin-014/TTS@deployment

ai4bharat-transliteration
asteroid

Step 2.3

Then run the following commands.

!pip install -r requirements-ml.txt
!pip install -r requirements-utils.txt

Step 3: Organizing Models and Weights

Here, the en+hi (English + Hindi) model is used. en+hi.zip is a 1.43 GB archive containing the models and weights required for inference. After downloading it, follow these steps. You can choose other languages as well.

!wget https://github.com/AI4Bharat/Indic-TTS/releases/download/v1-checkpoints-release/en+hi.zip

Copy the extracted folders into the locations the inference code expects:

!cp -r ./en+hi/fastpitch ./checkpoints
!cp -r ./en+hi/hifigan ./checkpoints

and,

!cp checkpoints/fastpitch/speakers.pth ./models/fastpitch/v1/en
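Before moving on, it helps to confirm that the files ended up where the Synthesizer in Step 4 will look for them. A minimal check (the paths below are the ones used in Step 4, assuming you are still inside the inference folder):

# Verify the checkpoint layout expected by Step 4
from pathlib import Path

expected = [
    "checkpoints/fastpitch/best_model.pth",
    "checkpoints/fastpitch/config.json",
    "checkpoints/fastpitch/speakers.pth",
    "checkpoints/hifigan/best_model.pth",
    "checkpoints/hifigan/config.json",
]
for path in expected:
    print(path, "->", "OK" if Path(path).exists() else "MISSING")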

Step 4: Setting up the Model

Here, we will set up the configs for the model, speaker file, and vocoder files.

import io
from TTS.utils.synthesizer import Synthesizer
from src.inference import TextToSpeechEngine

# Initialize the English+Hindi (en+hi) FastPitch model with a HiFi-GAN vocoder
lang = "en+hi"
en_hi_model = Synthesizer(
    tts_checkpoint='checkpoints/fastpitch/best_model.pth',
    tts_config_path='checkpoints/fastpitch/config.json',
    tts_speakers_file='checkpoints/fastpitch/speakers.pth',
    # tts_speakers_file=None,
    tts_languages_file=None,
    vocoder_checkpoint='checkpoints/hifigan/best_model.pth',
    vocoder_config='checkpoints/hifigan/config.json',
    encoder_checkpoint="",
    encoder_config="",
    use_cuda=True,
)
# Set up the TTS engine, registering the model under the "hi" language code
models = {
    "hi": en_hi_model,
}

engine = TextToSpeechEngine(models)

Step 5: Inference and Saving the Audio

Now that all the configurations are set, we can run inference on our text. Here we use the en+hi model, which handles Hinglish (code-mixed Hindi + English) as well as plain Hindi. We can save the generated audio to a .wav file and download it to the local machine as well.

from scipy.io.wavfile import write as scipy_wav_write

DEFAULT_SAMPLING_RATE = 16000
hindi_raw_audio = engine.infer_from_text(
        input_text="सलाम दुनिया",
        lang="hi",
        speaker_name="male"
)
# Write the raw audio into an in-memory WAV buffer
byte_io = io.BytesIO()
scipy_wav_write(byte_io, DEFAULT_SAMPLING_RATE, hindi_raw_audio)
byte_io.seek(0)  # rewind the buffer, otherwise read() returns empty bytes

# Dump the buffer to a .wav file on disk
with open("hindi_audio.wav", "wb") as f:
    f.write(byte_io.read())
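To listen to the result directly in the notebook and pull the file onto your machine, the standard Colab helpers can be used (a small sketch; IPython.display and google.colab ship with Colab):

# Play the generated audio inline in the notebook
from IPython.display import Audio, display

display(Audio(hindi_raw_audio, rate=DEFAULT_SAMPLING_RATE))

# Download the saved .wav file to the local machine (works only in Colab)
from google.colab import files

files.download("hindi_audio.wav")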

The generated audio will be in the specified regional language. You can find the full Colab Notebook here.

Conclusion

Running IndicTTS is straightforward: you pick a language and set up a few configs. This lowers the barrier to getting models running, facilitating smoother development, and accelerates research and development on Indian-language Text-to-Speech (TTS) models. For real-time TTS, check out Waves, a system that supports both English and Hindi. You can try it now with free credits.

FAQs

1. Which Indian languages are supported by Indic TTS?

Languages like Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Odia, Rajasthani, Tamil, and Telugu are included.

2. Do I need to know Python to run Indic TTS?

Yes, basic Python is needed to run and modify the code, but you can also try the hosted demo here if you don't know Python.

3. Can I run Indic TTS for free on Colab's free tier?

Yes, you can run it for free, but consider upgrading to Colab Pro for better performance.

4. What are the alternatives to Indic TTS that enable real-time text to speech?

Waves by smallest.ai is a real-time text-to-speech system with ultra-realistic voice and minimal latency.