[P] Added 8 Indian languages to Chatterbox TTS via LoRA — 1.4% of parameters, no phoneme engineering [P]
TL;DR:
Fine-tuned Chatterbox-Multilingual (Resemble AI's open-source TTS) to support Telugu, Kannada, Bengali, Tamil, Malayalam, Marathi, Gujarati, and Hindi using LoRA adapters + tokenizer extension. Only 7.8M / 544M parameters trained. Model + audio samples available.
---
The Problem
Chatterbox-Multilingual supports 23 languages with zero-shot voice cloning, but no Dravidian languages (Telugu, Kannada, Tamil, Malayalam) and limited Indo-Aryan coverage beyond Hindi. That's 500M+ speakers with no representation.
The conventional approach would be to build G2P (grapheme-to-phoneme) systems for each language and retrain the full model, a months-long effort. Hindi schwa deletion alone is an unsolved problem, and Bengali G2P is notoriously hard.
The Approach
Instead of phonemes, I went grapheme-level:

- **Tokenizer extension:** extended the BPE tokenizer with Indic script characters (2454 → 2871 tokens). Telugu, Kannada, Bengali, Tamil, Malayalam, and Gujarati graphemes added alongside the existing Devanagari.
- **Brahmic warm-start:** initialized new character embeddings from phonetically equivalent Devanagari characters. Telugu "క" (ka) gets initialized from Hindi "क" (ka). This works because Brahmic scripts share phonetic structure: same sounds, different glyphs. The model starts with a reasonable prior instead of random noise.
- **LoRA on the T3 backbone:** rank-32 adapters on the q/k/v/o projections of the Llama-based T3 module. ~7.8M trainable params (1.4% of 544M total). Everything else frozen: vocoder (S3Gen), speaker encoder, speech tokenizer.
- **Incremental language training:** added languages one at a time with weighted sampling. Started with Hindi-only (to validate the pipeline), then Telugu+Hindi, then Kannada+Telugu+Hindi, finally all 8 languages. This prevents catastrophic forgetting: Hindi CER actually improved after adding 7 new languages.
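The warm-start step above can be sketched in a few lines of PyTorch. This is an illustrative reconstruction, not the author's actual code: `WARM_START_MAP` and `warm_start_embeddings` are hypothetical names, and the character mapping shown is a tiny sample of what a full table would contain.

```python
import torch

# Illustrative mapping from new Indic characters to phonetically
# equivalent Devanagari characters already in the vocabulary.
WARM_START_MAP = {
    "క": "क",  # Telugu ka   <- Devanagari ka
    "ক": "क",  # Bengali ka  <- Devanagari ka
    "க": "க"[:0] or "क",  # Tamil ka <- Devanagari ka
}

def warm_start_embeddings(embed: torch.nn.Embedding,
                          old_vocab: dict[str, int],
                          new_tokens: list[str]) -> torch.nn.Embedding:
    """Grow the embedding table by len(new_tokens) rows.

    Rows for tokens found in WARM_START_MAP are copied from the mapped
    Devanagari token's embedding; unmapped tokens keep random init.
    """
    old_n, dim = embed.weight.shape
    grown = torch.nn.Embedding(old_n + len(new_tokens), dim)
    with torch.no_grad():
        grown.weight[:old_n] = embed.weight  # preserve existing rows
        for i, tok in enumerate(new_tokens):
            src = WARM_START_MAP.get(tok)
            if src is not None and src in old_vocab:
                grown.weight[old_n + i] = embed.weight[old_vocab[src]]
    return grown
```

The point of the copy is that the first gradient steps for "క" start from a vector the model already associates with the /ka/ sound, rather than from noise.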
Results
CER (Character Error Rate) via Whisper large-v3 ASR on 100 held-out samples per language:
| Language | CER | Notes |
|---|---|---|
| Hindi | 0.1058 | Improved from 0.29 baseline |
| Kannada | 0.1434 | |
| Tamil | 0.1608 | |
| Marathi | 0.1976 | |
| Gujarati | 0.2377 | |
| Bengali | 0.2450 | |
| Telugu | 0.2853 | |
| Malayalam | 0.8593 | Experimental — needs more data |
Malayalam struggles significantly. Likely needs more training data or a dedicated round. The rest produce intelligible, natural-sounding speech.
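For reference, CER as used here is character-level edit distance between the ASR transcript and the reference text, normalized by reference length. A minimal stdlib version (the post's actual eval pipeline around Whisper may differ):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    hyp = list(hypothesis)
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # deletion
                          curr[j - 1] + 1,          # insertion
                          prev[j - 1] + (r != h))   # substitution
        prev = curr
    return prev[len(hyp)] / max(len(reference), 1)
```

So a CER of 0.1058 for Hindi means roughly one character edit per ten reference characters, while Malayalam's 0.86 means most characters are wrong.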
What Didn't Work / Limitations

- **Malayalam:** CER 0.86 is essentially unintelligible. Possibly the script complexity (many conjuncts) or insufficient data.
- **No MOS evaluation yet:** CER tells you the words are right, not that it sounds natural. Subjective evaluation is pending.
- **2 speakers per language:** male + female from IndicTTS. Won't generalize to all voice types.
- **No code-mixing:** Hindi+English mixed sentences not specifically trained yet.
Links

- Model + audio samples: https://huggingface.co/reenigne314/chatterbox-indic-lora
- Article (full writeup): https://theatomsofai.substack.com/p/teaching-an-ai-to-speak-indian-languages
- Base model: [ResembleAI/chatterbox](https://github.com/resemble-ai/chatterbox) (MIT license)
Quick Start
```python
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Load the base model with the Indic LoRA adapters applied
model = ChatterboxMultilingualTTS.from_indic_lora(device="cuda", speaker="te_female")

# Synthesize Telugu speech ("Hello, how are you?")
wav = model.generate("నమస్కారం, మీరు ఎలా ఉన్నారు?", language_id="te")
```
Training Details
- Hardware: 1x RTX PRO 6000 Blackwell (96GB)
- Data: SPRINGLab IndicTTS + ai4bharat Rasa
- 6 training rounds, incremental language addition
- LoRA rank 32, alpha 64, bf16
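The weighted sampling used in the incremental rounds can be sketched with the stdlib. The exact weights aren't stated in the post, so `new_weight` and `make_sampler` below are assumptions: the idea is simply that the newest language gets a larger share of draws while earlier languages stay in the mix to avoid forgetting.

```python
import random

def make_sampler(datasets: dict[str, list], new_lang: str,
                 new_weight: float = 0.5):
    """Yield (lang, sample) pairs. The newly added language receives
    `new_weight` of the draws; the remaining probability is split evenly
    across previously trained languages."""
    langs = list(datasets)
    old = [l for l in langs if l != new_lang]
    if not old:
        weights = [1.0] * len(langs)
    else:
        weights = [new_weight if l == new_lang else (1 - new_weight) / len(old)
                   for l in langs]
    while True:
        lang = random.choices(langs, weights=weights, k=1)[0]
        yield lang, random.choice(datasets[lang])
```

With, say, `new_weight=0.8` during the Kannada round, roughly 80% of batches would be Kannada and the rest split between Hindi and Telugu.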
Part 2 (technical deep-dive with code) coming this week. Happy to answer questions about the approach.