| Dataset | Hours | Languages | Word-level Alignments | Source |
|---|---|---|---|---|
| Libri-light | 60k | 1 | | Audiobook |
| GigaSpeech | 10k | 1 | | Multi-domain |
| GigaSpeech 2 | 28k | 3 | | Multi-domain |
| WenetSpeech | 22k | 1 | | Multi-domain |
| WenetSpeech4TTS | 13k | 1 | | Multi-domain |
| MLS | 51k | 8 | | Audiobook |
| Emilia | 101k | 6 | | Multi-domain |
| LEMAS-Dataset (Ours) | 150k | 10 | | Multi-domain |