LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models

Zhiyuan Zhao, Lijian Lin, Ye Zhu, Kai Xie, Yunfei Liu, Yu Li

International Digital Economy Academy (IDEA)

LEMAS dataset pipeline and benchmark models

LEMAS is a Large-scale Extensible Multilingual Audio Suite. It provides a multilingual speech corpus (LEMAS-Dataset) with word-level timestamps, covering over 150,000 hours across 10 major languages (zh/en/de/fr/es/pt/it/ru/id/vi). Built with a rigorous alignment and confidence-based filtering pipeline, LEMAS supports diverse generative paradigms, including zero-shot multilingual speech synthesis (LEMAS-TTS 0.3B) and a seamless speech editing system (LEMAS-Edit).
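To make the idea of confidence-based filtering concrete, here is a minimal sketch in Python. It is not the LEMAS pipeline itself: the `AlignedWord` record, the `keep_utterance` function, and both thresholds are hypothetical and chosen only for illustration. It assumes an aligner has already produced per-word timestamps and a confidence score in [0, 1].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedWord:
    """Hypothetical record for one aligned word (not the dataset's schema)."""
    text: str
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # assumed alignment confidence in [0, 1]


def keep_utterance(
    words: List[AlignedWord],
    min_mean_conf: float = 0.8,  # illustrative threshold, not from the paper
    min_word_conf: float = 0.3,  # illustrative threshold, not from the paper
) -> bool:
    """Return True if an utterance passes a simple confidence filter."""
    if not words:
        return False

    # Reject malformed alignments: words must not overlap or run backwards.
    prev_end = 0.0
    for w in words:
        if w.start < prev_end or w.end < w.start:
            return False
        prev_end = w.end

    # Require a high average confidence and no very poorly aligned word.
    confs = [w.confidence for w in words]
    mean_conf = sum(confs) / len(confs)
    return mean_conf >= min_mean_conf and min(confs) >= min_word_conf
```

Checking both the mean and the minimum is one possible design choice: the mean catches generally noisy transcripts, while the minimum catches a single badly aligned word that would otherwise be averaged away.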

Comparison with Existing Datasets

Dataset	Hours	Languages	Source
Libri-light	60k	1	Audiobook
GigaSpeech	10k	1	Multi-domain
GigaSpeech 2	28k	3	Multi-domain
WenetSpeech	22k	1	Multi-domain
WenetSpeech4TTS	13k	1	Multi-domain
MLS	51k	8	Audiobook
Emilia	101k	6	Multi-domain
LEMAS-Dataset (Ours)	150k	10	Multi-domain

LEMAS-TTS: Multilingual Zero-shot Synthesis

We showcase zero-shot multilingual and cross-lingual synthesis: given a short reference speech clip and its reference text, each system generates target speech conditioned on the target text. For each example, we provide the reference speech (ground truth) and the outputs of three systems: a strong baseline (OpenAudio-S1-mini), our LEMAS-TTS, and LEMAS-TTS + Prosody Encoder.

Example 1

Reference text (EN)

They were set down at a small group of farmhouses.

Reference (Ground Truth)

Target text (EN)

Location data is also some of our most personal information.

Baseline

LEMAS-TTS

TTS+Prosody

Example 2

Reference text (ES)

Y podía regalar a los que a su casa llegasen.

Reference (Ground Truth)

Target text (EN)

City planning economic development and social services together.

Baseline

LEMAS-TTS

TTS+Prosody

Example 3

Reference text (ZH)

梅梅想看看游戏室是什么样子，就擦掉眼泪。

Reference (Ground Truth)

Target text (ZH)

这不是简单的迷信，也不是简单的儒家伦理。

Baseline

LEMAS-TTS

TTS+Prosody

Example 4

Reference text (DE)

Im bereich des netzteiles sogar etwas zu offen, im hinblick auf die isolierung der netzspannung.

Reference (Ground Truth)

Target text (ZH)

甚至有的人总觉得自己怀才不遇，好像世界上的人都辜负自己，都对不起自己。

Baseline

LEMAS-TTS

TTS+Prosody

Example 5

Reference text (DE)

Wir bleiben für immer vereint. wir gegen den rest der welt.

Reference (Ground Truth)

Target text (DE)

Atome, die für uns vorher noch verschlossen waren, nun in unseren lichtkörpern öffnen.

Baseline

LEMAS-TTS

TTS+Prosody

Example 6

Reference text (EN)

For they did not suspect him to be their enemy and he gained it thus.

Reference (Ground Truth)

Target text (DE)

Atome, die für uns vorher noch verschlossen waren, nun in unseren lichtkörpern öffnen.

Baseline

LEMAS-TTS

TTS+Prosody

Example 7

Reference text (ES)

Se ubicaron dentro otras dependencias como la real audiencia, la cárcel y las cajas reales.

Reference (Ground Truth)

Target text (ES)

Es fundamental llevarla a cabo con el esfuerzo de todos y todas.

Baseline

LEMAS-TTS

TTS+Prosody

Example 8

Reference text (FR)

Cinq étapes pour rédiger votre règlement intérieur!

Reference (Ground Truth)

Target text (ES)

Es fundamental llevarla a cabo con el esfuerzo de todos y todas.

Baseline

LEMAS-TTS

TTS+Prosody

Example 9

Reference text (FR)

S il dit après je suis désolé vous pouvez faire confiance à son je suis désolé.

Reference (Ground Truth)

Target text (FR)

Mourir dans la nature? parce qu avant de mourir, ils ont le temps de se reproduire!

Baseline

LEMAS-TTS

TTS+Prosody

Example 10

Reference text (ID)

Yang beriman di antara kalian beriman di antara kalian.

Reference (Ground Truth)

Target text (FR)

Cinq étapes pour rédiger votre règlement intérieur!

Baseline

LEMAS-TTS

TTS+Prosody

Example 11

Reference text (ID)

Tanggal dan bulan saya tidak bisa dapat yang tau bisa berbagi informasi di kolom komentar.

Reference (Ground Truth)

Target text (ID)

Yang ke tiga, kemudian rukuk dan memanjangkan rukuknya.

Baseline

LEMAS-TTS

TTS+Prosody

Example 12

Reference text (IT)

Condizioni, in determinate situazioni e dovremmo riconoscere solo determinati diritti.

Reference (Ground Truth)

Target text (ID)

Dari para perampok tersebut beruntung anaknya selamat meskipun mengalami luka tembak di bagian.

Baseline

LEMAS-TTS

TTS+Prosody

Example 13

Reference text (IT)

Sarebbero stati studiati i metodi di cura del dottore dei miracoli.

Reference (Ground Truth)

Target text (IT)

E quello era come non effettuare una chiamata di emergenza.

Baseline

LEMAS-TTS

TTS+Prosody

Example 14

Reference text (PT)

E as práticas feministas em âmbitos acadêmicos, artísticos e sociais.

Reference (Ground Truth)

Target text (IT)

No, non l ascoltavo e le dissi, non venire qui con questa roba.

Baseline

LEMAS-TTS

TTS+Prosody

Example 15

Reference text (PT)

Outra coisa que também podemos dizer sobre as terminações é que podem ser chamadas de sufixo.

Reference (Ground Truth)

Target text (PT)

E as práticas feministas em âmbitos acadêmicos, artísticos e sociais.

Baseline

LEMAS-TTS

TTS+Prosody

Example 16

Reference text (RU)

Приветствую вас на канале вкусные рецепты.

Reference (Ground Truth)

Target text (PT)

E não são estas aquelas mesmas águas.

Baseline

LEMAS-TTS

TTS+Prosody

Example 17

Reference text (RU)

Но я не могу сделать то, о чем вы просите.

Reference (Ground Truth)

Target text (RU)

Того, кто ниже ума, мы называем будду, дурачок.

Baseline

LEMAS-TTS

TTS+Prosody

Example 18

Reference text (VI)

Bởi vì gu ti vi và bên đại lý không tìm.

Reference (Ground Truth)

Target text (RU)

Приветствую вас на канале вкусные рецепты.

Baseline

LEMAS-TTS

TTS+Prosody

Example 19

Reference text (VI)

Bàn ăn với mười hai môn đệ và khi các ông đang.

Reference (Ground Truth)

Target text (VI)

Và bì trộn, với cơm tấm. thế là mình có món, cơm tấm bì cho ngày hôm sau.

Baseline

LEMAS-TTS

TTS+Prosody

Example 20

Reference text (ZH)

直到最近，我们需要的大部分东西都是手工制作的。

Reference (Ground Truth)

Target text (VI)

Sau này thiên hạ sẽ quy về dòng họ tư mã. đám trẻ sau khi nghe tin tức này đều sợ hãi.

Baseline

LEMAS-TTS

TTS+Prosody

LEMAS-Edit: Word-level Speech Editing

LEMAS-Edit formulates speech editing as masked token infilling over codec tokens, powered by the word-level timestamps in LEMAS-Dataset. For each example, we modify a specific word or phrase in the original utterance. We show the original recording (ground truth), the edited result from LEMAS-Edit, and a re-synthesized version from LEMAS-TTS.
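The link between word-level timestamps and masked token infilling can be sketched as follows. This is only an illustration under stated assumptions, not the LEMAS-Edit implementation: the mask id `MASK`, the helper names `span_to_frames` and `mask_tokens`, and the default codec frame rate are all hypothetical. The sketch also keeps the masked span at its original length, whereas a real editing system would need to adapt the span length to the replacement text.

```python
import math
from typing import List, Tuple

MASK = -1  # hypothetical mask token id, chosen for illustration only


def span_to_frames(
    start_s: float, end_s: float, frame_rate_hz: float, num_frames: int
) -> Tuple[int, int]:
    """Convert a word's time span in seconds to a [first, last) frame range."""
    first = max(0, math.floor(start_s * frame_rate_hz))
    last = min(num_frames, math.ceil(end_s * frame_rate_hz))
    return first, max(first, last)


def mask_tokens(
    tokens: List[int],
    start_s: float,
    end_s: float,
    frame_rate_hz: float = 50.0,  # assumed codec frame rate, not from the source
) -> List[int]:
    """Replace the codec tokens covering the edited word with MASK.

    A model trained for infilling would then predict the masked positions
    from the surrounding context and the edited transcript.
    """
    first, last = span_to_frames(start_s, end_s, frame_rate_hz, len(tokens))
    return tokens[:first] + [MASK] * (last - first) + tokens[last:]
```

Rounding the start down and the end up errs on the side of masking slightly more audio than the word occupies, which helps avoid leaving fragments of the original word at the boundaries.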

Example 1

DE: Folgen: wann kommen wir denn in 【hessen】/ 【bayern】 in die situation.

Original

LEMAS-Edit

LEMAS-TTS

Example 2

DE: Vom griechischen und im 【griechischen heißt】/ 【hellenischen heißt】 das ganze optischer regler. und der steht ganz links,

Original

LEMAS-Edit

LEMAS-TTS

Example 3

EN: That i was in fact unable to do any 【literary work】/ 【written work】

Original

LEMAS-Edit

LEMAS-TTS

Example 4

EN: If you use 【quotation marks】/ 【inverted commas】 around the expression that you want to see it will give you exact matches for that expression

Original

LEMAS-Edit

LEMAS-TTS

Example 5

EN: While there are several models of the 【cougar】/ 【panther】 mine resistant ambush protected mrap vehicle the six by six variant with an automatic grenade launcher

Original

LEMAS-Edit

LEMAS-TTS

Example 6

ES: 【Mujer】/ 【señora】 por la que ha dejado a ésta ese hombre?», y sentía, ¿por qué no he de confesarle la verdad?

Original

LEMAS-Edit

LEMAS-TTS

Example 7

ES: Es una 【visión】/ 【perspectiva】 muy triste porque nos define y sienta nuestro destino no?

Original

LEMAS-Edit

LEMAS-TTS

Example 8

ES: Para grabar un 【vídeo】/ 【grabación】 sobre qué opina la gente de los informáticos?

Original

LEMAS-Edit

LEMAS-TTS

Example 9

FR: Vous avez publié en 【sciences humaines et sociales】/ 【sciences sociales】 ? le délai est alors de douze mois.

Original

LEMAS-Edit

LEMAS-TTS

Example 10

FR: Publiés en collaboration avec cette vidéo pour tout savoir sur la 【chlorophylle】/ 【pigment vert】: sa vie, son oeuvre...

Original

LEMAS-Edit

LEMAS-TTS

Example 11

FR: On estime que 95% du trafic internet mondial se fait par ces 【câbles】/ 【fils】 de télécommunication posés au fond des océans.

Original

LEMAS-Edit

LEMAS-TTS

Example 12

IT: Tempi difficili e preoccupanti. la prevista introduzione a livello mondiale di 【passaporti】/ 【documenti di viaggio】

Original

LEMAS-Edit

LEMAS-TTS

Example 13

IT: Non intesa come 【etichetta】/ 【marchio】 diagnostica nosografica, ma come processo di conoscenza

Original

LEMAS-Edit

LEMAS-TTS

Example 14

IT: È una specie di via di mezzo fra 【scary movie】/ 【parody film】 e la evil overlord list,

Original

LEMAS-Edit

LEMAS-TTS

Example 15

PT: Na opção 【cartões mbnet】/ 【cartões virtuais】, podemos ver o "histórico" dos cartões gerados,

Original

LEMAS-Edit

LEMAS-TTS

Example 16

PT: Elevados do que 【os vossos】/ 【teus】, então eu comecei a perceber como

Original

LEMAS-Edit

LEMAS-TTS

Example 17

PT: As palavras: 【família da fonte】/ 【tipo de letra】. vamos clicar nele e selecionarmos a fonte

Original

LEMAS-Edit

LEMAS-TTS

Example 18

ZH: 【道士】/【和尚】也没有再闹了，乖乖的就跟人进去擦药水去了。

Original

LEMAS-Edit

LEMAS-TTS

Example 19

ZH: 又出使【南方】/【北方】的闽东岳等族，希望他们紧随其后，作为支援。

Original

LEMAS-Edit

LEMAS-TTS

Example 20

ZH: 还说了自己去公司看见【夏子涵】/【刘亦菲】给方程送药的事情，方程也生气了。

Original

LEMAS-Edit

LEMAS-TTS

Citation

@article{zhao2026lemas,
  title={LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models},
  author={Zhao, Zhiyuan and Lin, Lijian and Zhu, Ye and Xie, Kai and Liu, Yunfei and Li, Yu},
  journal={arXiv preprint arXiv:2601.04233},
  year={2026}
}