Ressources for End-to-End French Text-to-Speech Blizzard challenge

From MaRDI portal



DOI: 10.5281/zenodo.13918615, Zenodo: 13918615, MaRDI QID: Q6724146

Dataset published at Zenodo repository.

Author name not available

Publication date: 11 October 2024

Copyright license: No records found.



Here are 289 chapters of 5 audiobooks from Librivox (51:12) read by Nadine Eckert-Boulet (NEB):

- Madame Bovary (MB) by Gustave Flaubert (FL) - 3 volumes, 35 chapters (original wavs; text)
- Les mystères de Paris (LMP) by Eugene Sue (ES) - 4 volumes, 83 chapters (original wavs1, wavs2, wavs3; text1, text2, text3)
- Les tribulations d'un chinois en Chine (TCC) by Jules Verne (JV) - 1 volume, 22 chapters (original wavs; text)
- La fille du pirate (LFDP) by Henri Émile Chevalier (EC) - 7 volumes, 121 chapters (original wavs; text)
- La vampire (VAMP) by Paul Féval (PF) - 1 volume, 28 chapters (original wavs; text)

and 2515 utterances (2:03) read by another female French speaker, Aurélie Derbier (AD):

- 1608 utterances extracted from various books (DIVERS_BOOK_AD*)
- 907 utterances from transcripts of sessions of the French parliament (DIVERS_PARL_01*)

We recently added three speakers from Librivox/Litteratureaudio:

- Ezwa (EZWA): L'épouvante by Maurice Level (original wavs; text) - 11 chapters - 4869 utterances - 03:16
- Pauline Latournerie (PL): Le pédagogue n'aime pas les enfants by Henri Roorda (original wavs; text) - 6 chapters - 1320 utterances - 01:17
- Jean-Luc Fischer (JLF): L'Affaire Charles Dexter Ward by Howard Phillips Lovecraft (original wavs; text) - 16 chapters - 1823 utterances - 02:37

Each .wav file (sampled at 22050 Hz) corresponds to one entire chapter. The format of the filenames is:

{author's acronym}_{book's acronym}_{reader's acronym}_{volume's number}_{chapter's number}

The NEB_train.csv file gives text and phonetic alignments (essentially for MB and LMP) for utterances, in 4 fields separated by '|':

{filename}|{start_ms}|{end_ms}|{text or phonetic content}

Most utterances are separated by a pause of at least 400 ms. The intervals [start_ms:end_ms] include leading and trailing silences of 130 ms (since the wavs are entire chapters, these silences are "true" ambient silences). The same applies to AD_train.csv.
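As an illustration of the filename and CSV conventions described above, here is a minimal Python sketch that splits a chapter filename into its five acronym fields, parses one '|'-separated row, and converts the millisecond interval into sample indices at 22050 Hz. The function and class names, the example row, and its values are hypothetical and not taken from the dataset. The sketch also assumes that the text field never contains a '|' character and that start_ms and end_ms are plain numbers.

```python
from typing import NamedTuple, Tuple

SAMPLE_RATE = 22050  # Hz, as stated in the dataset description


class ChapterName(NamedTuple):
    author: str
    book: str
    reader: str
    volume: str
    chapter: str


class Utterance(NamedTuple):
    filename: str
    start_ms: float
    end_ms: float
    content: str
    extra: Tuple[str, ...]  # any further fields, kept untouched


def parse_chapter_name(stem: str) -> ChapterName:
    """Split '{author}_{book}_{reader}_{volume}_{chapter}' into its parts."""
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 '_'-separated parts, got {len(parts)}: {stem!r}")
    return ChapterName(*parts)


def parse_row(line: str) -> Utterance:
    """Parse one '{filename}|{start_ms}|{end_ms}|{content}' row."""
    fields = line.rstrip("\r\n").split("|")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    return Utterance(fields[0], float(fields[1]), float(fields[2]),
                     fields[3], tuple(fields[4:]))


def sample_span(utt: Utterance, sample_rate: int = SAMPLE_RATE) -> Tuple[int, int]:
    """Convert the [start_ms:end_ms] interval into sample indices."""
    start = round(utt.start_ms * sample_rate / 1000)
    end = round(utt.end_ms * sample_rate / 1000)
    return start, end


# Hypothetical row, made up for illustration only.
row = "ES_LMP_NEB_02_0005|1000|2500|Bonjour."
utt = parse_row(row)
name = parse_chapter_name(utt.filename)
span = sample_span(utt)
```

The resulting sample span can be used to slice the chapter waveform once it is loaded as an array. Names such as DIVERS_BOOK_AD* may not follow the five-part pattern, in which case parse_chapter_name raises an error rather than guessing.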
When phonetic alignment has been performed, 2 additional fields are added: {aligned phones}|{durations in ms}. Each input character or phone has a corresponding aligned phone and a duration. Note that all aligned utterances start and end with an aligned phone of 130 ms.

The set of aligned phones comprises:

- the set of input phones
- the silence: '__'
- the symbol '_' for silent characters, e.g. "chat" is aligned with 's^ _ a _'
- 29 combined aligned phones ('ai', 'aj', 'bq', 'dq', 'dz', 'dz^', 'fq', 'gq', 'gz', 'ji', 'ju', 'jq', 'ij', 'kq', 'ks', 'ksq', 'lq', 'mq', 'nq', 'rw', 'rq', 'sq', 'tq', 'ts', 'ts^', 'wa', 'zq', 'pq') that align to only one character, e.g. "expatrier" is aligned with 'e^ ks p a t r ij e _'

Text is in UTF8. ,, '~', '""', '()', '[]' are respectively used for speaking quotes, turn switches, three dots, quoted expression, aside quotes, notes. Because of its rare occurrences, 'œ' has been transcribed as 'oe'. Paragraphs (two consecutive carriage returns in the original text) are cued by a special character . It usually ends an utterance but may be used within an utterance if its associated pause is too short.

When available, phonetic content is given per word in curly brackets '{}'. We use 39 phonetic symbols:

- oral vowels: a (fa), e (fée), e^ (fait), x (feu), x^ (coeur), i (riz), y (fut), u (fou), o (faux), o^ (porc)
- schwa: q (gage)
- nasal vowels: a~ (rang), e~ (fin), x~ (un), o~ (rond)
- semi-vowels: h (huit), w (ouate), j (hier)
- consonants: p (pas), t (tas), k (cas), b (bas), d (dos), g (gars), f (faux), s (sot), s^ (chat), v (vu), z (zut), z^ (jus), r (riz), l (la), m (ma), n (non), n~ (oignon), ng (camping)
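The two alignment fields can be read with a short Python sketch such as the one below. It pairs each aligned phone with its duration, checks the 130 ms phone at both ends of an utterance, and pairs the characters of a word with their aligned phones, using the "chat" example from the description. It assumes that phones and durations are whitespace-separated within their fields, which the description does not state explicitly. The function names and the durations in the sample row are made up for illustration.

```python
from typing import List, Tuple

EDGE_MS = 130  # duration of the first and last aligned phone, per the description


def parse_alignment(phones_field: str, durations_field: str) -> List[Tuple[str, float]]:
    """Pair each aligned phone with its duration in ms.

    Assumption: both fields are whitespace-separated token lists.
    """
    phones = phones_field.split()
    durations = [float(d) for d in durations_field.split()]
    if len(phones) != len(durations):
        raise ValueError(f"{len(phones)} phones but {len(durations)} durations")
    return list(zip(phones, durations))


def has_edge_padding(alignment: List[Tuple[str, float]], edge_ms: float = EDGE_MS) -> bool:
    """Check that the utterance starts and ends with a phone of edge_ms."""
    return (len(alignment) >= 2
            and alignment[0][1] == edge_ms
            and alignment[-1][1] == edge_ms)


def align_chars(word: str, phones_field: str) -> List[Tuple[str, str]]:
    """Pair each input character with its aligned phone (one phone per character)."""
    phones = phones_field.split()
    if len(phones) != len(word):
        raise ValueError(f"{len(word)} characters but {len(phones)} aligned phones")
    return list(zip(word, phones))


# Hypothetical alignment fields; the durations are invented for this example.
alignment = parse_alignment("__ s^ _ a _ __", "130 90 0 140 0 130")
padded = has_edge_padding(alignment)

# Example taken from the description: "chat" is aligned with 's^ _ a _'.
chat_pairs = align_chars("chat", "s^ _ a _")
```

Raising an error on a length mismatch, rather than truncating silently, makes rows that break the one-phone-per-character convention easy to spot.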





