The following datasets can be used to train a language-independent LPCNet model.
A good choice is to include all the data from these datasets, except for
hi_fi_tts for which only a small subset is recommended (since it's very large
but has few speakers). Note that this data typically needs to be resampled
before it can be used.

https://www.openslr.org/resources/30/si_lk.tar.gz
https://www.openslr.org/resources/32/af_za.tar.gz
https://www.openslr.org/resources/32/st_za.tar.gz
https://www.openslr.org/resources/32/tn_za.tar.gz
https://www.openslr.org/resources/32/xh_za.tar.gz
https://www.openslr.org/resources/37/bn_bd.zip
https://www.openslr.org/resources/37/bn_in.zip
https://www.openslr.org/resources/41/jv_id_female.zip
https://www.openslr.org/resources/41/jv_id_male.zip
https://www.openslr.org/resources/42/km_kh_male.zip
https://www.openslr.org/resources/43/ne_np_female.zip
https://www.openslr.org/resources/44/su_id_female.zip
https://www.openslr.org/resources/44/su_id_male.zip
https://www.openslr.org/resources/61/es_ar_female.zip
https://www.openslr.org/resources/61/es_ar_male.zip
https://www.openslr.org/resources/63/ml_in_female.zip
https://www.openslr.org/resources/63/ml_in_male.zip
https://www.openslr.org/resources/64/mr_in_female.zip
https://www.openslr.org/resources/65/ta_in_female.zip
https://www.openslr.org/resources/65/ta_in_male.zip
https://www.openslr.org/resources/66/te_in_female.zip
https://www.openslr.org/resources/66/te_in_male.zip
https://www.openslr.org/resources/69/ca_es_female.zip
https://www.openslr.org/resources/69/ca_es_male.zip
https://www.openslr.org/resources/70/en_ng_female.zip
https://www.openslr.org/resources/70/en_ng_male.zip
https://www.openslr.org/resources/71/es_cl_female.zip
https://www.openslr.org/resources/71/es_cl_male.zip
https://www.openslr.org/resources/72/es_co_female.zip
https://www.openslr.org/resources/72/es_co_male.zip
https://www.openslr.org/resources/73/es_pe_female.zip
https://www.openslr.org/resources/73/es_pe_male.zip
https://www.openslr.org/resources/74/es_pr_female.zip
https://www.openslr.org/resources/75/es_ve_female.zip
https://www.openslr.org/resources/75/es_ve_male.zip
https://www.openslr.org/resources/76/eu_es_female.zip
https://www.openslr.org/resources/76/eu_es_male.zip
https://www.openslr.org/resources/77/gl_es_female.zip
https://www.openslr.org/resources/77/gl_es_male.zip
https://www.openslr.org/resources/78/gu_in_female.zip
https://www.openslr.org/resources/78/gu_in_male.zip
https://www.openslr.org/resources/79/kn_in_female.zip
https://www.openslr.org/resources/79/kn_in_male.zip
https://www.openslr.org/resources/80/my_mm_female.zip
https://www.openslr.org/resources/83/irish_english_male.zip
https://www.openslr.org/resources/83/midlands_english_female.zip
https://www.openslr.org/resources/83/midlands_english_male.zip
https://www.openslr.org/resources/83/northern_english_female.zip
https://www.openslr.org/resources/83/northern_english_male.zip
https://www.openslr.org/resources/83/scottish_english_female.zip
https://www.openslr.org/resources/83/scottish_english_male.zip
https://www.openslr.org/resources/83/southern_english_female.zip
https://www.openslr.org/resources/83/southern_english_male.zip
https://www.openslr.org/resources/83/welsh_english_female.zip
https://www.openslr.org/resources/83/welsh_english_male.zip
https://www.openslr.org/resources/86/yo_ng_female.zip
https://www.openslr.org/resources/86/yo_ng_male.zip
https://www.openslr.org/resources/109/hi_fi_tts_v0.tar.gz

The corresponding citations for all these datasets are:

  @inproceedings{demirsahin-etal-2020-open,
    title = {{Open-source Multi-speaker Corpora of the English Accents in the British Isles}},
    author = {Demirsahin, Isin and Kjartansson, Oddur and Gutkin, Alexander and Rivera, Clara},
    booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
    month = may,
    year = {2020},
    pages = {6532--6541},
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    url = {https://www.aclweb.org/anthology/2020.lrec-1.804},
    ISBN = {979-10-95546-34-4},
  }

  @inproceedings{kjartansson-etal-2020-open,
    title = {{Open-Source High Quality Speech Datasets for Basque, Catalan and Galician}},
    author = {Kjartansson, Oddur and Gutkin, Alexander and Butryna, Alena and Demirsahin, Isin and Rivera, Clara},
    booktitle = {Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)},
    year = {2020},
    pages = {21--27},
    month = may,
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    url = {https://www.aclweb.org/anthology/2020.sltu-1.3},
    ISBN = {979-10-95546-35-1},
  }

  @inproceedings{guevara-rukoz-etal-2020-crowdsourcing,
    title = {{Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech}},
    author = {Guevara-Rukoz, Adriana and Demirsahin, Isin and He, Fei and Chu, Shan-Hui Cathy and Sarin, Supheakmungkol and Pipatsrisawat, Knot and Gutkin, Alexander and Butryna, Alena and Kjartansson, Oddur},
    booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
    year = {2020},
    month = may,
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    url = {https://www.aclweb.org/anthology/2020.lrec-1.801},
    pages = {6504--6513},
    ISBN = {979-10-95546-34-4},
  }

  @inproceedings{he-etal-2020-open,
    title = {{Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems}},
    author = {He, Fei and Chu, Shan-Hui Cathy and Kjartansson, Oddur and Rivera, Clara and Katanova, Anna and Gutkin, Alexander and Demirsahin, Isin and Johny, Cibu and Jansche, Martin and Sarin, Supheakmungkol and Pipatsrisawat, Knot},
    booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
    month = may,
    year = {2020},
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    pages = {6494--6503},
    url = {https://www.aclweb.org/anthology/2020.lrec-1.800},
    ISBN = {979-10-95546-34-4},
  }

  @inproceedings{kjartansson-etal-tts-sltu2018,
    title = {{A Step-by-Step Process for Building TTS Voices Using Open Source Data and Framework for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese}},
    author = {Keshan Sodimana and Knot Pipatsrisawat and Linne Ha and Martin Jansche and Oddur Kjartansson and Pasindu De Silva and Supheakmungkol Sarin},
    booktitle = {Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU)},
    year = {2018},
    address = {Gurugram, India},
    month = aug,
    pages = {66--70},
    URL = {http://dx.doi.org/10.21437/SLTU.2018-14}
  }

  @inproceedings{oo-etal-2020-burmese,
    title = {{Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech}},
    author = {Oo, Yin May and Wattanavekin, Theeraphol and Li, Chenfang and De Silva, Pasindu and Sarin, Supheakmungkol and Pipatsrisawat, Knot and Jansche, Martin and Kjartansson, Oddur and Gutkin, Alexander},
    booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
    month = may,
    year = {2020},
    pages = {6328--6339},
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    url = {https://www.aclweb.org/anthology/2020.lrec-1.777},
    ISBN = {979-10-95546-34-4},
  }

  @inproceedings{van-niekerk-etal-2017,
    title = {{Rapid development of TTS corpora for four South African languages}},
    author = {Daniel van Niekerk and Charl van Heerden and Marelie Davel and Neil Kleynhans and Oddur Kjartansson and Martin Jansche and Linne Ha},
    booktitle = {Proc. Interspeech 2017},
    pages = {2178--2182},
    address = {Stockholm, Sweden},
    month = aug,
    year = {2017},
    URL = {http://dx.doi.org/10.21437/Interspeech.2017-1139}
  }

  @inproceedings{gutkin-et-al-yoruba2020,
    title = {{Developing an Open-Source Corpus of Yoruba Speech}},
    author = {Alexander Gutkin and I{\c{s}}{\i}n Demir{\c{s}}ahin and Oddur Kjartansson and Clara Rivera and K\d{\'o}lá Túb\d{\`o}sún},
    booktitle = {Proceedings of Interspeech 2020},
    pages = {404--408},
    month = {October},
    year = {2020},
    address = {Shanghai, China},
    publisher = {International Speech and Communication Association (ISCA)},
    doi = {10.21437/Interspeech.2020-1096},
    url = {http://dx.doi.org/10.21437/Interspeech.2020-1096},
  }

  @article{bakhturina2021hi,
    title = {{Hi-Fi Multi-Speaker English TTS Dataset}},
    author = {Bakhturina, Evelina and Lavrukhin, Vitaly and Ginsburg, Boris and Zhang, Yang},
    journal = {arXiv preprint arXiv:2104.01497},
    year = {2021}
  }
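
The recommendation to use only a small subset of hi_fi_tts does not say how to
choose it. One possible approach, shown here as a minimal sketch, is to cap the
number of files kept per speaker so that the few speakers in that set do not
dominate the training data. The function name pick_subset, the cap of 200 files,
the fixed seed, and the assumption that the first path component identifies the
speaker are all hypothetical and are not taken from LPCNet or from the dataset
documentation.

```python
import random
from collections import defaultdict


def pick_subset(paths, per_speaker=200, seed=0):
    """Keep at most `per_speaker` randomly chosen files for each speaker."""
    by_speaker = defaultdict(list)
    for path in paths:
        # Assumed (hypothetical) layout: "<speaker>/<utterance file>".
        speaker = path.split("/", 1)[0]
        by_speaker[speaker].append(path)

    # A seeded generator makes the selection reproducible across runs.
    rng = random.Random(seed)
    subset = []
    for speaker in sorted(by_speaker):
        files = sorted(by_speaker[speaker])
        subset.extend(rng.sample(files, min(per_speaker, len(files))))
    return subset


if __name__ == "__main__":
    # Synthetic file names standing in for a real directory listing.
    paths = [f"spk{s}/utt{i:04d}.flac" for s in range(3) for i in range(1000)]
    print(len(pick_subset(paths, per_speaker=50)))  # prints 150
```

The selected files would still need to be resampled like the rest of the data
before training.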