css10
Speech datasets
A collection of speech datasets for 10 languages to support text-to-speech tasks
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
465 stars
23 watching
60 forks
Language: HTML
last commit: over 4 years ago
Linked from 1 awesome list
dataset, speech, speech-to-text
Related projects:
Repository | Description | Stars |
---|---|---|
kyubyong/wordvectors | Provides pre-trained word vectors for multiple languages to facilitate NLP tasks | 2,215 |
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks | 9 |
matbahasa/talpco | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research | 49 |
karthikncode/nlp-datasets | A curated list of natural language processing datasets used to train and evaluate NLP models | 919 |
gabolsgabs/dali | A large dataset of synchronized audio, lyrics, and vocal notes created using machine learning | 349 |
crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering | 531 |
fwang91/imdb-face | A large-scale noise-controlled face recognition dataset designed to study the impact of data noise on recognition accuracy | 431 |
philipperemy/timit | A collection of acoustic and phonetic speech data designed for training and evaluating automatic speech recognition systems | 294 |
nytud/pws | A collection of parallel corpora of Winograd schemata in multiple languages | 0 |
candlewill/dialog_corpus | A collection of datasets used to train and improve chatbot systems in both English and Chinese | 2,033 |
ufal-dsg/alex_context_nlg_dataset | A dataset for training natural language generation models in dialogue systems by incorporating context information | 23 |
mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71 |
flagopen/flaginstruct | A collection of diverse instruction corpora for developing and tuning Chinese language models | 173 |
nytud/husst | A dataset and benchmarking kit for evaluating language understanding in Hungarian | 1 |
nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms | 1 |