css10

Speech datasets

A collection of speech datasets for 10 languages to support text-to-speech tasks

CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

467 stars
23 watching
60 forks
Language: HTML
Last commit: almost 5 years ago
Linked from 1 awesome list

Tags: dataset, speech, speech-to-text

Related projects:

| Repository | Description | Stars |
|---|---|---|
| kyubyong/wordvectors | Provides pre-trained word vectors for multiple languages to facilitate NLP tasks | 2,216 |
| nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks | 8 |
| matbahasa/talpco | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research | 49 |
| karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models | 919 |
| gabolsgabs/dali | A large dataset of synchronized audio, lyrics, and vocal notes created using machine learning | 351 |
| crownpku/small-chinese-corpus | A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering | 529 |
| fwang91/imdb-face | A large-scale noise-controlled face recognition dataset designed to study the impact of data noise on recognition accuracy | 433 |
| philipperemy/timit | A collection of acoustic and phonetic speech data designed for training and evaluating automatic speech recognition systems | 296 |
| nytud/pws | A collection of parallel corpora of Winograd schemata in multiple languages | 0 |
| candlewill/dialog_corpus | A collection of datasets used to train and improve chatbot systems in both English and Chinese | 2,033 |
| ufal-dsg/alex_context_nlg_dataset | A dataset for training natural language generation models in dialogue systems by incorporating context information | 23 |
| mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71 |
| flagopen/flaginstruct | A collection of diverse instruction corpora for improving the development and tuning of Chinese Language Models | 173 |
| nytud/husst | A dataset of annotated sentences for training and evaluating sentiment analysis models in the Hungarian language | 1 |
| nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms | 1 |