css10

Speech datasets

A collection of speech datasets for 10 languages to support text-to-speech tasks

CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

GitHub

467 stars

23 watching

60 forks

Language: HTML

Last commit: over 6 years ago

Linked from 1 awesome list

dataset, speech, speech-to-text

Backlinks from these awesome lists:

oroszgy/awesome-hungarian-nlp

Related projects:

Repository	Description	Stars
kyubyong/wordvectors	Provides pre-trained word vectors for multiple languages to facilitate NLP tasks	2,216
nytud/hulu	A collection of linguistic datasets and benchmarks for natural language understanding tasks	8
matbahasa/talpco	A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research	49
karthikncode/nlp-datasets	A curated list of Natural Language Processing datasets used to train and evaluate NLP models.	919
gabolsgabs/dali	A large dataset of synchronized audio, lyrics, and vocal notes created using machine learning	351
crownpku/small-chinese-corpus	A collection of datasets and tools for NLP tasks on Chinese texts, including part-of-speech tagging, named entity recognition, and question answering	529
fwang91/imdb-face	A large-scale noise-controlled face recognition dataset designed to study the impact of data noise on recognition accuracy	433
philipperemy/timit	A collection of acoustic and phonetic speech data designed for training and evaluating automatic speech recognition systems	297
nytud/pws	A collection of parallel corpora of Winograd schemata in multiple languages	0
candlewill/dialog_corpus	A collection of datasets used to train and improve chatbot systems in both English and Chinese	2,033
ufal-dsg/alex_context_nlg_dataset	A dataset for training natural language generation models in dialogue systems by incorporating context information	23
mirfan899/urdu	A collection of Urdu language datasets for various NLP tasks and applications	71
flagopen/flaginstruct	A collection of diverse instruction corpora for developing and tuning Chinese language models	173
nytud/husst	A dataset of annotated sentences for training and evaluating sentiment analysis models in the Hungarian language	1
nytud/happ	A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms	1