timit
Speech dataset
A collection of acoustic and phonetic speech data designed for training and evaluating automatic speech recognition systems.
The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
294 stars
8 watching
131 forks
last commit: over 2 years ago
Tags: darpa, speech, timit, timit-dataset
Related projects:
Repository | Description | Stars |
---|---|---|
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
vt-nlp/multiinstruct | A multimodal benchmark dataset designed to evaluate the performance of vision-language foundation models through instruction tuning. | 133 |
kyubyong/css10 | A collection of speech datasets for 10 languages to support text-to-speech tasks | 465 |
nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms | 1 |
rodrigopivi/chatito | A tool for generating datasets for AI chatbots and natural language processing tasks using a simple domain-specific language. | 876 |
poio-nlp/poio-corpus | A collection of language resources extracted from publicly available sources. | 7 |
gabolsgabs/dali | A large dataset of synchronized audio, lyrics, and vocal notes created using machine learning | 349 |
thu-coai/cdial-gpt | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation | 1,782 |
ynop/audiomate | A Python library for handling audio datasets, providing tools for accessing, manipulating, and preparing data for machine learning tasks. | 131 |
paul-rottger/hatecheck-data | A dataset of hate speech detection test cases with annotations | 56 |
michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain | 323 |
mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71 |
songys/chatbot_data | Data collection and model development for a conversational AI chatbot focused on emotional wellness support in Korean. | 355 |
dbd-research-group/birdset | A comprehensive benchmark dataset collection for audio classification in avian bioacoustics, aiming to advance bird sound classification by providing diverse real-world evaluation use cases. | 25 |
certainlyio/corona_dataset | A collection of data to train chatbots on COVID-19-related questions | 11 |