timit

Speech dataset

A collection of acoustic and phonetic speech data designed for training and evaluating automatic speech recognition systems.

The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.

GitHub

294 stars
8 watching
131 forks
last commit: over 2 years ago
darpa, speech, timit, timit-dataset

Related projects:

| Repository | Description | Stars |
|---|---|---|
| karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models | 919 |
| vt-nlp/multiinstruct | A multimodal benchmark dataset designed to evaluate the performance of vision-language foundation models through instruction tuning | 133 |
| kyubyong/css10 | A collection of speech datasets for 10 languages to support text-to-speech tasks | 465 |
| nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms | 1 |
| rodrigopivi/chatito | A tool for generating datasets for AI chatbots and natural language processing tasks using a simple domain-specific language | 876 |
| poio-nlp/poio-corpus | A collection of language resources extracted from publicly available sources | 7 |
| gabolsgabs/dali | A large dataset of synchronized audio, lyrics, and vocal notes created using machine learning | 349 |
| thu-coai/cdial-gpt | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation | 1,782 |
| ynop/audiomate | A Python library for handling audio datasets, providing tools for accessing, manipulating, and preparing data for machine learning tasks | 131 |
| paul-rottger/hatecheck-data | A dataset of hate speech detection test cases with annotations | 56 |
| michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain | 323 |
| mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications | 71 |
| songys/chatbot_data | Data collection and model development for a conversational AI chatbot focused on emotional wellness support in Korean | 355 |
| dbd-research-group/birdset | A comprehensive benchmark dataset collection for audio classification in avian bioacoustics, aiming to advance bird sound classification by providing diverse real-world evaluation use cases | 25 |
| certainlyio/corona_dataset | A collection of data to train chatbots on COVID-19-related questions | 11 |