timit

Speech dataset

A collection of acoustic and phonetic speech data designed for training and evaluating automatic speech recognition systems.

The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.

GitHub: 297 stars, 8 watching, 133 forks, last commit over 4 years ago

Tags: darpa, speech, timit, timit-dataset

Related projects:

Repository	Description	Stars
karthikncode/nlp-datasets	A curated list of Natural Language Processing datasets used to train and evaluate NLP models.	919
vt-nlp/multiinstruct	A multimodal benchmark dataset designed to evaluate the performance of vision-language foundation models through instruction tuning.	134
kyubyong/css10	A collection of speech datasets for 10 languages to support text-to-speech tasks	467
nytud/happ	A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms	1
rodrigopivi/chatito	A tool for generating datasets for AI chatbots and natural language processing tasks using a simple domain-specific language.	877
poio-nlp/poio-corpus	A collection of language resources extracted from publicly available sources.	7
gabolsgabs/dali	A large dataset of synchronized audio, lyrics, and vocal notes created using machine learning	351
thu-coai/cdial-gpt	A large-scale Chinese conversation dataset and pre-trained dialog models for text generation	1,799
ynop/audiomate	A Python library for handling audio datasets, providing tools for accessing, manipulating, and preparing data for machine learning tasks.	133
paul-rottger/hatecheck-data	A dataset of hate speech detection test cases with annotations	57
michael-wzhu/promptcblue	A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain	328
mirfan899/urdu	A collection of Urdu language datasets for various NLP tasks and applications	71
songys/chatbot_data	Data collection and model development for a conversational AI chatbot focused on emotional wellness support in Korean.	357
dbd-research-group/birdset	A collection of audio classification datasets for bird sound recognition, including data preparation tools and model training support.	29
certainlyio/corona_dataset	A collection of data to train chatbots on COVID-19-related questions	11