timit

Speech dataset

A collection of acoustic and phonetic speech data designed for training and evaluating automatic speech recognition systems.

The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.

GitHub

297 stars
8 watching
133 forks
last commit: almost 3 years ago
darpa, speech, timit, timit-dataset

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
| vt-nlp/multiinstruct | A multimodal benchmark dataset designed to evaluate the performance of vision-language foundation models through instruction tuning. | 134 |
| kyubyong/css10 | A collection of speech datasets for 10 languages to support text-to-speech tasks. | 467 |
| nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms. | 1 |
| rodrigopivi/chatito | A tool for generating datasets for AI chatbots and natural language processing tasks using a simple domain-specific language. | 877 |
| poio-nlp/poio-corpus | A collection of language resources extracted from publicly available sources. | 7 |
| gabolsgabs/dali | A large dataset of synchronized audio, lyrics, and vocal notes created using machine learning. | 351 |
| thu-coai/cdial-gpt | A large-scale Chinese conversation dataset and pre-trained dialog models for text generation. | 1,799 |
| ynop/audiomate | A Python library for handling audio datasets, providing tools for accessing, manipulating, and preparing data for machine learning tasks. | 133 |
| paul-rottger/hatecheck-data | A dataset of hate speech detection test cases with annotations. | 57 |
| michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain. | 328 |
| mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications. | 71 |
| songys/chatbot_data | Data collection and model development for a conversational AI chatbot focused on emotional wellness support in Korean. | 357 |
| dbd-research-group/birdset | A collection of audio classification datasets for bird sound recognition, including data preparation tools and model training support. | 29 |
| certainlyio/corona_dataset | A collection of data to train chatbots on COVID-19-related questions. | 11 |