Chatito

Dataset generator

A tool for generating datasets for AI chatbots and natural language processing tasks using a simple domain-specific language.

🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!

GitHub

877 stars
28 watching
156 forks
Language: TypeScript
Last commit: about 1 year ago
Linked from 1 awesome list

chatbot, chatbots, chatito, dataset, dataset-generation, named-entity-recognition, nlg, nlp, nlu, text-classification

Related projects:

| Repository | Description | Stars |
|---|---|---|
| candlewill/dialog_corpus | A collection of datasets used to train and improve chatbot systems in both English and Chinese. | 2,033 |
| radi-cho/datasetgpt | A command-line interface to generate textual datasets with Large Language Models. | 293 |
| poio-nlp/poio-corpus | A collection of language resources extracted from publicly available sources. | 7 |
| certainlyio/corona_dataset | A collection of data to train chatbots on COVID-19-related questions. | 11 |
| maluuba/geneva_datasets | Scripts to generate datasets for an image generation task using Generative Adversarial Networks and deep learning techniques. | 37 |
| fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 56 |
| karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
| botman/studio | A bundle of tools and testing environment for developing chatbots using the Laravel PHP framework. | 330 |
| instancio/instancio | Automates object creation and population with customizable data generation, reuse, and external feed integration for unit testing. | 930 |
| chatopera/insuranceqa-corpus-zh | An insurance industry conversation corpus with pre-processed data for natural language processing and question answering tasks. | 1,020 |
| pharo-ai/datasets | A Smalltalk library for loading and managing datasets as data frames. | 9 |
| mirfan899/urdu | A collection of Urdu language datasets for various NLP tasks and applications. | 71 |
| ifttt/polo | A tool that generates sample data from database models for testing and development purposes. | 776 |
| philipperemy/timit | A collection of acoustic and phonetic speech data designed for training and evaluating automatic speech recognition systems. | 294 |
| abbey4799/cutegpt | A conversational language model developed to improve understanding of complex instructions and Chinese vocabulary. | 62 |