AlpacaDataCleaned

Language data set

A cleaned and curated version of the Stanford Alpaca dataset, used to train a large language model

GitHub

2k stars
27 watching
153 forks
Language: Python
Last commit: over 1 year ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
alvations/seedling | A corpus and API for human language data | 11
carbonz0/alpaca-chinese-dataset | A dataset for training and fine-tuning large language models on Chinese text prompts | 390
pointnetwork/point-alpaca | Recreated weights of the Stanford Alpaca model, fine-tuned for a specific task | 406
alvations/sugarlike | A tool that identifies languages in text by comparing them to a reference set of patterns | 1
alvations/sugali | A system that identifies the language of an arbitrary text string using machine learning and multiple data sources | 2
google-research/flan | A repository providing tools and datasets to fine-tune language models for specific tasks | 1,474
code-kern-ai/refinery | A tool to help data scientists manage and annotate natural language data for training AI models | 1,402
flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 437
matbahasa/talpco | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research | 49
datacanvasio/alaya | A pre-trained conversational AI model, built on high-quality training data and fine-tuned for tasks such as question answering, code generation, and text summarization | 43
airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 424
karthikncode/nlp-datasets | A curated list of natural language processing datasets used to train and evaluate NLP models | 919
alpacahq/alpaca-trade-api-python | A Python client for Alpaca's trade API | 1,735
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782
sparklingpandas/sparklingpandas | Enables distributed data analysis using PySpark and Pandas APIs | 361