AlpacaDataCleaned
Language data set
A cleaned and curated version of the Stanford Alpaca dataset, used to train large language models
2k stars
27 watching
153 forks
Language: Python
last commit: over 1 year ago
Linked from 1 awesome list
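The dataset follows the Alpaca record format, a JSON list of objects with `instruction`, `input`, and `output` fields. Below is a minimal sketch of how such records might be filtered and turned into Stanford Alpaca-style training prompts. The filters in `basic_clean` (strip whitespace, drop empty instructions or outputs, drop exact duplicates) are hypothetical illustrations. They are not the repository's actual cleaning procedure. The helper names `load_records`, `basic_clean`, and `to_prompt` are also hypothetical, and `load_records` assumes the file is a single JSON array.

```python
import json
from typing import Iterable

# Stanford Alpaca-style prompt templates, with and without an "input" field.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def load_records(path: str) -> list[dict]:
    """Hypothetical loader: assumes the file is one JSON array of records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def basic_clean(records: Iterable[dict]) -> list[dict]:
    """Hypothetical filters: strip whitespace, drop records with an empty
    instruction or output, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        inst = rec.get("instruction", "").strip()
        inp = rec.get("input", "").strip()
        out = rec.get("output", "").strip()
        if not inst or not out:
            continue  # nothing to learn from an empty instruction or answer
        key = (inst, inp, out)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append({"instruction": inst, "input": inp, "output": out})
    return cleaned


def to_prompt(rec: dict) -> str:
    """Pick the template based on whether the record has a non-empty input."""
    template = PROMPT_WITH_INPUT if rec["input"] else PROMPT_NO_INPUT
    # str.format ignores unused keyword arguments such as "output".
    return template.format(**rec)


if __name__ == "__main__":
    sample = [
        {"instruction": "Add 2 and 3.", "input": "", "output": "5"},
        {"instruction": "Add 2 and 3.", "input": "", "output": "5"},  # duplicate
        {"instruction": "Summarize.", "input": "Some text.", "output": ""},  # empty output
    ]
    cleaned = basic_clean(sample)
    print(len(cleaned))
    print(to_prompt(cleaned[0]) + cleaned[0]["output"])
```

In a fine-tuning setup, the prompt and the `output` text are typically concatenated into one training example.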
Related projects:
Repository | Description | Stars |
---|---|---|
alvations/seedling | A corpus and API for human language data | 11 |
carbonz0/alpaca-chinese-dataset | A dataset for training and fine-tuning large language models on Chinese text prompts. | 390 |
pointnetwork/point-alpaca | Recreated weights of the Stanford Alpaca model, fine-tuned for a specific task | 406 |
alvations/sugarlike | A tool that identifies the language of a text by comparing it against a reference set of patterns. | 1 |
alvations/sugali | A system designed to identify the language of an arbitrary text string using machine learning and multiple data sources. | 2 |
google-research/flan | A repository providing tools and datasets to fine-tune language models for specific tasks | 1,474 |
code-kern-ai/refinery | A tool to help data scientists manage and annotate natural language data for training AI models | 1,402 |
flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 437 |
matbahasa/talpco | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49 |
datacanvasio/alaya | A pre-trained conversational AI model, trained on high-quality data and fine-tuned for tasks such as question answering, code generation, and text summarization. | 43 |
airaria/visual-chinese-llama-alpaca | A multimodal Chinese language model with visual capabilities | 424 |
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
alpacahq/alpaca-trade-api-python | A Python client for Alpaca's trade API | 1,735 |
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782 |
sparklingpandas/sparklingpandas | Enables distributed data analysis using PySpark and Pandas APIs | 361 |