AlpacaDataCleaned
Language data set
A cleaned and curated version of the Stanford Alpaca dataset, used to train large language models
2k stars
27 watching
153 forks
Language: Python
Last commit: almost 2 years ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A corpus and API for human language data | 11 |
| | A dataset for training and fine-tuning large language models on Chinese text prompts | 392 |
| | Recreated weights from the Stanford Alpaca model, fine-tuned for a specific task | 406 |
| | A tool that identifies languages in text by comparing them to a reference set of patterns | 1 |
| | A system designed to identify the language of an arbitrary text string using machine learning and multiple data sources | 2 |
| | A repository providing tools and datasets to fine-tune language models for specific tasks | 1,484 |
| | A tool to help data scientists manage and annotate natural language data for training AI models | 1,405 |
| | Provides pre-trained language models and tools for fine-tuning and evaluation | 439 |
| | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research | 49 |
| | A pre-trained AI model that can engage in natural language conversations with high accuracy and understanding | 43 |
| | Develops a multimodal Chinese language model with visual capabilities | 429 |
| | A curated list of Natural Language Processing datasets used to train and evaluate NLP models | 919 |
| | A Python client for Alpaca's trade API | 1,745 |
| | A guide to using pre-trained large language models in source code analysis and generation | 1,789 |
| | Enables distributed data analysis using PySpark and Pandas APIs | 362 |