distilabel
AI data generator
A framework for generating synthetic data and AI feedback to accelerate AI development
Distilabel is a framework for generating synthetic data and AI feedback, aimed at engineers who need fast, reliable, and scalable pipelines based on verified research papers.
2k stars
17 watching
138 forks
Language: Python
Last commit: 3 months ago
Topics: ai, huggingface, llms, openai, python, rlaif, rlhf, synthetic-data, synthetic-dataset-generation
Related projects:
| Description | Stars |
|---|---|
| A toolkit for generating synthetic data while preserving differential privacy | 602 |
| Automates product content generation using AI to improve SEO and customer experience. | 26 |
| An integrated framework for training custom generative AI models | 246 |
| Software for generating synthetic multivariate data with statistical properties preserved | 57 |
| A framework for efficient and optimized retrieval augmented generative pipelines using state-of-the-art LLMs and Information Retrieval. | 1,392 |
| A framework for creating autonomous AI agents with simple decorators and cryptographic security. | 1,158 |
| A new programming language designed to support the development of hybrid AI systems. | 86 |
| Automates large batches of AI-generated artwork locally using GPU acceleration. | 633 |
| A utility library for generating and manipulating unique identifiers in a Substrate-based storage system | 6 |
| Automates the generation of comprehensive README files using AI-powered language models. | 1,665 |
| An AI-powered development platform that generates code and stores it on GitHub, allowing developers to customize and integrate it into their workflows. | 1,301 |
| An audio generation library that uses diffusion models to produce high-quality audio samples from noise or text input | 1,975 |
| An automated feature generation tool for tabular data | 806 |
| An AI model designed to generate and execute code automatically | 816 |
| A tool to help data scientists manage and annotate natural language data for training AI models | 1,405 |