data-prep-kit
Data prep tool
A toolkit for streamlining data preparation for developers building large language model applications
Open-source project for data preparation, aimed at LLM application builders
363 stars
18 watching
140 forks
Language: Python
Last commit: 2 months ago
Topics: code-quality, data, data-prep, data-preparation, data-preprocessing, data-preprocessing-pipelines, datacuration, datarecipes, deduplication, finetuning, large-language-models, large-scale-data-processing, llm, llmapps, malware, python, ray, spark
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A Python library for rapidly collecting, cleaning, and visualizing data with minimal code | 2,088 |
| | A Python library that provides a simple API for data preparation and analysis on various big-data engines | 1,486 |
| | A collection of Go-based resources and tools for data science tasks | 879 |
| | Provides tools and infrastructure for setting up and managing reproducible data science projects | 152 |
| | Provides tools to prepare and process data for the Merck challenge at Kaggle | 10 |
| | A guide to using pre-trained large language models in source code analysis and generation | 1,789 |
| | Tools and utilities for efficient data processing with a focus on text analysis | 206 |
| | A Java-based framework for building read-write Linked Data applications based on the W3C LDP specification | 43 |
| | Tutorials that teach data science skills through Jupyter Notebooks | 7,011 |
| | A comprehensive toolset for building Large Language Model (LLM) based applications | 1,733 |
| | A tool for preparing language data for publication in various formats | 7 |
| | A tool to help data scientists manage and annotate natural language data for training AI models | 1,405 |
| | Utilities for creating charts, maps, and tables for data visualization | 1,368 |
| | A lightweight Common Lisp library for interacting with relational databases | 101 |
| | An introduction to data wrangling with the DataFrames.jl package in Julia | 122 |