data-prep-kit

Data prep tool

An open-source toolkit that streamlines data preparation for developers building large language model (LLM) applications

GitHub

363 stars
18 watching
140 forks
Language: Python
Last commit: about 1 month ago
code-quality, data, data-prep, data-preparation, data-preprocessing, data-preprocessing-pipelines, datacuration, datarecipes, deduplication, finetuning, large-language-models, large-scale-data-processing, llm, llmapps, malware, python, ray, spark

Related projects:

Repository | Description | Stars
sfu-db/dataprep | A Python library for rapidly collecting, cleaning, and visualizing data with minimal code | 2,088
hi-primus/optimus | A Python library that provides a simple API for data preparation and analysis on various big-data engines | 1,486
gopherdata/resources | A collection of Go-based resources and tools for data science tasks | 879
oxinabox/datadeps.jl | Provides tools and infrastructure for setting up and managing reproducible data science projects | 152
zygmuntz/kaggle-merck | Provides tools to prepare and process data for the Merck challenge at Kaggle | 10
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,789
columbia-applied-data-science/rosetta | Tools and utilities for efficient data processing with a focus on text analysis | 206
ldp4j/ldp4j | A Java-based framework for building read-write Linked Data applications based on the W3C LDP specification | 43
mrmimic/data-scientist-roadmap | Tutorials that teach data science skills through Jupyter Notebook | 7,011
melih-unsal/demogpt | A comprehensive toolset for building Large Language Model (LLM) based applications | 1,733
sillsdev/pathway | A tool for preparing language data for publication in various formats | 7
code-kern-ai/refinery | A tool to help data scientists manage and annotate natural language data for training AI models | 1,405
datawrapper/datawrapper | Utilities for creating charts, maps, and tables for data visualization | 1,368
fukamachi/datafly | A lightweight Common Lisp library for interacting with relational databases | 101
juliaacademy/dataframes | An introduction to data wrangling with the DataFrames.jl package in Julia | 122