data-prep-kit

Data prep toolkit

A toolkit for streamlining data preparation for developers building large language model applications

Open source project for data preparation, aimed at LLM application builders

GitHub

290 stars
16 watching
130 forks
Language: Jupyter Notebook
Last commit: 7 days ago
Topics: code-quality, data, data-prep, data-preparation, data-preprocessing, data-preprocessing-pipelines, datacuration, datarecipes, deduplication, finetuning, large-language-models, large-scale-data-processing, llm, llmapps, malware, python, ray, spark

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| sfu-db/dataprep | A Python library for rapidly collecting, cleaning, and visualizing data with minimal code | 2,068 |
| hi-primus/optimus | A Python library that provides a simple API for data preparation and analysis on various big-data engines | 1,481 |
| gopherdata/resources | A collection of Go-based resources and tools for data science tasks | 876 |
| oxinabox/datadeps.jl | Provides tools and infrastructure for setting up and managing reproducible data science projects | 151 |
| zygmuntz/kaggle-merck | Provides tools to prepare and process data for the Merck challenge at Kaggle | 10 |
| vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782 |
| columbia-applied-data-science/rosetta | Tools and utilities for efficient data processing with a focus on text analysis | 206 |
| ldp4j/ldp4j | A Java-based framework for building read-write Linked Data applications based on the W3C LDP specification | 43 |
| mrmimic/data-scientist-roadmap | Tutorials that teach data science skills through Jupyter Notebook | 6,986 |
| melih-unsal/demogpt | A comprehensive toolset for building Large Language Model (LLM) based applications | 1,710 |
| sillsdev/pathway | A tool for preparing language data for publication in various formats | 7 |
| code-kern-ai/refinery | A tool to help data scientists manage and annotate natural language data for training AI models | 1,402 |
| datawrapper/datawrapper | Utilities for creating charts, maps, and tables for data visualization | 1,363 |
| fukamachi/datafly | A lightweight Common Lisp library for interacting with relational databases | 100 |
| juliaacademy/dataframes | An introduction to data wrangling with the DataFrames.jl package in Julia | 122 |