data-prep-kit
Data prep toolkit
A toolkit for streamlining data preparation for developers building large language model applications
Open source project for data preparation, aimed at LLM application builders
290 stars
16 watching
130 forks
Language: Jupyter Notebook
Last commit: 8 days ago
Topics: code-quality, data, data-prep, data-preparation, data-preprocessing, data-preprocessing-pipelines, datacuration, datarecipes, deduplication, finetuning, large-language-models, large-scale-data-processing, llm, llmapps, malware, python, ray, spark
Related projects:
Repository | Description | Stars |
---|---|---|
sfu-db/dataprep | A Python library for rapidly collecting, cleaning, and visualizing data with minimal code | 2,068 |
hi-primus/optimus | A Python library that provides a simple API for data preparation and analysis on various big-data engines | 1,481 |
gopherdata/resources | A collection of Go-based resources and tools for data science tasks | 876 |
oxinabox/datadeps.jl | Provides tools and infrastructure for setting up and managing reproducible data science projects | 151 |
zygmuntz/kaggle-merck | Provides tools to prepare and process data for the Merck challenge at Kaggle | 10 |
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782 |
columbia-applied-data-science/rosetta | Tools and utilities for efficient data processing with a focus on text analysis | 206 |
ldp4j/ldp4j | A Java-based framework for building read-write Linked Data applications based on the W3C LDP specification | 43 |
mrmimic/data-scientist-roadmap | Tutorials that teach data science skills through Jupyter notebooks | 6,986 |
melih-unsal/demogpt | A comprehensive toolset for building Large Language Model (LLM) based applications | 1,710 |
sillsdev/pathway | A tool for preparing language data for publication in various formats | 7 |
code-kern-ai/refinery | A tool to help data scientists manage and annotate natural language data for training AI models | 1,402 |
datawrapper/datawrapper | Utilities for creating charts, maps, and tables for data visualization | 1,363 |
fukamachi/datafly | A lightweight Common Lisp library for interacting with relational databases | 100 |
juliaacademy/dataframes | An introduction to data wrangling with the DataFrames.jl package in Julia | 122 |