refinery
Data annotation toolkit
A tool to help data scientists manage and annotate natural language data for training AI models.
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
1k stars
18 watching
68 forks
Language: Python
Last commit: 5 months ago
Linked from 2 awesome lists
active-learning, annotations, artificial-intelligence, data-centric-ai, data-labeling, data-science, deep-learning, human-in-the-loop, labeling, labeling-tool, machine-learning, natural-language-processing, neural-search, nlp, python, spacy, supervised-learning, text-annotation, text-classification, transformers
Related projects:
Repository | Description | Stars |
---|---|---|
jd-aig/nlp_baai | A collection of natural language processing models and tools for collaboration on a joint project between BAAI and JDAI | 252 |
zhanghang1989/pytorch-encoding | A Python framework for building deep learning models with optimized encoding layers and batch normalization | 2,041 |
datacanvasio/cooka | An automated machine learning toolkit with visualization and feature engineering capabilities | 40 |
pku-dair/mindware | An efficient AutoML system that automates the machine learning lifecycle | 52 |
synyi/poplar | A web-based annotation tool for natural language processing (NLP) | 519 |
ardanlabs/training-ai | Provides training materials and tools for building machine learning applications | 72 |
nlplab/brat | A web-based annotation tool designed to facilitate intuitive and fast creation of text-bound and relational annotations | 1,824 |
gopherdata/resources | A collection of Go-based resources and tools for data science tasks | 876 |
numaproj/numalogic | A collection of machine learning models and tools for real-time time series data analytics and anomaly detection | 167 |
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782 |
etalab/piaf | A question answering annotation platform with features like text input, user management and scoring | 87 |
catmaid/catmaid | A collaborative annotation toolkit for massive amounts of image data used in connectomics and neuroscience research | 188 |
lightning-universe/lightning-bolts | Provides a toolbox of components to extend PyTorch Lightning for deep learning research and production | 1,693 |
embodiedgpt/embodiedgpt_pytorch | A PyTorch-based toolkit for creating customized multimedia datasets and handling heterogeneous data for training AI models | 340 |
kevincoble/aitoolbox | A toolbox of AI modules written in Swift for various machine learning tasks and algorithms | 794 |