refinery

Data annotation toolkit

A tool to help data scientists manage and annotate natural language data for training AI models

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

GitHub

1k stars
18 watching
68 forks
Language: Python
last commit: 5 months ago
Linked from 2 awesome lists

Topics: active-learning, annotations, artificial-intelligence, data-centric-ai, data-labeling, data-science, deep-learning, human-in-the-loop, labeling, labeling-tool, machine-learning, natural-language-processing, neural-search, nlp, python, spacy, supervised-learning, text-annotation, text-classification, transformers

Related projects:

Repository | Description | Stars
jd-aig/nlp_baai | A collection of natural language processing models and tools for collaboration on a joint project between BAAI and JDAI | 252
zhanghang1989/pytorch-encoding | A Python framework for building deep learning models with optimized encoding layers and batch normalization | 2,041
datacanvasio/cooka | An automated machine learning toolkit with visualization and feature engineering capabilities | 40
pku-dair/mindware | An efficient AutoML system that automates the machine learning lifecycle | 52
synyi/poplar | A web-based annotation tool for natural language processing (NLP) | 519
ardanlabs/training-ai | Provides training materials and tools for building machine learning applications | 72
nlplab/brat | A web-based annotation tool designed to facilitate intuitive and fast creation of text-bound and relational annotations | 1,824
gopherdata/resources | A collection of Go-based resources and tools for data science tasks | 876
numaproj/numalogic | A collection of machine learning models and tools for real-time time series data analytics and anomaly detection | 167
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782
etalab/piaf | A question answering annotation platform with features like text input, user management and scoring | 87
catmaid/catmaid | A collaborative annotation toolkit for massive amounts of image data used in connectomics and neuroscience research | 188
lightning-universe/lightning-bolts | Provides a toolbox of components to extend PyTorch Lightning for deep learning research and production | 1,693
embodiedgpt/embodiedgpt_pytorch | A PyTorch-based toolkit for creating customized multimedia datasets and handling heterogeneous data for training AI models | 340
kevincoble/aitoolbox | A toolbox of AI modules written in Swift for various machine learning tasks and algorithms | 794