refinery

Data annotation toolkit

A tool that helps data scientists manage and annotate natural language data for training AI models.

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

GitHub

1k stars
18 watching
69 forks
Language: Python
last commit: about 1 month ago
Linked from 2 awesome lists

Topics: active-learning, annotations, artificial-intelligence, data-centric-ai, data-labeling, data-science, deep-learning, human-in-the-loop, labeling, labeling-tool, machine-learning, natural-language-processing, neural-search, nlp, python, spacy, supervised-learning, text-annotation, text-classification, transformers

Related projects:

Repository | Description | Stars
jd-aig/nlp_baai | A collection of natural language processing models and tools for collaboration on a joint project between BAAI and JDAI. | 254
zhanghang1989/pytorch-encoding | A Python framework for building deep learning models with optimized encoding layers and batch normalization. | 2,044
datacanvasio/cooka | An automated machine learning toolkit with visualization and feature engineering capabilities | 40
pku-dair/mindware | An efficient AutoML system that automates the machine learning lifecycle | 53
synyi/poplar | A web-based annotation tool for natural language processing (NLP) | 520
ardanlabs/training-ai | Provides training materials and tools for building machine learning applications | 72
nlplab/brat | A web-based annotation tool designed to facilitate intuitive and fast creation of text-bound and relational annotations. | 1,831
gopherdata/resources | A collection of Go-based resources and tools for data science tasks | 879
numaproj/numalogic | A collection of machine learning models and tools for real-time time series data analytics and anomaly detection | 168
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,789
etalab/piaf | A question answering annotation platform with features like text input, user management and scoring | 87
catmaid/catmaid | A collaborative annotation toolkit for massive amounts of image data used in connectomics and neuroscience research | 188
lightning-universe/lightning-bolts | Provides a toolbox of components to extend PyTorch Lightning for deep learning research and production | 1,700
embodiedgpt/embodiedgpt_pytorch | A PyTorch-based toolkit for creating customized multimedia datasets and handling heterogeneous data for training AI models. | 346
kevincoble/aitoolbox | A toolbox of AI modules written in Swift for various machine learning tasks and algorithms | 794