unstructured

Data pipeline library

A toolkit for building custom machine learning pipelines from unstructured data

Open-source libraries and APIs for building custom preprocessing pipelines that feed labeling, training, or production machine learning workflows.

GitHub

9k stars
62 watching
790 forks
Language: HTML
last commit: about 1 month ago
Linked from 1 awesome list

data-pipelines, deep-learning, document-image-analysis, document-image-processing, document-parser, document-parsing, docx, donut, information-retrieval, langchain, llm, machine-learning, ml, natural-language-processing, nlp, ocr, pdf, pdf-to-json, pdf-to-text, preprocessing

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| ml-tooling/opyrator | Automates conversion of machine learning code into production-ready microservices with web API and GUI. | 3,116 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models | 8,303 |
| fastapi/fastapi | A modern Python framework for building high-performance RESTful APIs | 78,676 |
| gradio-app/gradio | Enables rapid creation and deployment of web applications for machine learning models and functions using Python | 34,557 |
| lightly-ai/lightly | A Python library for self-supervised learning on images using contrastive learning and deep learning techniques. | 3,204 |
| towhee-io/towhee | A framework for building efficient neural data processing pipelines using large language models and state-of-the-art deep learning models. | 3,255 |
| instructor-ai/instructor | A Python library that simplifies working with structured outputs from large language models | 8,551 |
| pipedreamhq/pipedream | An integration platform that enables developers to automate workflows across multiple applications and services using pre-built components and custom code | 9,075 |
| explosion/spacy | Industrial-strength NLP library for Python and Cython | 30,459 |
| pypi/warehouse | The software behind the Python Package Index. | 3,617 |
| gokumohandas/made-with-ml | Teaches machine learning fundamentals and software engineering practices for building production-ready ML applications | 37,816 |
| juhaku/utoipa | Generates OpenAPI documentation from Rust API code | 2,556 |
| mlflow/mlflow | A platform for managing machine learning projects from inception to deployment | 19,021 |
| kedro-org/kedro | A toolbox for production-ready data science pipelines with software engineering best practices for reproducibility and modularity | 10,050 |
| alibaba/pipcook | An open-source machine learning platform for web developers, providing a modular framework for building and deploying models | 2,551 |