unstructured

Data pipeline library

A toolkit for building custom machine learning pipelines from unstructured data

Open-source libraries and APIs for building custom preprocessing pipelines that feed labeling, training, or production machine learning workflows.

GitHub

9k stars
62 watching
773 forks
Language: HTML
Last commit: about 23 hours ago
Linked from 1 awesome list

Topics: data-pipelines, deep-learning, document-image-analysis, document-image-processing, document-parser, document-parsing, docx, donut, information-retrieval, langchain, llm, machine-learning, ml, natural-language-processing, nlp, ocr, pdf, pdf-to-json, pdf-to-text, preprocessing

Backlinks from these awesome lists:

Related projects:

Repository | Description | Stars
ml-tooling/opyrator | Automates conversion of machine learning code into production-ready microservices with web API and GUI. | 3,107
llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models | 8,223
fastapi/fastapi | A modern Python framework for building high-performance RESTful APIs with automatic interactive documentation and robust standards-based features. | 78,258
gradio-app/gradio | Enables rapid creation and deployment of web applications for machine learning models and functions using Python | 34,244
lightly-ai/lightly | A Python library for training self-supervised learning models on images. | 3,187
towhee-io/towhee | A framework for building efficient neural data processing pipelines using large language models and state-of-the-art deep learning models. | 3,236
instructor-ai/instructor | A Python library that simplifies working with structured outputs from large language models | 8,356
pipedreamhq/pipedream | An integration platform for automating event-driven workflows with pre-built actions and custom code support. | 9,006
explosion/spacy | Industrial-strength NLP library for Python and Cython | 30,368
pypi/warehouse | A software system that powers the package registry for Python packages | 3,606
gokumohandas/made-with-ml | Teaches machine learning fundamentals and software engineering practices for building production-ready ML applications | 37,698
juhaku/utoipa | Generates OpenAPI documentation from Rust API code | 2,512
mlflow/mlflow | A platform for managing machine learning projects from inception to deployment | 18,919
kedro-org/kedro | A toolbox for production-ready data science pipelines with software engineering best practices for reproducibility and modularity | 10,032
alibaba/pipcook | An open-source machine learning platform for web developers, providing a modular framework for building and deploying models | 2,549