unstructured
Data pipeline library
A toolkit for building custom machine learning pipelines from unstructured data
Open-source libraries and APIs for building custom preprocessing pipelines for labeling, training, or production machine learning workflows.
9k stars
60 watching
755 forks
Language: HTML
last commit: 6 days ago
Linked from 1 awesome list
data-pipelines, deep-learning, document-image-analysis, document-image-processing, document-parser, document-parsing, docx, donut, information-retrieval, langchain, llm, machine-learning, ml, natural-language-processing, nlp, ocr, pdf, pdf-to-json, pdf-to-text, preprocessing
Related projects:
Repository | Description | Stars |
---|---|---|
ml-tooling/opyrator | Automates conversion of machine learning code into production-ready microservices with web API and GUI. | 3,102 |
llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models | 6,651 |
fastapi/fastapi | A modern Python framework for building high-performance RESTful APIs with automatic interactive documentation and robust standards-based features. | 77,670 |
gradio-app/gradio | Enables rapid creation and deployment of web applications for machine learning models and functions using Python | 33,962 |
lightly-ai/lightly | An open-source framework for self-supervised learning on images using deep learning techniques. | 3,165 |
towhee-io/towhee | A framework for building efficient neural data processing pipelines using large language models and state-of-the-art deep learning models. | 3,226 |
instructor-ai/instructor | A Python library that provides structured outputs from large language models (LLMs) and facilitates seamless integration with various LLM providers. | 8,163 |
pipedreamhq/pipedream | An integration platform for automating data flows between applications and services. | 8,981 |
explosion/spacy | Industrial-strength NLP library for Python and Cython | 30,230 |
pypi/warehouse | The software that powers PyPI, the Python package registry | 3,601 |
gokumohandas/made-with-ml | Teaches machine learning fundamentals and software engineering practices for building production-ready ML applications | 37,603 |
juhaku/utoipa | Generates OpenAPI documentation from Rust API code | 2,474 |
mlflow/mlflow | A platform to manage the entire machine learning lifecycle, from experiment tracking to model deployment. | 18,781 |
kedro-org/kedro | A toolbox for production-ready data science pipelines with software engineering best practices for reproducibility and modularity | 10,004 |
alibaba/pipcook | An open-source machine learning platform for web developers, providing a modular framework for building and deploying models | 2,543 |