unstructured
Data pipeline library
A toolkit for building custom machine learning pipelines from unstructured data
Open source libraries and APIs for building custom preprocessing pipelines that feed labeling, training, or production machine learning workflows.
9k stars
62 watching
773 forks
Language: HTML
Last commit: about 23 hours ago
Linked from 1 awesome list
Tags: data-pipelines, deep-learning, document-image-analysis, document-image-processing, document-parser, document-parsing, docx, donut, information-retrieval, langchain, llm, machine-learning, ml, natural-language-processing, nlp, ocr, pdf, pdf-to-json, pdf-to-text, preprocessing
Related projects:
Repository | Description | Stars |
---|---|---|
ml-tooling/opyrator | Automates conversion of machine learning code into production-ready microservices with web API and GUI. | 3,107 |
llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models | 8,223 |
fastapi/fastapi | A modern Python framework for building high-performance RESTful APIs with automatic interactive documentation and robust standards-based features. | 78,258 |
gradio-app/gradio | Enables rapid creation and deployment of web applications for machine learning models and functions using Python | 34,244 |
lightly-ai/lightly | A Python library for training self-supervised learning models on images. | 3,187 |
towhee-io/towhee | A framework for building efficient neural data processing pipelines using large language models and state-of-the-art deep learning models. | 3,236 |
instructor-ai/instructor | A Python library that simplifies working with structured outputs from large language models | 8,356 |
pipedreamhq/pipedream | An integration platform for automating event-driven workflows with pre-built actions and custom code support. | 9,006 |
explosion/spacy | Industrial-strength NLP library for Python and Cython | 30,368 |
pypi/warehouse | A software system that powers the package registry for Python packages | 3,606 |
gokumohandas/made-with-ml | Teaches machine learning fundamentals and software engineering practices for building production-ready ML applications | 37,698 |
juhaku/utoipa | Generates OpenAPI documentation from Rust API code | 2,512 |
mlflow/mlflow | A platform for managing machine learning projects from inception to deployment | 18,919 |
kedro-org/kedro | A toolbox for production-ready data science pipelines with software engineering best practices for reproducibility and modularity | 10,032 |
alibaba/pipcook | An open-source machine learning platform for web developers, providing a modular framework for building and deploying models | 2,549 |