nidaba

OCR pipeline

Automates an OCR pipeline for digitizing text and converting raw images into citable texts.

An expandable and scalable OCR pipeline

GitHub

86 stars
9 watching
12 forks
Language: Python
Last commit: about 7 years ago
Linked from 1 awesome list



Related projects:

Repository | Description | Stars
openphilology/tei-ocr | Customizes TEI XML for metadata from OCR processes to capture detailed layout and content information | 1
openseg-group/openseg.pytorch | Provides a PyTorch implementation of several computer vision tasks including object detection, segmentation and parsing | 1,190
bandrel/ocyara | Performs OCR on images and scans them for matches to Yara rules | 40
allenai/scispacy | A collection of custom spaCy pipelines and models for analyzing scientific documents | 1,709
seven45/pdm-ci | Provides a base image for creating Python CI pipelines with package manager support | 11
hhio618/golem-ci | A decentralized task pipeline on Golem.network using Python | 5
hamdikahloun/windows_ocr | An OCR library allowing developers to embed high-quality character recognition functionality in their products | 18
bjpop/rubra | A bioinformatics pipeline system that supports running workflow stages on a distributed compute cluster | 38
openiti/ocr_gs_data | Provides gold standard data for training and testing optical character recognition (OCR) engines | 15
sirfz/tesserocr | An OCR API wrapper that enables concurrent execution using Python's threading module and releases the GIL | 2,016
ros-perception/image_pipeline | A ROS package providing an image processing pipeline | 800
osciiart/deepaa | Generates ASCII art from images using deep learning-based convolutional neural networks | 1,522
calamari-ocr/calamari | An OCR engine with modular design and a command-line interface, providing pre-trained models and a Python API for customization | 1,049
druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56