hocr-spec

OCR format

A specification for an embedded OCR workflow and output format

The hOCR Embedded OCR Workflow and Output Format

GitHub

74 stars
13 watching
20 forks
Language: HTML
last commit: 3 months ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| ocropus/hocr-tools | Tools for manipulating and analyzing multi-lingual OCR results by representing them in a standard HTML format | 370 |
| ub-mannheim/ocr-fileformat | Tool for converting and validating OCR file formats | 180 |
| kba/docker-ocropy | An OCR system built into a Docker container to perform text recognition on images | 9 |
| hamdikahloun/windows_ocr | An OCR library allowing developers to embed high-quality character recognition functionality in their products | 18 |
| kba/ocrad-docker | A containerized implementation of OCR software for document recognition | 2 |
| ocr4all/ocr4all | Provides a platform for converting historical printed materials into editable digital text | 238 |
| kba/kraken-docker | A Docker container for running the Kraken OCR engine | 5 |
| mittagessen/kraken | An OCR system optimized for historical and non-Latin scripts | 748 |
| onb-rd/hocrtools | Utilities to process and transform hOCR files into ALTO format using XSLT transformations | 6 |
| ibm/max-ocr | An optical character recognition system deployed as a web service using a trained Tesseract OCR model | 47 |
| r1me/ttesseractocr4 | An Object Pascal binding for the Tesseract OCR engine to perform optical character recognition | 145 |
| pjk/libcbor | A C library for parsing and generating CBOR data format | 342 |
| tesseract-ocr/docs | A collection of documents detailing various aspects and improvements to the Tesseract OCR engine | 260 |
| meh/ruby-tesseract-ocr | A Ruby wrapper around the Tesseract OCR API to provide an easy-to-use interface for optical character recognition tasks | 629 |