hocr-tools
OCR analyzer
Tools for manipulating and analyzing multi-lingual OCR results by representing them in a standard HTML format
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
370 stars
19 watching
79 forks
Language: Python
last commit: 3 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
kba/hocr-spec | A specification for an embedded OCR workflow and output format | 74 |
ub-mannheim/ocr-gt-tools | A web-based tool for editing and annotating OCR transcriptions of scanned text | 48 |
ryanfb/ancientgreekocr-ocr-evaluation-tools | A collection of tools and scripts to evaluate the accuracy of Optical Character Recognition (OCR) systems | 22 |
ub-mannheim/ocr-fileformat | Tool for converting and validating OCR file formats | 180 |
mittagessen/kraken | An OCR system optimized for historical and non-Latin scripts | 748 |
microsoft/unicoder | Pre-trained models and code for understanding and generation tasks in multiple languages | 88 |
athento/hocr-parser | A Python library for parsing hOCR files into structured data | 13 |
hc-guo/owl | A Large Language Model designed to analyze and manage IT operations data | 237 |
cneud/ocr-conversion | A collection of scripts and stylesheets for converting data between different OCR formats | 71 |
lowresourcelanguages/hltdi-morphology | Morphological analysis tools for various languages, including verb and noun generation, based on archived web pages | 5 |
cisocrgroup/resources | Resources and data for developing a language-aware OCR document error profiler and PoCoTo tools | 15 |
eddieantonio/ocreval | A collection of tools and utilities for evaluating the performance and quality of OCR output | 57 |
macroecology/letsr | A package for analyzing and handling macroecological data in R | 28 |
onb-rd/hocrtools | Utilities to process and transform hOCR files into ALTO format using XSLT transformations | 6 |
aantron/lambdasoup | A functional HTML scraping and manipulation library | 383 |