hocr-tools

OCR analyzer

Tools for manipulating, analyzing, and evaluating multilingual OCR results in hOCR, a standard format that represents OCR output by embedding it in HTML.
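
To illustrate what "embedding OCR results in HTML" means in practice, here is a minimal sketch that reads word text and bounding boxes from an hOCR fragment using only the Python standard library. It does not use hocr-tools itself. The sample markup and the names `SAMPLE`, `parse_bbox`, `WordExtractor`, and `extract_words` are hypothetical, chosen for illustration. The sketch assumes each recognized word is an element with class `ocrx_word` whose `title` attribute carries a `bbox x0 y0 x1 y1` property, and that word elements contain no nested tags.

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment for illustration; not taken from hocr-tools.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 800 600'>
  <span class='ocr_line' title='bbox 10 20 300 50'>
    <span class='ocrx_word' title='bbox 10 20 120 50; x_wconf 96'>Hello</span>
    <span class='ocrx_word' title='bbox 130 20 300 50; x_wconf 91'>Welt</span>
  </span>
</div>
"""


def parse_bbox(title):
    """Return the bbox property of an hOCR title string as four ints, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) >= 5 and parts[0] == "bbox":
            return tuple(int(v) for v in parts[1:5])
    return None


class WordExtractor(HTMLParser):
    """Collect (text, bbox) pairs for every ocrx_word element.

    Assumption: word elements hold plain text only (no nested tags).
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            self._in_word = True
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._in_word:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._in_word = False


def extract_words(hocr):
    """Parse an hOCR string and return a list of (text, bbox) tuples."""
    parser = WordExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

Because hOCR is ordinary HTML with OCR metadata carried in `class` and `title` attributes, a general-purpose HTML parser is enough for simple inspection; dedicated tools are useful once files need to be merged, split, or evaluated.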

370 stars
19 watching
79 forks
Language: Python
Last commit: 3 months ago
Linked from 1 awesome list

Related projects:

| Repository | Description | Stars |
|---|---|---|
| kba/hocr-spec | A specification for an embedded OCR workflow and output format | 74 |
| ub-mannheim/ocr-gt-tools | A web-based tool for editing and annotating OCR transcriptions of scanned text | 48 |
| ryanfb/ancientgreekocr-ocr-evaluation-tools | A collection of tools and scripts to evaluate the accuracy of Optical Character Recognition (OCR) systems | 22 |
| ub-mannheim/ocr-fileformat | A tool for converting and validating OCR file formats | 180 |
| mittagessen/kraken | An OCR system optimized for historical and non-Latin scripts | 748 |
| microsoft/unicoder | Pre-trained models and code for understanding and generation tasks in multiple languages | 88 |
| athento/hocr-parser | A Python library for parsing the hOCR specification into structured data | 13 |
| hc-guo/owl | A large language model designed to analyze and manage IT operations data | 237 |
| cneud/ocr-conversion | A collection of scripts and stylesheets for converting data between different OCR formats | 71 |
| lowresourcelanguages/hltdi-morphology | Morphological analysis tools for various languages, including verb and noun generation, based on archived web pages | 5 |
| cisocrgroup/resources | Resources and data for developing a language-aware OCR document error profiler and PoCoTo tools | 15 |
| eddieantonio/ocreval | A collection of tools and utilities for evaluating the performance and quality of OCR output | 57 |
| macroecology/letsr | A package for analyzing and handling macroecological data in R | 28 |
| onb-rd/hocrtools | Utilities to process and transform hOCR files into ALTO format using XSLT transformations | 6 |
| aantron/lambdasoup | A functional HTML scraping and manipulation library | 383 |