hocr-tools
OCR analyzer
Tools for manipulating and analyzing multilingual OCR results by representing them in hOCR, an HTML-based format.
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
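To make "embedding OCR results into HTML" concrete, here is a minimal sketch. It assumes the hOCR conventions of marking layout elements with class names such as `ocr_page`, `ocr_line` and `ocrx_word`, and storing geometry in the `title` attribute as `bbox x0 y0 x1 y1`. The sample markup and the names `parse_bbox`, `WordExtractor` and `extract_words` are hypothetical illustrations. They are not part of the hocr-tools API. The sketch uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Hand-written hOCR fragment for illustration (not real OCR output).
SAMPLE_HOCR = """
<div class="ocr_page" title="bbox 0 0 1000 800">
  <span class="ocr_line" title="bbox 100 100 400 130">
    <span class="ocrx_word" title="bbox 100 100 200 130; x_wconf 95">Hello</span>
    <span class="ocrx_word" title="bbox 210 100 400 130; x_wconf 91">world</span>
  </span>
</div>
"""


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None if absent."""
    # Properties in the title attribute are separated by semicolons.
    for prop in title.split(";"):
        parts = prop.strip().split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(v) for v in parts[1:])
    return None


class WordExtractor(HTMLParser):
    """Collect (text, bbox) pairs for every element whose class includes ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0  # > 0 while inside a word element (counts nested tags)
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # element nested inside the current word
            return
        attr_map = dict(attrs)
        if "ocrx_word" in (attr_map.get("class") or "").split():
            self._depth = 1
            self._bbox = parse_bbox(attr_map.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:  # the word element itself just closed
                self.words.append(("".join(self._text).strip(), self._bbox))


def extract_words(hocr):
    """Parse an hOCR string and return a list of (word, bbox) tuples."""
    parser = WordExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE_HOCR):
        print(text, bbox)
    # Hello (100, 100, 200, 130)
    # world (210, 100, 400, 130)
```

Because the recognized text and its geometry are ordinary HTML, a generic HTML parser is enough to recover them. This sketch does not handle void elements such as an unclosed `<br>` inside a word.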
373 stars
19 watching
79 forks
Language: Python
last commit: 6 months ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A specification for an embedded OCR workflow and output format | 74 |
| | A web-based tool for editing and annotating OCR transcriptions of scanned text | 48 |
| | A collection of tools and scripts to evaluate the accuracy of Optical Character Recognition (OCR) systems | 22 |
| | Tool for converting and validating OCR file formats | 182 |
| | An OCR system optimized for historical and non-Latin scripts, providing layout analysis, character recognition, and support for various formats | 757 |
| | Pre-trained models and code for understanding and generation tasks in multiple languages | 89 |
| | A Python library for parsing hOCR files into structured data | 13 |
| | A large language model designed to analyze and manage IT operations data | 240 |
| | A collection of scripts and stylesheets for converting data between different OCR formats | 72 |
| | Morphological analysis tools for various languages, including verb and noun generation, based on archived web pages | 5 |
| | Resources and data for developing a language-aware OCR document error profiler and PoCoTo tools | 15 |
| | A collection of tools and utilities for evaluating the performance and quality of OCR output | 57 |
| | A package for analyzing and handling macroecological data in R | 28 |
| | Utilities to transform hOCR files into ALTO format using XSLT | 6 |
| | A functional HTML scraping and manipulation library in OCaml | 384 |