Resources
OCR resources
Resources and data for developing a language-aware OCR document error profiler and PoCoTo tools.
Manuals, lexica, and OCR test data for PoCoTo and the profiler.
15 stars
6 watching
2 forks
Language: Lex
Last commit: over 3 years ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A Java-based tool for correcting errors in OCR'd historical documents | 40 |
| | Tools for manipulating and analyzing multilingual OCR results by representing them in a standard HTML format | 373 |
| | Develops OCR models and ground truth data for a Latin lexical resource | 1 |
| | A software package for concordance analysis in R | 9 |
| | A software tool for generating pronunciation lexicons for low-resource languages using speech recognition and machine learning algorithms | 21 |
| | A tool for processing Coq and Lean 4 code embedded in text documents | 237 |
| | A collection of annual reports on domestic programming languages in China | 234 |
| | Extracts common information from text strings in various formats | 79 |
| | Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books | 10 |
| | A language detection library using Bloom filters for speed and memory efficiency | 685 |
| | Provides common testing tools and utilities for C++ projects | 33 |
| | A cheat sheet for conducting enumeration during penetration testing and security assessments | 102 |
| | Provides a PyTorch implementation of several computer vision tasks, including object detection, segmentation, and parsing | 1,191 |
| | An OCR system optimized for historical and non-Latin scripts, providing layout analysis, character recognition, and support for various formats | 757 |