Resources
OCR resources
Resources and data for developing a language-aware OCR document error profiler and PoCoTo tools.
Manuals, lexica, and OCR test data for PoCoTo and the profiler
15 stars
6 watching
2 forks
Language: Lex
Last commit: over 4 years ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A Java-based tool for correcting errors in OCR'd historical documents | 40 |
| | Tools for manipulating and analyzing multi-lingual OCR results by representing them in a standard HTML format | 373 |
| | Develops OCR models and ground truth data for a Latin lexical resource | 1 |
| | A software package for concordance analysis in R | 9 |
| | A software tool that generates pronunciation lexicons for low-resource languages using speech recognition and machine learning algorithms | 21 |
| | A tool for processing Coq and Lean 4 code embedded in text documents | 237 |
| | A collection of annual reports on domestic programming languages in China | 234 |
| | Extracts common information from text strings in various formats | 79 |
| | Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books | 10 |
| | A language detection library that uses Bloom filters for speed and memory efficiency | 685 |
| | Provides common testing tools and utilities for C++ projects | 33 |
| | A cheat sheet for conducting enumeration during penetration testing and security assessments | 102 |
| | Provides a PyTorch implementation of several computer vision tasks, including object detection, segmentation, and parsing | 1,191 |
| | An OCR system optimized for historical and non-Latin scripts, providing layout analysis, character recognition, and support for various formats | 757 |