Resources

OCR resources

Resources and data for developing a language-aware OCR document error profiler and PoCoTo tools.

Manuals, lexica, and OCR test data for PoCoTo and the profiler.


15 stars
6 watching
2 forks
Language: Lex
Last commit: over 3 years ago
Linked from 1 awesome list



Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| cisocrgroup/pocoto | A Java-based tool for correcting errors in OCR'd historical documents | 40 |
| ocropus/hocr-tools | Tools for manipulating and analyzing multi-lingual OCR results by representing them in a standard HTML format | 370 |
| lascivaroma/lexical | Develops OCR models and ground truth data for a Latin lexical resource | 1 |
| aslez/concor | A software package for concordance analysis in R | 9 |
| lex4all/lex4all | Software tool to generate pronunciation lexicons for low-resource languages using speech recognition and machine learning algorithms | 21 |
| cpitclaudel/alectryon | Tools for processing Coq code and prose in technical documents | 236 |
| ploc-org/cnpl | A collection of annual reports on domestic programming languages in China | 234 |
| talyssonoc/commonregexruby | Extracts common information from text strings in various formats | 79 |
| chreul/ocr_testdata_earlyprintedbooks | Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books | 10 |
| peterc/whatlanguage | Language detection library using Bloom filters for speed and memory efficiency | 685 |
| osrf/osrf_testing_tools_cpp | Provides common testing tools and utilities for C++ projects | 33 |
| oncybersec/oscp-enumeration-cheat-sheet | A cheat sheet for conducting enumeration during penetration testing and security assessments | 102 |
| openseg-group/openseg.pytorch | Provides a PyTorch implementation of several computer vision tasks including object detection, segmentation and parsing | 1,190 |
| mittagessen/kraken | An OCR system optimized for historical and non-Latin scripts | 748 |