Resources

OCR resources

Resources and data for developing a language-aware OCR document error profiler and PoCoTo tools.

Manuals, lexica, OCR test data for PoCoTo and the profiler

15 stars

6 watching

2 forks

Language: Lex

last commit: about 5 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

kba/awesome-ocr

Related projects:

Repository	Description	Stars
cisocrgroup/pocoto	A Java-based tool for correcting errors in OCR'd historical documents	40
ocropus/hocr-tools	Tools for manipulating and analyzing multi-lingual OCR results by representing them in a standard HTML format	373
lascivaroma/lexical	Develops OCR models and ground truth data for a Latin lexical resource	1
aslez/concor	A software package for concordance analysis in R	9
lex4all/lex4all	Software tool to generate pronunciation lexicons for low-resource languages using speech recognition and machine learning algorithms	21
cpitclaudel/alectryon	A tool for processing Coq and Lean 4 code embedded in text documents	237
ploc-org/cnpl	A collection of annual reports on domestic programming languages in China.	234
talyssonoc/commonregexruby	Extracts common information from text strings in various formats	79
chreul/ocr_testdata_earlyprintedbooks	Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books	10
peterc/whatlanguage	Language detection library using Bloom filters for speed and memory efficiency	685
osrf/osrf_testing_tools_cpp	Provides common testing tools and utilities for C++ projects	33
oncybersec/oscp-enumeration-cheat-sheet	A cheat sheet for conducting enumeration during penetration testing and security assessments	102
openseg-group/openseg.pytorch	Provides a PyTorch implementation of several computer vision tasks including object detection, segmentation and parsing	1,191
mittagessen/kraken	An OCR system optimized for historical and non-Latin scripts, providing layout analysis, character recognition, and support for various formats.	757