OCR_Testdata_EarlyPrintedBooks

Historical OCR dataset

Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books.

The repository contains a selection of test lines from several early printed books, together with the corresponding individual OCRopus models and mixed models.

GitHub

10 stars

2 watching

2 forks

last commit: over 8 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

kba/awesome-ocr

Related projects:

Repository	Description	Stars
tberg12/ocular	An OCR system designed to transcribe historical documents with high accuracy, handling various challenges such as font variation and code-switching.	256
openiti/ocr_gs_data	Provides gold standard data for training and testing optical character recognition (OCR) engines.	15
openarabic/ocr_gs_data	A collection of double-checked gold standard data for training and testing OCR engines.	13
ocr4all/ocr4all	Provides OCR services for historical documents through an intuitive web interface.	244
igobronidze/hrs_training_data	Training data for a handwriting recognition system.	21
ryanfb/ancientgreekocr-ocr-evaluation-tools	A collection of tools and scripts to evaluate the accuracy of Optical Character Recognition (OCR) systems.	22
hamdikahloun/windows_ocr	An OCR library allowing developers to embed high-quality character recognition functionality in their products.	18
ponteineptique/toebler-ocr	An OCR project using historical French book data to train models and generate transcriptions.	1
jbaiter/archiscribe-corpus	A repository of transcribed 19th century German texts from various sources.	8
ibm/max-ocr	An optical character recognition system deployed as a web service using a trained Tesseract OCR model	47
cneud/ocr-conversion	A collection of scripts and stylesheets for converting data between different OCR formats.	72
ivylee/model-cards-and-datasheets	A collection of documentation and resources for various machine learning models, including their architectures, applications, and usage examples.	71
dannnylo/rtesseract	A Ruby library providing an interface to the Tesseract OCR system.	838
bandrel/ocyara	Performs OCR on images and scans them for matches to Yara rules.	40
johndeere/sampledata	Provides sample data files for testing purposes.	29