OCR_Testdata_EarlyPrintedBooks

Historical OCR dataset

Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books.

A selection of test lines from several early printed books, together with the corresponding individual and mixed OCRopus models.

GitHub

10 stars
2 watching
2 forks
last commit: almost 7 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| tberg12/ocular | An OCR system designed to transcribe historical documents with high accuracy, handling various challenges such as font variation and code-switching. | 255 |
| openiti/ocr_gs_data | Provides gold standard data for training and testing optical character recognition (OCR) engines. | 15 |
| openarabic/ocr_gs_data | A collection of double-checked gold standard data for training and testing OCR engines. | 13 |
| ocr4all/ocr4all | Provides a platform for converting historical printed materials into editable digital text | 238 |
| igobronidze/hrs_training_data | Training data for a handwritten recognition system | 20 |
| ryanfb/ancientgreekocr-ocr-evaluation-tools | A collection of tools and scripts to evaluate the accuracy of Optical Character Recognition (OCR) systems | 22 |
| hamdikahloun/windows_ocr | An OCR library allowing developers to embed high-quality character recognition functionality in their products. | 18 |
| ponteineptique/toebler-ocr | An OCR project using historical French book data to train models and generate transcriptions. | 1 |
| jbaiter/archiscribe-corpus | A repository of transcribed 19th century German texts from various sources. | 8 |
| ibm/max-ocr | An optical character recognition system deployed as a web service using a trained Tesseract OCR model | 47 |
| cneud/ocr-conversion | A collection of scripts and stylesheets for converting data between different OCR formats. | 71 |
| ivylee/model-cards-and-datasheets | A collection of documentation and resources for various machine learning models, including their architectures, applications, and usage examples. | 71 |
| dannnylo/rtesseract | A Ruby library providing an interface to the Tesseract OCR system. | 828 |
| bandrel/ocyara | Performs OCR on images and scans them for matches to Yara rules | 40 |
| johndeere/sampledata | Provides sample data files for testing purposes | 29 |