OCR_Testdata_EarlyPrintedBooks
Historical OCR dataset
Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books.
The repository contains a selection of test lines from several early printed books, along with the corresponding individual and mixed OCRopus models.
10 stars
2 watching
2 forks
last commit: almost 7 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
tberg12/ocular | An OCR system designed to transcribe historical documents with high accuracy, handling various challenges such as font variation and code-switching. | 255 |
openiti/ocr_gs_data | Provides gold standard data for training and testing optical character recognition (OCR) engines. | 15 |
openarabic/ocr_gs_data | A collection of double-checked gold standard data for training and testing OCR engines. | 13 |
ocr4all/ocr4all | Provides a platform for converting historical printed materials into editable digital text. | 238 |
igobronidze/hrs_training_data | Training data for a handwriting recognition system. | 20 |
ryanfb/ancientgreekocr-ocr-evaluation-tools | A collection of tools and scripts to evaluate the accuracy of Optical Character Recognition (OCR) systems | 22 |
hamdikahloun/windows_ocr | An OCR library allowing developers to embed high-quality character recognition functionality in their products. | 18 |
ponteineptique/toebler-ocr | An OCR project using historical French book data to train models and generate transcriptions. | 1 |
jbaiter/archiscribe-corpus | A repository of transcribed 19th century German texts from various sources. | 8 |
ibm/max-ocr | An optical character recognition system deployed as a web service using a trained Tesseract OCR model. | 47 |
cneud/ocr-conversion | A collection of scripts and stylesheets for converting data between different OCR formats. | 71 |
ivylee/model-cards-and-datasheets | A collection of documentation and resources for various machine learning models, including their architectures, applications, and usage examples. | 71 |
dannnylo/rtesseract | A Ruby library providing an interface to the Tesseract OCR system. | 828 |
bandrel/ocyara | Performs OCR on images and scans them for matches to YARA rules. | 40 |
johndeere/sampledata | Provides sample data files for testing purposes. | 29 |