archiscribe-corpus
Text dataset
A repository of transcribed 19th-century German texts from various sources.
Repository for 19th-century German Fraktur lines transcribed via archiscribe.jbaiter.de
8 stars
4 watching
1 fork
last commit: almost 6 years ago
Linked from 1 awesome list
19th-century, dataset, evaluation-data, fraktur, historical-data, ocr, training-data
Related projects:
Repository | Description | Stars |
---|---|---|
jbaiter/archiscribe | A tool for transcribing OCR data from archival documents | 17 |
jbest/typeface-corpus | A collection of typeface samples to improve OCR accuracy for natural history collections and digital humanities. | 7 |
chreul/ocr_testdata_earlyprintedbooks | Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books. | 10 |
aitutorials/datasets | A comprehensive collection of datasets from various AI-related sources worldwide. | 46 |
bertez/corpora | A collection of Galician language data in JSON format. | 2 |
bsvino/jaiprimer | A documentation project focused on explaining Jonathan Blow's programming language Jai. | 1,811 |
famrashel/idn-treebank | A manually tagged Indonesian corpus consisting of sentence parse trees. | 36 |
elte-dh/regenykorpusz | A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University. | 4 |
esamattis/jslibs | A curated collection of useful JavaScript libraries for building web applications. | 59 |
art-group-it/gasp | Generating abstracts of scientific papers from citations | 9 |
alessandrogianfelici/danish_reviews_dataset | A dataset of Danish reviews scraped from the internet to train sentiment classification models | 2 |
pedrobarcha/old-books-dataset | A collection of scanned book pages with ground truth annotations for OCR research and text analysis | 12 |
famrashel/idn-tagged-corpus | A manually tagged Indonesian language corpus in tab-separated file format | 88 |
several27/fakenewscorpus | A large dataset of news articles with labeled categories to train fake news recognition algorithms | 387 |
dativebase/old | Software for creating collaborative databases of language data | 1 |