archiscribe-corpus

Text dataset

A repository of transcribed 19th-century German texts from various sources.

Repository for 19th-century German Fraktur lines transcribed via archiscribe.jbaiter.de

GitHub

8 stars
4 watching
1 fork
last commit: almost 6 years ago
Linked from 1 awesome list

Tags: 19th-century, dataset, evaluation-data, fraktur, historical-data, ocr, training-data

Related projects:

Repository | Description | Stars
jbaiter/archiscribe | A tool for transcribing OCR data from archival documents | 17
jbest/typeface-corpus | A collection of typeface samples to improve OCR accuracy for natural history collections and digital humanities | 7
chreul/ocr_testdata_earlyprintedbooks | Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books | 10
aitutorials/datasets | A comprehensive collection of datasets from various AI-related sources worldwide | 46
bertez/corpora | A collection of Galician language data in JSON format | 2
bsvino/jaiprimer | A documentation project focused on explaining Jonathan Blow's programming language Jai | 1,811
famrashel/idn-treebank | A manually tagged Indonesian corpus consisting of parse trees from sentences | 36
elte-dh/regenykorpusz | A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University | 4
esamattis/jslibs | A curated collection of useful JavaScript libraries for building web applications | 59
art-group-it/gasp | Generating abstracts of scientific papers from citations | 9
alessandrogianfelici/danish_reviews_dataset | A dataset of Danish reviews scraped from the internet to train sentiment classification models | 2
pedrobarcha/old-books-dataset | A collection of scanned book pages with ground truth annotations for OCR research and text analysis | 12
famrashel/idn-tagged-corpus | A manually tagged Indonesian language corpus in tab-separated file format | 88
several27/fakenewscorpus | A large dataset of news articles with labeled categories to train fake news recognition algorithms | 387
dativebase/old | Software for creating collaborative databases of language data | 1