archiscribe-corpus

Text dataset

A repository of transcribed 19th-century German texts from various sources.

Repository for 19th-century German Fraktur lines transcribed via archiscribe.jbaiter.de

GitHub

8 stars

4 watching

1 fork

last commit: over 7 years ago

Linked from 1 awesome list

19th-century, dataset, evaluation-data, fraktur, historical-data, ocr, training-data

Backlinks from these awesome lists:

kba/awesome-ocr

Related projects:

Repository	Description	Stars
jbaiter/archiscribe	A tool for transcribing OCR data from archival documents	17
jbest/typeface-corpus	A collection of typeface samples to improve OCR accuracy for natural history collections and digital humanities.	7
chreul/ocr_testdata_earlyprintedbooks	Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books.	10
aitutorials/datasets	A comprehensive collection of datasets from various AI-related sources worldwide.	46
bertez/corpora	A collection of Galician language data in JSON format.	2
bsvino/jaiprimer	A documentation project focused on explaining Jonathan Blow's programming language Jai.	1,816
famrashel/idn-treebank	A manually tagged Indonesian corpus consisting of parse-trees from sentences.	36
elte-dh/regenykorpusz	A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University.	4
esamattis/jslibs	A curated collection of useful JavaScript libraries for building web applications.	59
art-group-it/gasp	Generating abstracts of scientific papers from citations	9
alessandrogianfelici/danish_reviews_dataset	A dataset of Danish reviews scraped from the internet to train sentiment classification models	2
pedrobarcha/old-books-dataset	A collection of scanned book pages with ground truth annotations for OCR research and text analysis	12
famrashel/idn-tagged-corpus	A manually tagged Indonesian language corpus in tab-separated file format	88
several27/fakenewscorpus	A large dataset of news articles with labeled categories to train fake news recognition algorithms	385
dativebase/old	Software for creating collaborative databases of language data	1