archiscribe-corpus
Text dataset
A repository of transcribed 19th-century German texts from various sources.
Repository for 19th-century German Fraktur lines transcribed via archiscribe.jbaiter.de
8 stars
4 watching
1 fork
last commit: almost 7 years ago
Linked from 1 awesome list
19th-century, dataset, evaluation-data, fraktur, historical-data, ocr, training-data
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A tool for transcribing OCR data from archival documents | 17 |
| | A collection of typeface samples to improve OCR accuracy for natural history collections and digital humanities. | 7 |
| | Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books. | 10 |
| | A comprehensive collection of datasets from various AI-related sources worldwide. | 46 |
| | A collection of Galician language data in JSON format. | 2 |
| | A documentation project focused on explaining Jonathan Blow's programming language Jai. | 1,816 |
| | A manually tagged Indonesian corpus consisting of parse-trees from sentences. | 36 |
| | A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University. | 4 |
| | A curated collection of useful JavaScript libraries for building web applications. | 59 |
| | Generating abstracts of scientific papers from citations | 9 |
| | A dataset of Danish reviews scraped from the internet to train sentiment classification models | 2 |
| | A collection of scanned book pages with ground truth annotations for OCR research and text analysis | 12 |
| | A manually tagged Indonesian language corpus in tab-separated file format | 88 |
| | A large dataset of news articles with labeled categories to train fake news recognition algorithms | 385 |
| | Software for creating collaborative databases of language data | 1 |