archiscribe-corpus
Text dataset
A repository of transcribed 19th-century German texts from various sources.
Repository for 19th-century German Fraktur lines transcribed via archiscribe.jbaiter.de
8 stars
4 watching
1 fork
last commit: about 6 years ago
Linked from 1 awesome list
Tags: 19th-century, dataset, evaluation-data, fraktur, historical-data, ocr, training-data
Related projects:
| Description | Stars |
|---|---|
| A tool for transcribing OCR data from archival documents | 17 |
| A collection of typeface samples to improve OCR accuracy for natural history collections and digital humanities. | 7 |
| Provides test data and models for training Optical Character Recognition (OCR) systems on historical printed books. | 10 |
| A comprehensive collection of datasets from various AI-related sources worldwide. | 46 |
| A collection of Galician language data in JSON format. | 2 |
| A documentation project focused on explaining Jonathan Blow's programming language Jai. | 1,816 |
| A manually tagged Indonesian corpus consisting of parse-trees from sentences. | 36 |
| A large corpus of Hungarian novels with annotated texts and metadata, developed by the Department of Digital Humanities at Eötvös Loránd University. | 4 |
| A curated collection of useful JavaScript libraries for building web applications. | 59 |
| Generating abstracts of scientific papers from citations | 9 |
| A dataset of Danish reviews scraped from the internet to train sentiment classification models | 2 |
| A collection of scanned book pages with ground truth annotations for OCR research and text analysis | 12 |
| A manually tagged Indonesian language corpus in tab-separated file format | 88 |
| A large dataset of news articles with labeled categories to train fake news recognition algorithms | 385 |
| Software for creating collaborative databases of language data | 1 |