pdf-text-extraction-benchmark

A project for benchmarking and evaluating existing PDF extraction tools on their semantic ability to extract the body text from PDF documents, especially scientific articles.
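The following is a minimal Python sketch of scoring one tool's output against a ground-truth body text. It is not the project's own evaluation code. The word-level precision/recall/F1 metric and the function name `word_level_scores` are assumptions chosen for illustration. Word alignment uses the standard-library `difflib.SequenceMatcher`.

```python
import difflib


def word_level_scores(ground_truth: str, extracted: str) -> dict:
    """Compare an extracted body text against a ground-truth body text.

    Hypothetical metric for illustration only: words are aligned in order
    with difflib.SequenceMatcher, and aligned words count as correct.
    """
    gt_words = ground_truth.split()
    ex_words = extracted.split()
    matcher = difflib.SequenceMatcher(a=gt_words, b=ex_words, autojunk=False)
    # Sum the sizes of all in-order matching blocks (the final dummy block has size 0).
    matched = sum(block.size for block in matcher.get_matching_blocks())
    precision = matched / len(ex_words) if ex_words else 0.0
    recall = matched / len(gt_words) if gt_words else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = "Deep learning has changed natural language processing."
    # Simulated tool output: a page header leaked into the body text,
    # and a hyphenated word was not rejoined.
    output = "Journal of Examples 12 Deep learning has changed natural lan- guage processing."
    print(word_level_scores(truth, output))
```

On the sample input, the leaked header lowers precision, and the unjoined hyphenation lowers both precision and recall. A fuller benchmark might also check paragraph boundaries and reading order, which a flat word alignment like this does not capture.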


63 stars
6 watching
11 forks
Language: TeX
last commit: almost 4 years ago
Topics: arxiv, benchmark, evaluation, extraction, pdf, text, text-extraction