DocumentLayoutAnalysis

Document analyzer

Develops tools and algorithms for analyzing the layout and structure of PDF documents.

Document layout analysis resources and repositories for development with PdfPig.

GitHub

583 stars
35 watching
64 forks
Language: C#
Last commit: about 1 year ago
alto, alto-xml, csharp, docstrum, document-layout-analysis, hocr, hocr-documents, layout-analysis, page-segmentation, page-xml, pdf, pdfpig, recursive-xy-cut, table-extraction, tei, xy-cut, xycut

Related projects:

Repository | Description | Stars
uglytoad/pdfpig | A C# library for extracting and analyzing text from PDF files | 1,733
ocr4all/larex | A tool for analyzing and extracting layouts from early printed books. | 180
x-plug/mplug-docowl | A large language model designed to understand documents without OCR, focusing on document structure and content analysis. | 1,563
chungkwong/mathocr | A software project that enables the recognition and analysis of printed scientific documents, particularly focusing on mathematical expressions. | 167
jesparza/peepdf | A Python tool for analyzing PDF files to identify potential security risks and malicious content. | 1,309
hiddenillusion/analyzepdf | A tool to analyze PDF files by examining their characteristics to determine if they are malicious or benign. | 176
pdf-archiver/pdf-archiver | A tool for digitizing and organizing paper documents by scanning and tagging files for easy searching. | 305
jsv4/opencontracts | A document analytics platform providing features for managing documents, extracting layout information and vector embeddings, annotating documents, and querying them using LlamaIndex. | 717
ocropus/hocr-tools | Tools for manipulating and analyzing multi-lingual OCR results by representing them in a standard HTML format | 370
tylabs/qs_old | A tool to analyze and extract malicious content from office documents and executables | 126
9b/pdfxray_lite | A lightweight command-line tool for analyzing and visualizing PDFs without a backend | 35
rrrene/inch | Analyzes and suggests improvements to inline documentation in Ruby codebases | 518
hiddenillusion/analyzepe | Analyzes PE files by combining data from various tools to generate a centralized report. | 204
sn4k3/uvtools | Tools for analyzing and manipulating 3D printing data, including support analysis and file conversions. | 1,231
binaryanalysisplatform/bap | A comprehensive toolkit for analyzing and understanding binary programs | 2,068