PdfPig
PDF analyzer
A C# library for extracting and analyzing text from PDF files
Read and extract text and other content from PDFs in C# (port of PDFBox)
2k stars
50 watching
247 forks
Language: C#
Last commit: 2 months ago
Linked from 2 awesome lists
Tags: alto-xml, csharp, document-analysis, hocr, layout-analysis, netstandard, page-xml, pdf, pdf-document, pdf-document-processor, pdf-extractor, pdf-files, pdf-generation, pdfbox
Related projects:
| Description | Stars |
|---|---|
| Develops tools and algorithms for analyzing layout and structure of documents in PDF format | 591 |
| A tool to analyze PDF files by examining their characteristics to determine if they are malicious or benign. | 178 |
| A Python tool for analyzing PDF files to identify potential security risks and malicious content. | 1,319 |
| A CoffeeScript library for extracting text from PDF files and creating searchable documents with OCR capabilities | 28 |
| A tool to extract text from PDFs and add a searchable layer to them | 279 |
| A lightweight command-line tool for analyzing and visualizing PDFs without a backend | 35 |
| A tool for digitizing and organizing paper documents by scanning and tagging files for easy searching. | 308 |
| Provides tools and libraries for generating, manipulating, and rendering PDF documents from C#. | 1,371 |
| A data pipeline for extracting structured data from PDFs using AWS Textract and cloud-based services | 164 |
| Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles | 65 |
| A C# wrapper around the QPdf library for PDF manipulation and operations. | 17 |
| Extracts tables from PDF files using Java | 1,859 |
| A Go library for extracting text from PDF files, particularly invoices. | 708 |
| Analyzes PE files by combining data from various tools to generate a centralized report. | 204 |
| A Rust library for working with PDF documents | 1,680 |