PdfPig
PDF analyzer
A C# library for extracting and analyzing text from PDF files
Read and extract text and other content from PDFs in C# (port of PDFBox)
2k stars
49 watching
241 forks
Language: C#
Last commit: 10 days ago
Linked from 2 awesome lists
alto-xml, csharp, document-analysis, hocr, layout-analysis, netstandard, page-xml, pdf, pdf-document, pdf-document-processor, pdf-extractor, pdf-files, pdf-generation, pdfbox
Related projects:
Repository | Description | Stars |
---|---|---|
bobld/documentlayoutanalysis | Tools and algorithms for analyzing the layout and structure of PDF documents | 583 |
hiddenillusion/analyzepdf | A tool to analyze PDF files by examining their characteristics to determine if they are malicious or benign. | 176 |
jesparza/peepdf | A Python tool for analyzing PDF files to identify potential security risks and malicious content. | 1,309 |
steelthread/mimeograph | A CoffeeScript library for extracting text from PDFs and creating searchable files | 28 |
leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them | 274 |
9b/pdfxray_lite | A lightweight command-line tool for analyzing and visualizing PDFs without a backend | 35 |
pdf-archiver/pdf-archiver | A tool for digitizing and organizing paper documents by scanning and tagging files for easy searching. | 305 |
itext/itextsharp | Provides tools and libraries for generating, manipulating, and rendering PDF documents from C#. | 1,367 |
aeksco/aws-pdf-textract-pipeline | A data pipeline for extracting structured data from PDFs using AWS Textract and cloud-based services | 164 |
ckorzen/pdf-text-extraction-benchmark | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles | 65 |
svengeance/qpdfsharp | A C# wrapper around the QPdf library for PDF manipulation and operations. | 15 |
tabulapdf/tabula-java | Extracts tables from PDF files using Java | 1,848 |
unidoc/unidoc | A Go library for extracting text from PDF files, particularly invoices. | 708 |
hiddenillusion/analyzepe | Analyzes PE files by combining data from various tools to generate a centralized report. | 204 |
j-f-liu/lopdf | A Rust library for working with PDF documents | 1,653 |