PdfPig

PDF analyzer

A C# library for extracting and analyzing text from PDF files

Read and extract text and other content from PDFs in C# (port of PDFBox)

GitHub

2k stars
49 watching
241 forks
Language: C#
Last commit: 10 days ago
Linked from 2 awesome lists

alto-xml, csharp, document-analysis, hocr, layout-analysis, netstandard, page-xml, pdf, pdf-document, pdf-document-processor, pdf-extractor, pdf-files, pdf-generation, pdfbox

Related projects:

Repository | Description | Stars
bobld/documentlayoutanalysis | Develops tools and algorithms for analyzing layout and structure of documents in PDF format | 583
hiddenillusion/analyzepdf | A tool to analyze PDF files by examining their characteristics to determine if they are malicious or benign | 176
jesparza/peepdf | A Python tool for analyzing PDF files to identify potential security risks and malicious content | 1,309
steelthread/mimeograph | A CoffeeScript library for extracting text from PDFs and creating searchable files | 28
leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them | 274
9b/pdfxray_lite | A lightweight command-line tool for analyzing and visualizing PDFs without a backend | 35
pdf-archiver/pdf-archiver | A tool for digitizing and organizing paper documents by scanning and tagging files for easy searching | 305
itext/itextsharp | Provides tools and libraries for generating, manipulating, and rendering PDF documents from C# | 1,367
aeksco/aws-pdf-textract-pipeline | A data pipeline for extracting structured data from PDFs using AWS Textract and cloud-based services | 164
ckorzen/pdf-text-extraction-benchmark | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles | 65
svengeance/qpdfsharp | A C# wrapper around the QPdf library for PDF manipulation and operations | 15
tabulapdf/tabula-java | Extracts tables from PDF files using Java | 1,848
unidoc/unidoc | A Go library for extracting text from PDF files, particularly invoices | 708
hiddenillusion/analyzepe | Analyzes PE files by combining data from various tools to generate a centralized report | 204
j-f-liu/lopdf | A Rust library for working with PDF documents | 1,653