PdfPig

PDF analyzer

A C# library for extracting and analyzing text from PDF files

Read and extract text and other content from PDFs in C# (port of PDFBox)

GitHub

2k stars

50 watching

247 forks

Language: C#

Last commit: over 1 year ago

Linked from 2 awesome lists

Tags: alto-xml, csharp, document-analysis, hocr, layout-analysis, netstandard, page-xml, pdf, pdf-document, pdf-document-processor, pdf-extractor, pdf-files, pdf-generation, pdfbox

github.com/UglyToad/PdfPig/wiki

Related projects:

Repository	Description	Stars
bobld/documentlayoutanalysis	Develops tools and algorithms for analyzing layout and structure of documents in PDF format	591
hiddenillusion/analyzepdf	A tool to analyze PDF files by examining their characteristics to determine if they are malicious or benign	178
jesparza/peepdf	A Python tool for analyzing PDF files to identify potential security risks and malicious content	1,319
steelthread/mimeograph	A CoffeeScript library for extracting text from PDF files and creating searchable documents with OCR capabilities	28
leofcardoso/pdf2pdfocr	A tool to extract text from PDFs and add a searchable layer to them	279
9b/pdfxray_lite	A lightweight command-line tool for analyzing and visualizing PDFs without a backend	35
pdf-archiver/pdf-archiver	A tool for digitizing and organizing paper documents by scanning and tagging files for easy searching	308
itext/itextsharp	Provides tools and libraries for generating, manipulating, and rendering PDF documents from C#	1,371
aeksco/aws-pdf-textract-pipeline	A data pipeline for extracting structured data from PDFs using AWS Textract and cloud-based services	164
ckorzen/pdf-text-extraction-benchmark	Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles	65
svengeance/qpdfsharp	A C# wrapper around the QPdf library for PDF manipulation and operations	17
tabulapdf/tabula-java	Extracts tables from PDF files using Java	1,859
unidoc/unidoc	A Go library for extracting text from PDF files, particularly invoices	708
hiddenillusion/analyzepe	Analyzes PE files by combining data from various tools to generate a centralized report	204
j-f-liu/lopdf	A Rust library for working with PDF documents	1,680