PdfPig

PDF analyzer

A C# library for extracting and analyzing text from PDF files

Read and extract text and other content from PDFs in C# (port of PDFBox)

GitHub

2k stars
50 watching
247 forks
Language: C#
Last commit: about 1 month ago
Linked from 2 awesome lists

Topics: alto-xml, csharp, document-analysis, hocr, layout-analysis, netstandard, page-xml, pdf, pdf-document, pdf-document-processor, pdf-extractor, pdf-files, pdf-generation, pdfbox

Related projects:

Repository | Description | Stars
bobld/documentlayoutanalysis | Develops tools and algorithms for analyzing layout and structure of documents in PDF format | 591
hiddenillusion/analyzepdf | A tool to analyze PDF files by examining their characteristics to determine if they are malicious or benign | 178
jesparza/peepdf | A Python tool for analyzing PDF files to identify potential security risks and malicious content | 1,319
steelthread/mimeograph | A CoffeeScript library for extracting text from PDF files and creating searchable documents with OCR capabilities | 28
leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them | 279
9b/pdfxray_lite | A lightweight command-line tool for analyzing and visualizing PDFs without a backend | 35
pdf-archiver/pdf-archiver | A tool for digitizing and organizing paper documents by scanning and tagging files for easy searching | 308
itext/itextsharp | Provides tools and libraries for generating, manipulating, and rendering PDF documents from C# | 1,371
aeksco/aws-pdf-textract-pipeline | A data pipeline for extracting structured data from PDFs using AWS Textract and cloud-based services | 164
ckorzen/pdf-text-extraction-benchmark | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles | 65
svengeance/qpdfsharp | A C# wrapper around the QPdf library for PDF manipulation and operations | 17
tabulapdf/tabula-java | Extracts tables from PDF files using Java | 1,859
unidoc/unidoc | A Go library for extracting text from PDF files, particularly invoices | 708
hiddenillusion/analyzepe | Analyzes PE files by combining data from various tools to generate a centralized report | 204
j-f-liu/lopdf | A Rust library for working with PDF documents | 1,680