DocumentUnderstanding

Info extractor

Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks.

Research papers and code on information extraction from images and PDFs

96 stars

7 watching

11 forks

Last commit: over 3 years ago

Related projects:

Repository	Description	Stars
ckorzen/pdf-text-extraction-benchmark	Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles	65
eyurtsev/kor	An open-source wrapper around LLMs to extract structured data from text	1,638
geeks-of-data/knowledge-gpt	Extracts and stores information from various sources using AI models to generate answers	283
mordragt/bib_kit	A Firefox extension that extracts website information to create citations in a specific citation format	14
subeeshvasu/awsome_delineation	A curated list of resources and papers on 3D and 2D delineation techniques for image processing and computer vision applications	22
xyntopia/pydoxtools	A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines	78
nikolamilosevic86/tabinout	A framework for extracting information from tables in scientific literature using a rule-based approach	42
knowitall/reverb	Extracts binary relationships from English sentences at scale	543
51j0/android-storage-extractor	A tool to extract the local data storage of an Android application in one click	16
steelthread/mimeograph	A CoffeeScript library for extracting text from PDF files and creating searchable documents with OCR capabilities	28
instructor-ai/instructor-js	A structured extraction library powered by AI models and TypeScript schema validation	618
gunnarmorling/quarkus-pdf-extract	A Quarkus-based microservice to extract text from PDF files	24
drewnoakes/metadata-extractor-dotnet	A .NET library for extracting metadata from various image, video, and audio file formats	953
leofcardoso/pdf2pdfocr	A tool to extract text from PDFs and add a searchable layer to them	279
monarch-initiative/ontogpt	An LLM-based tool for extracting structured information from text with ontology-based grounding	626