DocumentUnderstanding

Info extractor

Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks.

Research papers and code on information extraction from images and PDFs.


96 stars
7 watching
11 forks
last commit: about 2 years ago

Related projects:

Repository | Description | Stars
ckorzen/pdf-text-extraction-benchmark | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles. | 65
eyurtsev/kor | An open-source wrapper around LLMs to extract structured data from text. | 1,638
geeks-of-data/knowledge-gpt | Extracts and stores information from various sources using AI models to generate answers. | 283
mordragt/bib_kit | A Firefox extension that extracts website information to create citations in a specific citation format. | 14
subeeshvasu/awsome_delineation | A curated list of resources and papers on 3D and 2D delineation techniques for image processing and computer vision applications. | 22
xyntopia/pydoxtools | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines. | 78
nikolamilosevic86/tabinout | A framework for extracting information from tables in scientific literature using a rule-based approach. | 42
knowitall/reverb | Extracts binary relationships from English sentences at scale. | 543
51j0/android-storage-extractor | A tool to extract local data storage of an Android application in one click. | 16
steelthread/mimeograph | A CoffeeScript library for extracting text from PDF files and creating searchable documents with OCR capabilities. | 28
instructor-ai/instructor-js | A structured extraction library powered by AI models and TypeScript schema validation. | 618
gunnarmorling/quarkus-pdf-extract | A Quarkus-based microservice to extract text from PDF files. | 24
drewnoakes/metadata-extractor-dotnet | A .NET library for extracting metadata from various image, video, and audio file formats. | 953
leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them. | 279
monarch-initiative/ontogpt | An LLM-based tool for extracting structured information from text with ontology-based grounding. | 626