DocumentUnderstanding

Info extractor

Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks.

Research papers and code on information extraction from images and PDFs

GitHub

96 stars
7 watching
11 forks
last commit: almost 2 years ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| ckorzen/pdf-text-extraction-benchmark | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles | 65 |
| eyurtsev/kor | Extracts structured data from unstructured text using large language models | 1,629 |
| geeks-of-data/knowledge-gpt | Extracts and stores information from various sources using AI models to generate answers | 279 |
| mordragt/bib_kit | A Firefox extension that extracts website information to create citations in a specific citation format | 13 |
| subeeshvasu/awsome_delineation | A curated list of resources and papers on 3D and 2D delineation techniques for image processing and computer vision applications | 22 |
| xyntopia/pydoxtools | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines | 77 |
| nikolamilosevic86/tabinout | A framework for extracting information from tables in scientific literature using a rule-based approach | 41 |
| knowitall/reverb | Extracts binary relationships from English sentences at scale | 543 |
| 51j0/android-storage-extractor | A tool to extract local data storage of an Android application in one click | 16 |
| steelthread/mimeograph | A CoffeeScript library for extracting text from PDFs and creating searchable files | 28 |
| instructor-ai/instructor-js | A structured extraction library powered by AI models and TypeScript schema validation | 586 |
| gunnarmorling/quarkus-pdf-extract | A Quarkus-based microservice to extract text from PDF files | 24 |
| drewnoakes/metadata-extractor-dotnet | A .NET library for extracting metadata from various image, video, and audio file formats | 944 |
| leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them | 274 |
| monarch-initiative/ontogpt | An LLM-based tool for extracting structured information from text with ontology-based grounding | 609 |