DocumentUnderstanding
Info extractor
Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks.
Research papers and code on information extraction from images and PDFs
96 stars
7 watching
11 forks
last commit: almost 2 years ago

Related projects:
Repository | Description | Stars |
---|---|---|
ckorzen/pdf-text-extraction-benchmark | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles | 65 |
eyurtsev/kor | Extracts structured data from unstructured text using large language models | 1,629 |
geeks-of-data/knowledge-gpt | Extracts and stores information from various sources using AI models to generate answers. | 279 |
mordragt/bib_kit | A Firefox extension that extracts website information to create citations in a specific citation format. | 13 |
subeeshvasu/awsome_delineation | A curated list of resources and papers on 3D and 2D delineation techniques for image processing and computer vision applications. | 22 |
xyntopia/pydoxtools | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines. | 77 |
nikolamilosevic86/tabinout | A framework for extracting information from tables in scientific literature using a rule-based approach. | 41 |
knowitall/reverb | Extracts binary relationships from English sentences at scale | 543 |
51j0/android-storage-extractor | A tool to extract local data storage of an Android application in one click. | 16 |
steelthread/mimeograph | A CoffeeScript library for extracting text from PDFs and creating searchable files | 28 |
instructor-ai/instructor-js | A structured extraction library powered by AI models and TypeScript schema validation | 586 |
gunnarmorling/quarkus-pdf-extract | A Quarkus-based microservice to extract text from PDF files | 24 |
drewnoakes/metadata-extractor-dotnet | A .NET library for extracting metadata from various image, video, and audio file formats. | 944 |
leofcardoso/pdf2pdfocr | A tool to extract text from PDFs and add a searchable layer to them | 274 |
monarch-initiative/ontogpt | An LLM-based tool for extracting structured information from text with ontology-based grounding. | 609 |