pydoxtools

Document extractor

A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines.

Extract information from unstructured data using AI techniques, and combine AI components and diverse data sources into customizable pipelines for your projects.

GitHub

78 stars
6 watching
10 forks
Language: Python
Last commit: 7 months ago
chatgpt, document-analysis, document-extraction, extraction, information-retrieval, llm, nlp, pdf, python

Related projects:

Repository | Description | Stars
nikolamilosevic86/tabinout | A framework for extracting information from tables in scientific literature using a rule-based approach. | 42
pxyup/fitter | A utility for extracting and processing data from various sources, including APIs, websites, and static text. | 120
cocacola-lab/chatie | A framework for extracting information from unannotated text using large language models. | 795
robotools/extractor | A tool for extracting data from font binaries into UFO objects. | 53
xnl-h4ck3r/xnlinkfinder | An automated tool to discover and extract links from web applications. | 1,216
mediawiki-utilities/python-mediawiki-utilities | A set of utilities for extracting and processing data from MediaWiki installations. | 55
redteamoperations/googleworkspacedirectorydump | A tool to extract and map user and group relationships within Google Workspace directories. | 16
earthquakesan/fox-py | A Python library for extracting knowledge from data using the Federated Knowledge Extraction Framework. | 7
xlab-steampunk/ansible-doc-extractor | A tool to extract and format documentation from Ansible modules. | 16
malfrats/xeuledoc | A tool to fetch information about public Google documents from various services. | 856
anonyfox/elixir-scrape | A tool for extracting structured data from web resources using information-retrieval techniques. | 328
bikash/documentunderstanding | Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks. | 96
geeks-of-data/knowledge-gpt | Extracts and stores information from various sources using AI models to generate answers. | 283
recrm/archivetools | A collection of tools for extracting and analyzing data from web archives. | 71
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools. | 336