pydoxtools

Document extractor

A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines.

It lets you compose AI components into customizable pipelines and apply them to unstructured data from diverse sources.

GitHub

77 stars
6 watching
10 forks
Language: Python
Last commit: 3 months ago
chatgpt, document-analysis, document-extraction, extraction, information-retrieval, llm, nlp, pdf, python

Related projects:

Repository | Description | Stars
nikolamilosevic86/tabinout | A framework for extracting information from tables in scientific literature using a rule-based approach. | 41
pxyup/fitter | A utility for extracting and processing data from various sources, including APIs, websites, and static text. | 119
cocacola-lab/chatie | A framework for extracting information from unannotated text using large language models. | 789
robotools/extractor | A tool for extracting data from font binaries into UFO objects. | 52
xnl-h4ck3r/xnlinkfinder | A Python tool used to automatically discover and extract endpoints, parameters, and wordlists from target websites. | 1,204
mediawiki-utilities/python-mediawiki-utilities | Provides tools to extract and process data from MediaWiki installations. | 55
redteamoperations/googleworkspacedirectorydump | A tool to extract and map user and group relationships within Google Workspace directories. | 16
earthquakesan/fox-py | A Python library for extracting knowledge from data using the Federated Knowledge Extraction Framework. | 7
xlab-steampunk/ansible-doc-extractor | A tool to extract and format documentation from Ansible modules. | 16
malfrats/xeuledoc | A tool to fetch information about public Google documents from various services. | 846
anonyfox/elixir-scrape | A tool for extracting structured data from web resources using information-retrieval techniques. | 328
bikash/documentunderstanding | Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks. | 96
geeks-of-data/knowledge-gpt | Extracts and stores information from various sources using AI models to generate answers. | 279
recrm/archivetools | A collection of tools for extracting and analyzing data from web archives. | 69
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools. | 336