pydoxtools
Document extractor
A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines.
Extract information from unstructured data using AI techniques. Compose AI components into customizable pipelines that draw on diverse sources for your projects.
77 stars
6 watching
10 forks
Language: Python
Last commit: 3 months ago
Topics: chatgpt, document-analysis, document-extraction, extraction, information-retrieval, llm, nlp, pdf, python
Related projects:
Repository | Description | Stars |
---|---|---|
nikolamilosevic86/tabinout | A framework for extracting information from tables in scientific literature using a rule-based approach. | 41 |
pxyup/fitter | A utility for extracting and processing data from various sources, including APIs, websites, and static text. | 119 |
cocacola-lab/chatie | A framework for extracting information from unannotated text using large language models. | 789 |
robotools/extractor | A tool for extracting data from font binaries into UFO objects. | 52 |
xnl-h4ck3r/xnlinkfinder | A Python tool used to automatically discover and extract endpoints, parameters, and wordlists from target websites. | 1,204 |
mediawiki-utilities/python-mediawiki-utilities | Provides tools to extract and process data from MediaWiki installations. | 55 |
redteamoperations/googleworkspacedirectorydump | A tool to extract and map user and group relationships within Google Workspace directories. | 16 |
earthquakesan/fox-py | A Python library for extracting knowledge from data using the Federated Knowledge Extraction Framework. | 7 |
xlab-steampunk/ansible-doc-extractor | A tool to extract and format documentation from Ansible modules. | 16 |
malfrats/xeuledoc | A tool to fetch information about public Google documents from various services. | 846 |
anonyfox/elixir-scrape | A tool for extracting structured data from web resources using information-retrieval techniques. | 328 |
bikash/documentunderstanding | Research and development of tools and techniques for extracting information from images and PDFs using deep learning and graph neural networks. | 96 |
geeks-of-data/knowledge-gpt | Extracts and stores information from various sources using AI models to generate answers. | 279 |
recrm/archivetools | A collection of tools for extracting and analyzing data from web archives. | 69 |
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools. | 336 |