toxy
Document extractor
A .NET framework for extracting text from various document formats across multiple platforms.
.net text extraction framework
362 stars
39 watching
107 forks
Language: C#
Last commit: 4 months ago
Linked from 2 awesome lists
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
| | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles. | 65 |
| | An OCaml library for segmenting Unicode text into grapheme clusters, words, and sentences. | 23 |
| | An open-source wrapper around LLMs to extract structured data from text. | 1,638 |
| | A framework for extracting information from tables in scientific literature using a rule-based approach. | 42 |
| | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines. | 78 |
| | Automates scraping and indexing of documentation content into a search engine. | 297 |
| | A C# library for reading and writing files using standard format markers. | 0 |
| | A utility for systematically extracting URLs from web pages and printing them to the console. | 268 |
| | A tool that extracts and converts Galician Official Journal documents to different formats based on input year. | 0 |
| | A flexible XML serialization library for .NET Framework and .NET Core. | 0 |
| | A lightweight, CSS-based selector for extracting structured data from HTML documents. | 273 |
| | Extracts and formats multilingual corpora from international bibles into XML, JSON, and HTML files for analysis. | 0 |
| | Automates web page scraping and text extraction to make any webpage readable. | 343 |
| | A tool to extract relevant information from text. | 17 |