toxy
Document extractor
A .NET framework for extracting text from various document formats across multiple platforms.
.NET text extraction framework
362 stars
39 watching
107 forks
Language: C#
Last commit: about 1 year ago
Linked from 2 awesome lists
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
| | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles. | 65 |
| | An OCaml library for segmenting Unicode text into grapheme clusters, words, and sentences. | 23 |
| | An open-source wrapper around LLMs to extract structured data from text. | 1,638 |
| | A framework for extracting information from tables in scientific literature using a rule-based approach. | 42 |
| | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines. | 78 |
| | Automates scraping and indexing of documentation content into a search engine. | 297 |
| | A C# library for reading and writing files using standard format markers | 0 |
| | A utility for systematically extracting URLs from web pages and printing them to the console. | 268 |
| | A tool that extracts Galician Official Journal documents for a given input year and converts them to different formats. | 0 |
| | A flexible XML serialization library for .NET Framework and .NET Core | 0 |
| | A lightweight, CSS-based selector for extracting structured data from HTML documents. | 273 |
| | Extracts and formats multilingual corpora from international Bibles into XML, JSON, and HTML files for analysis. | 0 |
| | Automates web page scraping and text extraction to make any webpage readable. | 343 |
| | A tool to extract relevant information from text. | 17 |