toxy
Document extractor
A .NET framework for extracting text from various document formats across multiple platforms.
.NET text extraction framework
361 stars
39 watching
107 forks
Language: C#
last commit: about 1 month ago
Linked from 2 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
ckorzen/pdf-text-extraction-benchmark | Evaluates how well PDF extraction tools extract meaningful text from scientific articles. | 65 |
dbuenzli/uuseg | An OCaml library for segmenting Unicode text into grapheme clusters, words, and sentences. | 23 |
eyurtsev/kor | Extracts structured data from unstructured text using large language models. | 1,629 |
nikolamilosevic86/tabinout | A framework for extracting information from tables in scientific literature using a rule-based approach. | 41 |
xyntopia/pydoxtools | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines. | 77 |
meilisearch/docs-scraper | Automates scraping documentation content and indexing it into a search engine. | 290 |
sillsdev/standardformatlib | A C# library for reading and writing files that use Standard Format markers. | 0 |
s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console. | 265 |
jjelosua/doga_scraper | A tool that extracts documents from the Galician Official Journal (DOGA) for a given year and converts them to different formats. | 0 |
sinairv/yaxlib | A flexible XML serialization library for .NET Framework and .NET Core. | 0 |
feichao93/temme | A lightweight, CSS-based selector for extracting structured data from HTML documents. | 273 |
fielddb/multilingualcorporaextractor | Extracts multilingual corpora from Bible translations and formats them as XML, JSON, and HTML files for analysis. | 0 |
tjatse/node-readability | Automates web page scraping and text extraction to make any web page readable. | 343 |
aymericbeaumet/squeeze | A tool to extract relevant information from text. | 17 |