toxy

Document extractor

A .NET framework for extracting text from various document formats across multiple platforms.

.NET text extraction framework

362 stars

39 watching

107 forks

Language: C#

Last commit: almost 2 years ago

Linked from 2 awesome lists

Related projects:

Repository	Description	Stars
felipecsl/wombat	A Ruby-based web crawler and data extraction tool with an elegant DSL.	1,315
ckorzen/pdf-text-extraction-benchmark	Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles	65
dbuenzli/uuseg	An OCaml library for segmenting Unicode text into grapheme clusters, words, and sentences.	23
eyurtsev/kor	An open-source wrapper around LLMs to extract structured data from text	1,638
nikolamilosevic86/tabinout	A framework for extracting information from tables in scientific literature using a rule-based approach.	42
xyntopia/pydoxtools	A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines.	78
meilisearch/docs-scraper	Automates scraping and indexing of documentation content into a search engine	297
sillsdev/standardformatlib	A C# library for reading and writing files using standard format markers	0
s0rg/crawley	A utility for systematically extracting URLs from web pages and printing them to the console.	268
jjelosua/doga_scraper	A tool that extracts and converts Galician Official journal documents to different formats based on input year.	0
sinairv/yaxlib	A flexible XML serialization library for .NET Framework and .NET Core	0
feichao93/temme	A lightweight, CSS-based selector for extracting structured data from HTML documents.	273
fielddb/multilingualcorporaextractor	Extracts and formats multilingual corpora from international bibles into XML, JSON, and HTML files for analysis.	0
tjatse/node-readability	Automates web page scraping and text extraction to make any webpage readable	343
aymericbeaumet/squeeze	A tool to extract relevant information from text	17