webcorpus
Text processor
A collection of scripts and programs for processing crawled data into a usable text corpus.
webcorpus pipeline
8 stars
4 watching
0 forks
Language: C++
Last commit: over 9 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
ericzimmerman/bstrings | A utility for searching and processing strings in various formats and encodings. | 121 |
senselogic/pendown | A text-to-HTML conversion tool with integrated styling and tag customization | 49 |
wooorm/dioscuri | A tool for parsing and transforming text formats used in online communication | 41 |
kzykhys/text | A simple text manipulation library with a fluent interface. | 53 |
eliaskosunen/scnlib | A modern C++ library for safer and more efficient input parsing. | 1,098 |
esemplastic/unis | A common architecture for string utilities in the Go programming language | 70 |
nysol/mcmd | A set of commands for high-speed processing of large-scale CSV data | 33 |
gagolews/stringi | A package providing a fast and portable way to process character strings with Unicode support | 306 |
spreads/spreads | A high-performance library for real-time data processing and time series manipulation | 430 |
zepgram/module-multi-threading | A module that enables parallel processing of large data sets in Magento 2 using multiple child processes. | 80 |
ziglibs/fontaine | A text rendering library providing basic font layouting and glyph information for rendering text in arbitrary contexts. | 34 |
cpitclaudel/alectryon | A tool for processing Coq and Lean 4 code embedded in text documents | 237 |
zix99/rare | A tool that provides fast and efficient text analysis and visualization capabilities | 275 |
semiversus/python-broqer | A reactive data processing library with publish-subscribe functionality and asyncio support. | 74 |
ezrosent/frawk | A small programming language for processing textual data with improved performance compared to AWK. | 1,256 |