webcorpus
Text processor
A collection of scripts and programs for processing crawled data into a usable text corpus.
webcorpus pipeline
8 stars
4 watching
0 forks
Language: C++
Last commit: almost 10 years ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A utility for searching and processing strings in various formats and encodings. | 121 |
| | A text-to-HTML conversion tool with integrated styling and tag customization | 49 |
| | A tool for parsing and transforming text formats used in online communication | 41 |
| | A simple text manipulation library with a fluent interface. | 53 |
| | A modern C++ library for safer and more efficient input parsing. | 1,098 |
| | A common architecture for string utilities in the Go programming language | 70 |
| | A set of commands for high-speed processing of large-scale CSV data | 33 |
| | A package providing a fast and portable way to process character strings with Unicode support | 306 |
| | A high-performance library for real-time data processing and time series manipulation | 430 |
| | A module that enables parallel processing of large data sets in Magento 2 using multiple child processes. | 80 |
| | A text rendering library providing basic font layouting and glyph information for rendering text in arbitrary contexts. | 34 |
| | A tool for processing Coq and Lean 4 code embedded in text documents | 237 |
| | A tool that provides fast and efficient text analysis and visualization capabilities | 275 |
| | A reactive data processing library with publish-subscribe functionality and asyncio support. | 74 |
| | A small programming language for processing textual data with improved performance compared to AWK. | 1,256 |