SQLite3-ICU
Chinese tokenizer
A C implementation of a Chinese tokenizer for SQLite3, built on ICU's boundary analysis feature.
SQLite3 ICU Tokenizer
6 stars
2 watching
3 forks
Language: C
last commit: over 9 years ago
Linked from 1 awesome list
Related projects:
| Description | Stars |
|---|---|
| An extension that adds full-text search capabilities to SQLite with Snowball stemming. | 34 |
| A C++ wrapper around the SQLite3 API to simplify its use in C++ applications. | 610 |
| A tokenizer based on dictionary and Bigram language models for text segmentation in Chinese | 21 |
| An extension for generating UUIDs in a SQLite database | 48 |
| A utility for parsing and breaking down CSS3 code into smaller components | 87 |
| A C# wrapper for ICU4C's subset of libraries providing Unicode and Globalization support | 62 |
| A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications. | 1 |
| A Snowball stemmer tokenizer extension for FTS5 in SQLite | 48 |
| Automates the generation of TDengine log, data, and configuration files | 0 |
| Provides PostgreSQL type definitions and Ecto extensions for international standards in data storage | 10 |
| A tokenizer that supports fast substring search with FTS (full text search) capabilities | 83 |
| A tokeniser for natural language text that separates words from punctuation and supports basic preprocessing steps such as case changing | 66 |
| An open-source OCR project using the PaddleOCR framework to recognize Chinese characters and text. | 1,338 |
| A C++ tokenizer that tokenizes Hungarian text | 14 |
| Provides a cgo binding to a Unicode-based C library for detecting and converting text encodings | 21 |