SQLite3-ICU
Chinese tokenizer
A Chinese tokenizer for SQLite3, implemented in C on top of ICU's boundary analysis feature.
SQLite3 ICU Tokenizer
6 stars
2 watching
3 forks
Language: C
last commit: over 9 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
illarionov/sqlite3-unicodesn | An extension that adds full-text search capabilities to SQLite with Snowball stemming. | 34 |
iwongu/sqlite3pp | A C++ wrapper around the SQLite3 API to simplify its use in C++ applications. | 606 |
xujiajun/gotokenizer | A tokenizer for Chinese text segmentation based on a dictionary and bigram language models | 21 |
benwebber/sqlite3-uuid | An extension for generating UUIDs in a SQLite database | 48 |
gorilla/css | A CSS3 tokenizer that breaks stylesheet source into its component tokens | 87 |
sillsdev/icu-dotnet | A C# wrapper for a subset of the ICU4C libraries, providing Unicode and globalization support | 62 |
c4n/pythonlexto | A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications. | 1 |
abiliojr/fts5-snowball | A Snowball stemmer tokenizer extension for FTS5 in SQLite | 47 |
glzhao89/auto_taos_cfg | Automates the generation of TDengine log, data, and configuration files | 0 |
frost/isn | Provides PostgreSQL type definitions and Ecto extensions for international standards in data storage | 10 |
haifengkao/sqlitesubstringsearch | A tokenizer that supports fast substring search with FTS (full text search) capabilities | 83 |
languagemachines/ucto | A tokeniser for natural language text that separates words from punctuation and supports basic preprocessing steps such as case changing | 65 |
wangfreexx/wangfreexx-tianruoocr-cl-paddle | An open-source OCR project using the PaddleOCR framework to recognize Chinese characters and text. | 1,337 |
nytud/quntoken | A C++ tokenizer that tokenizes Hungarian text | 14 |
goodsign/icu | A cgo binding to the ICU C library for detecting and converting text encodings | 21 |