SQLite3-ICU

Chinese tokenizer

A Chinese tokenizer for SQLite3, implemented in C on top of ICU's boundary analysis feature.

SQLite3 ICU Tokenizer

6 stars

2 watching

3 forks

Language: C

Last commit: about 11 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

mindreframer/awesome-sqlite

Related projects:

Repository	Description	Stars
illarionov/sqlite3-unicodesn	An extension that adds full-text search capabilities to SQLite with Snowball stemming.	34
iwongu/sqlite3pp	A C++ wrapper around the SQLite3 API to simplify its use in C++ applications.	610
xujiajun/gotokenizer	A tokenizer based on dictionary and Bigram language models for text segmentation in Chinese	21
benwebber/sqlite3-uuid	An extension for generating UUIDs in a SQLite database	48
gorilla/css	A utility for parsing and breaking down CSS3 code into smaller components	87
sillsdev/icu-dotnet	A C# wrapper for ICU4C's subset of libraries providing Unicode and Globalization support	62
c4n/pythonlexto	A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications.	1
abiliojr/fts5-snowball	A Snowball stemmer tokenizer extension for FTS5 in SQLite	48
glzhao89/auto_taos_cfg	Automates the generation of TDengine log, data, and configuration files	0
frost/isn	Provides PostgreSQL type definitions and Ecto extensions for international standards in data storage	10
haifengkao/sqlitesubstringsearch	A tokenizer that supports fast substring search with FTS (full text search) capabilities	83
languagemachines/ucto	A tokeniser for natural language text that separates words from punctuation and supports basic preprocessing steps such as case changing	66
wangfreexx/wangfreexx-tianruoocr-cl-paddle	An open-source OCR project using the PaddleOCR framework to recognize Chinese characters and text.	1,338
nytud/quntoken	A C++ tokenizer for Hungarian text	14
goodsign/icu	A Cgo binding to the ICU C library for detecting and converting text encodings	21