jieba-php
Chinese tokenizer
A PHP module for Chinese text segmentation and word breaking
"結巴"中文分詞:做最好的 PHP 中文分詞、中文斷詞組件。 / "Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation module.
1k stars
56 watching
260 forks
Language: PHP
Last commit: over 2 years ago
Linked from 3 awesome lists
chinese-text-segmentation, machine-learning, natural-language-processing, nlp
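A minimal usage sketch, assuming the library is installed via Composer as fukuball/jieba-php; the Jieba::init(), Finalseg::init(), and Jieba::cut() calls follow the project's README, though the exact bootstrap may vary by version:

```php
<?php
// Sketch based on jieba-php's documented API (assumption: installed via
// `composer require fukuball/jieba-php:dev-master`).
ini_set('memory_limit', '1024M'); // the bundled dictionary is large

require_once __DIR__ . '/vendor/autoload.php';

use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\Finalseg;

Jieba::init();    // load the default dictionary
Finalseg::init(); // load the HMM model used for words not in the dictionary

// Accurate mode: segment a sentence into a list of words
$words = Jieba::cut("怜香惜玉也得要看对象啊!");
var_dump($words);
```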
Related projects:
Repository | Description | Stars |
---|---|---|
mimosa/jieba-jruby | A Ruby port of the popular Chinese text segmentation library Jieba | 8 |
452896915/jieba-android | An Android implementation of the Chinese word segmentation algorithm jieba, optimized for fast initialization and tokenization | 152 |
fangpenlin/loso | An implementation of a Chinese segmentation system using the Hidden Markov Model algorithm | 83 |
xujiajun/gotokenizer | A tokenizer for Chinese text segmentation based on a dictionary and bigram language models | 21 |
6/tiny_segmenter | A Ruby port of the TinySegmenter Japanese text tokenization algorithm | 21 |
hit-scir/chinese-mixtral-8x7b | An implementation of a large language model for Chinese text processing, built on a Mixture-of-Experts (MoE) architecture with an expanded Chinese vocabulary | 641 |
lichunqiang/wordcolor.php | A PHP class that generates color codes based on words | 1 |
duanhongyi/genius | A Python library implementing a Conditional Random Field (CRF) based segmenter for Chinese text processing | 234 |
c4n/pythonlexto | A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications. | 1 |
sinovation/zen | A pre-trained BERT-based Chinese text encoder with enhanced N-gram representations | 643 |
jiahuadong/fiss | Implementations of federated incremental semantic segmentation in PyTorch. | 33 |
jonsafari/tok-tok | A fast and simple tokenizer for multiple languages | 28 |
cebe/markdown | A fast and extensible Markdown parser for PHP | 999 |
wangwang4git/sqlite3-icu | A C implementation of a Chinese tokenizer for SQLite3 built on ICU's word-boundary analysis. | 6 |
arleyguolei/wx-words-pk | A set of tools and components for building Chinese input methods, focusing on character prediction and suggestion algorithms. | 886 |