jieba-php

Chinese tokenizer

A PHP module for Chinese text segmentation and word breaking

"結巴"中文分詞:做最好的 PHP 中文分詞、中文斷詞組件。 / "Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation module.

GitHub stats: 1k stars, 56 watching, 260 forks
Language: PHP
Last commit: over 2 years ago
Linked from 3 awesome lists

Topics: chinese-text-segmentation, machine-learning, natural-language-processing, nlp

Related projects:

Repository | Description | Stars
mimosa/jieba-jruby | A Ruby port of the popular Chinese language processing library Jieba | 8
452896915/jieba-android | An Android implementation of the jieba Chinese word segmentation algorithm, optimized for fast initialization and tokenization | 152
fangpenlin/loso | A Chinese segmentation system based on the Hidden Markov Model algorithm | 83
xujiajun/gotokenizer | A dictionary- and bigram-based tokenizer for Chinese text segmentation | 21
6/tiny_segmenter | A Ruby port of a Japanese text tokenization algorithm | 21
hit-scir/chinese-mixtral-8x7b | A large language model for Chinese text processing built on a Mixture-of-Experts (MoE) architecture with an expanded Chinese vocabulary | 641
lichunqiang/wordcolor.php | A PHP class that generates color codes based on words | 1
duanhongyi/genius | A Python library implementing a Conditional Random Field-based segmenter for Chinese text | 234
c4n/pythonlexto | A Python wrapper around the Thai word segmenter LexTo | 1
sinovation/zen | A pre-trained BERT-based Chinese text encoder with enhanced N-gram representations | 643
jiahuadong/fiss | Implementations of federated incremental semantic segmentation in PyTorch | 33
jonsafari/tok-tok | A fast and simple tokenizer for multiple languages | 28
cebe/markdown | A fast and extensible Markdown parser for PHP | 999
wangwang4git/sqlite3-icu | A C-based Chinese tokenizer for SQLite3 built on the ICU library | 6
arleyguolei/wx-words-pk | A set of tools and components for building Chinese input methods, focusing on character prediction and suggestion algorithms | 886