jieba-php
Chinese tokenizer
A PHP module for Chinese text segmentation and word breaking
"結巴"中文分詞:做最好的 PHP 中文分詞、中文斷詞組件。 / "Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation module.
1k stars
56 watching
261 forks
Language: PHP
last commit: over 2 years ago
Linked from 3 awesome lists
Topics: chinese-text-segmentation, machine-learning, natural-language-processing, nlp
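For orientation, basic segmentation with jieba-php looks roughly like the following. This is a minimal sketch, assuming the package is installed via Composer and using the `Fukuball\Jieba\Jieba` and `Fukuball\Jieba\Finalseg` classes as documented in the project's README:

```php
<?php
// Minimal jieba-php usage sketch (assumes the fukuball/jieba-php package
// has been installed with Composer; paths and output may vary).
ini_set('memory_limit', '1024M'); // the bundled dictionary is large

require_once 'vendor/autoload.php';

use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\Finalseg;

Jieba::init();     // load the default dictionary
Finalseg::init();  // load the HMM model used for out-of-vocabulary words

// Segment a sentence into an array of words.
$words = Jieba::cut('怎么样才能做最好的中文分词');
print_r($words);
```

The `Finalseg` step runs the Hidden Markov Model fallback that handles words not found in the dictionary, which is why the README initializes both components before calling `Jieba::cut()`.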
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | Provides a Ruby port of the popular Chinese language processing library Jieba | 8 |
| | An Android implementation of the Chinese word segmentation algorithm jieba, optimized for fast initialization and tokenization | 153 |
| | A Python library for Chinese text segmentation using a Hidden Markov Model algorithm | 83 |
| | A tokenizer based on dictionary and bigram language models for Chinese text segmentation | 21 |
| | A Ruby port of a Japanese text tokenization algorithm | 21 |
| | An implementation of a large language model for Chinese text processing, focusing on a Mixture-of-Experts (MoE) architecture and incorporating a large vocabulary | 645 |
| | A PHP class that generates color codes based on words | 1 |
| | A Python library implementing a Conditional Random Field-based segmenter for Chinese text processing | 234 |
| | A Python wrapper around the Thai word segmenter LexTo, allowing developers to integrate it into their applications | 1 |
| | A pre-trained BERT-based Chinese text encoder with enhanced N-gram representations | 645 |
| | Implementations of federated incremental semantic segmentation in PyTorch | 34 |
| | A fast and simple tokenizer for multiple languages | 28 |
| | A fast and extensible Markdown parser for PHP | 1,002 |
| | A C-based implementation of a Chinese tokenizer for SQLite3 using ICU's analysis feature | 6 |
| | A set of tools and components for building Chinese input methods, focusing on character prediction and suggestion algorithms | 895 |