loso
Chinese Segmentation System
An implementation of a Chinese segmentation system using the Hidden Markov Model algorithm
Chinese segmentation library
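To illustrate the Hidden Markov Model approach named in the description, here is a minimal sketch of Viterbi decoding over character tags. It is not loso's API or its actual model: the B/M/E/S tagging scheme, the function name `viterbi_segment`, and the toy probability tables are all assumptions made for this example, and a real system would estimate those tables from a segmented corpus.

```python
import math

# Hypothetical tag set: Begin, Middle, End of a multi-character word,
# and Single-character word. This scheme is an assumption of the sketch.
STATES = "BMES"


def viterbi_segment(text, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable word split of `text` under a BMES-tagged HMM."""
    if not text:
        return []

    def lp(table, key):
        # Log-probability lookup; unseen events fall back to a small floor.
        return math.log(table.get(key, floor))

    # scores[i][s]: best log-probability of any tag path ending in state s at position i
    scores = [{s: lp(start_p, s) + lp(emit_p[s], text[0]) for s in STATES}]
    back = [{}]
    for ch in text[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + lp(trans_p[p], s))
            cur[s] = prev[best] + lp(trans_p[best], s) + lp(emit_p[s], ch)
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)

    # A complete segmentation must close its last word, so end in E or S.
    state = max("ES", key=lambda s: scores[-1][s])
    tags = [state]
    for ptr in reversed(back[1:]):
        state = ptr[state]
        tags.append(state)
    tags.reverse()

    # Cut the text after every character tagged E or S.
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Toy parameters, invented for illustration only.
start_p = {"B": 0.6, "S": 0.4}
trans_p = {
    "B": {"M": 0.2, "E": 0.8},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}
emit_p = {
    "B": {"中": 0.5, "分": 0.5},
    "M": {},
    "E": {"文": 0.5, "词": 0.5},
    "S": {},
}

print(viterbi_segment("中文分词", start_p, trans_p, emit_p))  # ['中文', '分词']
```

Working in log space avoids numeric underflow on long inputs, and the probability floor keeps characters absent from the toy tables from producing `log(0)`.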
83 stars
6 watching
23 forks
Language: Python
Last commit: over 13 years ago
Linked from 2 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
xujiajun/gotokenizer | A tokenizer based on dictionary and Bigram language models for text segmentation in Chinese | 21 |
fukuball/jieba-php | A PHP module for Chinese text segmentation and word breaking | 1,323 |
thisiscetin/textoken | A gem for extracting words from text with customizable tokenization rules | 31 |
c4n/pythonlexto | A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications. | 1 |
zencephalon/tactful_tokenizer | A Ruby library that tokenizes text into sentences using a Bayesian statistical model | 80 |
6/tiny_segmenter | A Ruby port of a Japanese text tokenization algorithm | 21 |
diasks2/pragmatic_tokenizer | A multilingual tokenizer to split strings into tokens, handling various language and formatting nuances. | 90 |
arbox/tokenizer | A Ruby-based library for splitting written text into tokens for natural language processing tasks. | 46 |
lfcipriani/punkt-segmenter | An implementation of a sentence boundary detection algorithm in Ruby. | 92 |
languagemachines/ucto | A tokeniser for natural language text that separates words from punctuation and supports basic preprocessing steps such as case changing | 65 |
lxneng/xpinyin | A Python library for translating Chinese characters to pinyin | 823 |
zseder/huntoken | A tool for tokenizing raw text into words and sentences in multiple languages. | 3 |
shonfeder/tokenize | A Prolog-based tokenization library for lexing text into common tokens | 11 |
bzick/tokenizer | A high-performance tokenization library for Go, capable of parsing various data formats and syntaxes. | 98 |
duanhongyi/genius | A Python library implementing a Conditional Random Field-based segmenter for Chinese text processing | 234 |