loso
Chinese Segmentation Library
A Python library for Chinese text segmentation using a Hidden Markov Model (HMM) algorithm
83 stars
6 watching
23 forks
Language: Python
Last commit: almost 14 years ago
Linked from 2 awesome lists
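HMM-based segmenters of this kind typically label each character with a position-in-word state and recover the most likely label sequence with the Viterbi algorithm. Below is a minimal sketch of that general technique, assuming the common B/M/E/S tagging scheme and toy hand-set probabilities; it is not loso's actual API, and the `viterbi` and `segment` helpers, state names, and probability tables are all illustrative assumptions rather than trained values.

```python
import math

# States follow the common BMES scheme:
# B = begin of word, M = middle, E = end, S = single-character word.
STATES = "BMES"

# Toy log-probabilities; a real model estimates these from a tagged corpus.
start = {"B": math.log(0.6), "M": -1e9, "E": -1e9, "S": math.log(0.4)}
trans = {
    "B": {"M": math.log(0.3), "E": math.log(0.7)},
    "M": {"M": math.log(0.3), "E": math.log(0.7)},
    "E": {"B": math.log(0.6), "S": math.log(0.4)},
    "S": {"B": math.log(0.6), "S": math.log(0.4)},
}

def emit(state, char):
    # Placeholder emission model (uniform). A trained model would
    # return log P(char | state) estimated from segmented text.
    return math.log(1e-4)

def viterbi(text):
    """Return the most likely BMES tag sequence for `text`."""
    V = [{s: start[s] + emit(s, text[0]) for s in STATES}]
    path = {s: [s] for s in STATES}
    for ch in text[1:]:
        V.append({})
        new_path = {}
        for s in STATES:
            best_prev, best_score = None, -1e18
            for p in STATES:
                # Disallowed transitions fall back to a large penalty.
                score = V[-2][p] + trans[p].get(s, -1e9) + emit(s, ch)
                if score > best_score:
                    best_prev, best_score = p, score
            V[-1][s] = best_score
            new_path[s] = path[best_prev] + [s]
        path = new_path
    final = max(STATES, key=lambda s: V[-1][s])
    return path[final]

def segment(text):
    """Cut `text` into words at E (end-of-word) and S (single-char) tags."""
    words, word = [], ""
    for ch, tag in zip(text, viterbi(text)):
        word += ch
        if tag in "ES":
            words.append(word)
            word = ""
    if word:
        words.append(word)
    return words

print(segment("今天天氣很好"))
```

A trained model would estimate the start, transition, and emission tables from a segmented corpus; with the uniform placeholder emissions above, the decoded tags depend only on the start and transition weights.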
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
|  | A tokenizer based on dictionary and Bigram language models for text segmentation in Chinese | 21 |
|  | A PHP module for Chinese text segmentation and word breaking | 1,331 |
|  | A gem for extracting words from text with customizable tokenization rules | 31 |
|  | A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications | 1 |
|  | A Ruby library that tokenizes text into sentences using a Bayesian statistical model | 80 |
|  | A Ruby port of a Japanese text tokenization algorithm | 21 |
|  | A multilingual tokenizer that splits strings into tokens, handling various language and formatting nuances | 90 |
|  | A Ruby-based library for splitting written text into tokens for natural language processing tasks | 46 |
|  | A Ruby port of the NLTK algorithm to detect sentence boundaries in unstructured text | 92 |
|  | A tokenizer for natural language text that separates words from punctuation and supports basic preprocessing steps such as case changing | 66 |
|  | A Python library for translating Chinese characters to pinyin | 826 |
|  | A tool for tokenizing raw text into words and sentences in multiple languages, including Hungarian | 4 |
|  | A Prolog-based tokenization library for lexing text into common tokens | 11 |
|  | A high-performance tokenization library for Go, capable of parsing various data formats and syntaxes | 103 |
|  | A Python library implementing a Conditional Random Field-based segmenter for Chinese text processing | 234 |