gotokenizer

Chinese Tokenizer Library

A Go tokenizer that segments text using a dictionary and a bigram language model. It currently supports Chinese word segmentation only.

GitHub

21 stars

3 watching

7 forks

Language: Go

Last commit: over 6 years ago

Linked from 2 awesome lists

golang, segmentation, tokenizer


Related projects:

Repository	Description	Stars
bzick/tokenizer	A high-performance tokenization library for Go, capable of parsing various data formats and syntaxes.	103
fangpenlin/loso	A Python library for Chinese text segmentation using a Hidden Markov Model algorithm	83
jonsafari/tok-tok	A fast and simple tokenizer for multiple languages	28
thisiscetin/textoken	A gem for extracting words from text with customizable tokenization rules	31
xujiajun/gorouter	A fast and feature-rich HTTP router for Go that supports regular expressions.	532
fukuball/jieba-php	A PHP module for Chinese text segmentation and word breaking	1,331
mimosa/jieba-jruby	Provides a Ruby port of the popular Chinese language processing library Jieba	8
zencephalon/tactful_tokenizer	A Ruby library that tokenizes text into sentences using a Bayesian statistical model	80
xujiajun/pattern-guidance	A comprehensive guide to design patterns in Go programming language	268
diasks2/pragmatic_tokenizer	A multilingual tokenizer to split strings into tokens, handling various language and formatting nuances.	90
6/tiny_segmenter	A Ruby port of a Japanese text tokenization algorithm	21
abitdodgy/words_counted	A Ruby library that tokenizes input and provides various statistical measures about the tokens	159
zseder/huntoken	A tool for tokenizing raw text into words and sentences in multiple languages, including Hungarian.	4
arbox/tokenizer	A Ruby-based library for splitting written text into tokens for natural language processing tasks.	46
tiancaiamao/shen-go	A Go implementation of Shen, a portable functional programming language with features like pattern matching and macro support.	56