words_counted

Tokenizer

A Ruby library that tokenizes input and provides various statistical measures about the tokens.

A Ruby natural language processor.
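The gem's README documents a small counting API built around WordsCounted.count, which tokenizes a string and returns a counter object exposing statistics about the tokens. The sketch below is a minimal usage example assuming that entry point and the counter methods described there (token_count, token_frequency, and friends); exact names and return shapes may differ between gem versions.

```ruby
# Minimal usage sketch of words_counted, based on the API shown in the
# gem's README; method names are assumed from that documentation.
require "words_counted"

counter = WordsCounted.count(
  "We are all in the gutter, but some of us are looking at the stars."
)

puts counter.token_count          # total number of tokens
puts counter.uniq_token_count     # number of unique tokens
p    counter.token_frequency      # tokens with their counts, most frequent first
p    counter.most_frequent_tokens # only the top-ranked tokens and their counts
p    counter.token_lengths        # tokens paired with their character lengths
p    counter.token_density        # each token's share of the total token count
```

The same README also describes options for customizing tokenization (for example an exclude: filter) and a from_file helper for reading input from disk; check the repository for the exact signatures in the version you install.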

GitHub

159 stars
12 watching
29 forks
Language: Ruby
Last commit: about 3 years ago
Linked from 2 awesome lists

Tags: natural-language-processing, nlp, ruby, rubynlp, word-counter, wordcount, words-counter

Related projects:

arbox/tokenizer (46 stars): A Ruby-based library for splitting written text into tokens for natural language processing tasks.
zencephalon/tactful_tokenizer (80 stars): A Ruby library that tokenizes text into sentences using a Bayesian statistical model.
thisiscetin/textoken (31 stars): A gem for extracting words from text with customizable tokenization rules.
diasks2/pragmatic_tokenizer (90 stars): A multilingual tokenizer that splits strings into tokens, handling various language and formatting nuances.
juliatext/wordtokenizers.jl (96 stars): A set of high-performance tokenizers for natural language processing tasks.
shonfeder/tokenize (11 stars): A Prolog-based tokenization library for lexing text into common tokens.
6/tiny_segmenter (21 stars): A Ruby port of a Japanese text tokenization algorithm.
arbox/treetagger-ruby (16 stars): A Ruby wrapper for a statistical language modeling tool for part-of-speech tagging and chunking.
lfcipriani/punkt-segmenter (92 stars): An implementation of a sentence boundary detection algorithm in Ruby.
denosaurs/tokenizer (17 stars): A simple tokenizer library for parsing and analyzing text input in various formats.
jonsafari/tok-tok (28 stars): A fast and simple tokenizer for multiple languages.
bzick/tokenizer (98 stars): A high-performance tokenization library for Go, capable of parsing various data formats and syntaxes.
c4n/pythonlexto (1 star): A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications.
tkellen/ruby-ngram (12 stars): Breaks text into contiguous sequences of words or phrases.
mathewsanders/mustard (689 stars): A Swift library for tokenizing strings with customizable matching behavior.