punkt-segmenter
Sentence tokenizer
A Ruby port of the NLTK Punkt algorithm for detecting sentence boundaries in unstructured text (see the usage sketch below).
92 stars
2 watching
10 forks
Language: Ruby
Last commit: over 6 years ago
Linked from 1 awesome list
Tags: nlp-library, nltk, punkt-segmenter, ruby, ruby-port, rubynlp, sentence-boundaries, sentence-tokenizer, tokenized-sentences
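The gem implements the unsupervised Punkt algorithm (Kiss & Strunk, 2006), which learns abbreviations and sentence starters from the input text itself rather than relying on hand-written rules. The sketch below shows a typical invocation; the constant and method names (`Punkt::SentenceTokenizer`, `sentences_from_text`, and the `:output => :sentences_text` option) are taken from the project's README and, given that the last commit was over six years ago, should be treated as assumptions to verify against the installed gem.

```ruby
# Minimal usage sketch, assuming the API described in the project's README.
require "punkt-segmenter"   # gem install punkt-segmenter

text = "Mr. Smith went to Washington. He arrived on Jan. 5th, " \
       "ready to work. Was he prepared? Absolutely."

# The tokenizer trains its unsupervised Punkt model on the text it is given,
# then splits that same text into sentences.
tokenizer = Punkt::SentenceTokenizer.new(text)
sentences = tokenizer.sentences_from_text(text, :output => :sentences_text)

sentences.each_with_index do |sentence, i|
  puts "#{i + 1}. #{sentence}"
end
```

Because the model is trained on the text it segments, it adapts to domain-specific abbreviations (e.g. "Jan." above) without any configuration, at the cost of needing a reasonable amount of input text to learn from.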
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A Ruby port of a Japanese text tokenization algorithm | 21 |
| | A rule-based sentence boundary detection gem that works across many languages | 559 |
| | A Ruby library that tokenizes text into sentences using a Bayesian statistical model | 80 |
| | A Ruby library for splitting written text into tokens for natural language processing tasks | 46 |
| | Breaks text into contiguous sequences of words or phrases | 12 |
| | A Ruby library that tokenizes input and provides various statistical measures about the tokens | 159 |
| | A gem for extracting words from text with customizable tokenization rules | 31 |
| | A Python package for out-of-the-box sentence boundary detection using rule-based algorithms | 821 |
| | A fast and simple tokenizer for multiple languages | 28 |
| | A Python wrapper around the Thai word segmenter LexTo | 1 |
| | A tokenizer for natural language text that separates words from punctuation and supports basic preprocessing such as case changing | 66 |
| | A multilingual tokenizer that splits strings into tokens while handling language and formatting nuances | 90 |
| | A Ruby wrapper around the Stuttgart TreeTagger for natural language processing tasks | 6 |
| | A Ruby library that uses a simple rule-based approach to segment sentences into individual words or phrases | 51 |
| | A Ruby gem for pre-processing markdown files with file inclusion and formatting options | 40 |