ruby-ngram

Text segmenter

Breaks text into contiguous sequences of words or phrases

Break words and phrases into ngrams.
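
As an illustration of what breaking text into n-grams means, here is a minimal sketch in plain Ruby. It does not use or reproduce the ruby-ngram API, which this page does not show. The helper names `word_ngrams` and `char_ngrams` are hypothetical, and whitespace tokenization is an assumption made for the example.

```ruby
# Hypothetical helpers for illustration only; not the ruby-ngram API.

# Word-level n-grams: every run of n consecutive words in a phrase.
def word_ngrams(text, n)
  words = text.split        # assumes simple whitespace tokenization
  words.each_cons(n).to_a   # sliding window of size n over the words
end

# Character-level n-grams: every run of n consecutive characters in a word.
def char_ngrams(word, n)
  word.chars.each_cons(n).map(&:join)
end

p word_ngrams("the quick brown fox", 2)
# => [["the", "quick"], ["quick", "brown"], ["brown", "fox"]]
p char_ngrams("ruby", 3)
# => ["rub", "uby"]
```

Both helpers rely on `Enumerable#each_cons`, which returns no windows when the input is shorter than `n`, so short inputs produce an empty array rather than an error.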

12 stars
4 watching
2 forks
Language: Ruby
last commit: almost 11 years ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
pharo-ai/ngrammodel | A tool for splitting text into sequences of words | 4
lfcipriani/punkt-segmenter | An implementation of a sentence boundary detection algorithm in Ruby | 92
6/tiny_segmenter | A Ruby port of a Japanese text tokenization algorithm | 21
ankane/fasttext-ruby | Efficient text classification and representation learning library for Ruby | 203
reddavis/n-gram | Generates sequences of characters from a given text, useful for data analysis and modeling | 37
nelstrom/vim-textobj-rubyblock | A Vim plugin for selecting Ruby blocks | 331
postmodern/raingrams | A flexible ngrams library in Ruby allowing users to model and generate text | 69
ankane/torchtext-ruby | A Ruby library providing data loaders and abstractions for text and NLP tasks | 34
abitdodgy/words_counted | A Ruby library that tokenizes input and provides various statistical measures about the tokens | 159
diasks2/pragmatic_segmenter | A rule-based sentence boundary detection gem that works across many languages | 551
tmm1/rblineprof | A line profiler for the Ruby programming language | 771
patterns-ai-core/langchainrb | A Ruby library providing an interface to Large Language Model (LLM) providers for text generation and embedding | 1,415
ankane/ngt-ruby | A high-performance approximate nearest neighbors search library for Ruby | 50
yohasebe/lemmatizer | A Ruby library that provides a lemmatizer for text in English | 108
louismullie/scalpel | A Ruby library that uses a simple rule-based approach to segment sentences into individual words or phrases | 51