pragmatic_segmenter

Sentence segmenter

Pragmatic Segmenter is a rule-based sentence boundary detection gem that works out of the box across many languages.
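
To make "rule-based sentence boundary detection" concrete, here is a minimal sketch in plain Ruby. It is not the gem's algorithm or API: the `ABBREVIATIONS` list and the `naive_segment` method are hypothetical names invented for this illustration, and the two rules shown are far simpler than anything a production segmenter would use. The idea it shows is that sentence-ending punctuation is only a candidate boundary, and hand-written rules decide whether to accept it.

```ruby
# Illustrative toy only: NOT the pragmatic_segmenter implementation.
# ABBREVIATIONS and naive_segment are hypothetical names for this sketch.
ABBREVIATIONS = %w[mr mrs ms dr prof sr jr st vs etc].freeze

def naive_segment(text)
  sentences = []
  current = +""
  # Walk whitespace-separated tokens and close a sentence when a rule fires.
  text.split(/\s+/).each do |token|
    next if token.empty?
    current << " " unless current.empty?
    current << token
    # Rule 1: a boundary candidate ends in ., ! or ?
    # (optionally followed by a closing quote or bracket).
    next unless token.match?(/[.!?]["')\]]*\z/)
    # Rule 2: a known abbreviation followed by a period is not a boundary.
    word = token.downcase.gsub(/[^a-z]/, "")
    next if token.end_with?(".") && ABBREVIATIONS.include?(word)
    sentences << current
    current = +""
  end
  sentences << current unless current.empty?
  sentences
end

puts naive_segment("Hello world. My name is Mr. Smith. I live in New York.")
```

Without the abbreviation rule, the period in "Mr." would split the second sentence in two. A real segmenter needs many more rules per language (numbers, initials, ellipses, quotations), which is the work a dedicated gem takes on.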

GitHub

551 stars
16 watching
55 forks
Language: Ruby
Last commit: 3 months ago
Linked from 2 awesome lists


Related projects:

Repository | Description | Stars
lfcipriani/punkt-segmenter | An implementation of a sentence boundary detection algorithm in Ruby. | 92
uglytoad/pragmaticsegmenternet | A C# implementation of sentence boundary detection with a rule-based approach. | 33
diasks2/pragmatic_tokenizer | A multilingual tokenizer to split strings into tokens, handling various language and formatting nuances. | 90
nipunsadvilkar/pysbd | A Python package for out-of-the-box sentence boundary detection using rule-based algorithms. | 807
apohllo/srx-english | A Ruby library containing English sentence and word segmentation rules based on the SRX standard. | 18
tkellen/ruby-ngram | Breaks text into contiguous sequences of words or phrases. | 12
diasks2/chat_correct | A tool that highlights errors in user input to help improve English language skills. | 43
louismullie/scalpel | A Ruby library that uses a simple rule-based approach to segment sentences into individual words or phrases. | 51
6/tiny_segmenter | A Ruby port of a Japanese text tokenization algorithm. | 21
lartpang/pysodmetrics | A library providing an implementation of various metrics for object segmentation and saliency detection in computer vision. | 144
diasks2/word_count_analyzer | An analyzer tool to account for variations in word count calculations. | 20
cslu-nlp/detectormorse | A tool for automatically detecting sentence boundaries in natural language text using machine learning and handcrafted features. | 90
dcjones/proseg | An open-source software package for probabilistic cell segmentation in spatial transcriptomics. | 45
juntang-zhuang/shelfnet | An implementation of a lightweight semantic segmentation model with real-time performance capabilities. | 252
aalto-speech/morfessor | A tool for unsupervised and semi-supervised morphological segmentation of text data. | 185