pragmatic_segmenter

Sentence segmenter

Pragmatic Segmenter is a rule-based sentence boundary detection gem for Ruby that works out of the box across many languages.
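
To give a rough sense of what "rule-based" means here, the following is a minimal, self-contained sketch of a sentence splitter driven by a single hand-written rule (a period after a known abbreviation does not end a sentence). It is not the gem's implementation or API: the method name `naive_segment` and the tiny `ABBREVIATIONS` list are hypothetical and exist only for illustration. A real rule set would be far larger and language-specific.

```ruby
# Toy rule-based sentence splitter. This is NOT how pragmatic_segmenter
# works internally; it only illustrates the general idea.

# Hypothetical, deliberately tiny abbreviation list (lowercase, no final period).
ABBREVIATIONS = %w[mr mrs dr prof e.g i.e etc].freeze

# Matches sentence-final punctuation, optionally followed by closing quotes/brackets.
TERMINATOR = /[.!?]["')\]]*\z/

def naive_segment(text)
  sentences = []
  current = +""

  text.split.each do |token|
    current << " " unless current.empty?
    current << token

    # Only tokens ending in ., ! or ? can close a sentence.
    next unless token.match?(TERMINATOR)

    # Rule: a period directly after a known abbreviation is not a boundary.
    stem = token.downcase.sub(TERMINATOR, "")
    next if token.end_with?(".") && ABBREVIATIONS.include?(stem)

    sentences << current
    current = +""
  end

  # Keep any trailing text that lacks final punctuation.
  sentences << current unless current.empty?
  sentences
end

puts naive_segment("Hello world. My name is Mr. Smith. Is that OK? Yes!")
```

Each rule is an explicit, inspectable condition rather than a learned model, which is why such segmenters need no training data. The trade-off is that every exception (abbreviations, initials, numbered lists, and so on) has to be written down by hand.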

559 stars
16 watching
54 forks
Language: Ruby
Last commit: 5 months ago
Linked from 2 awesome lists

Related projects:

Repository | Description | Stars
lfcipriani/punkt-segmenter | A Ruby port of the NLTK algorithm to detect sentence boundaries in unstructured text | 92
uglytoad/pragmaticsegmenternet | A C# implementation of sentence boundary detection with a rule-based approach | 33
diasks2/pragmatic_tokenizer | A multilingual tokenizer to split strings into tokens, handling various language and formatting nuances | 90
nipunsadvilkar/pysbd | A Python package for out-of-the-box sentence boundary detection using rule-based algorithms | 821
apohllo/srx-english | A Ruby library providing sentence segmentation rules based on the SRX standard for English language text processing | 18
tkellen/ruby-ngram | Breaks text into contiguous sequences of words or phrases | 12
diasks2/chat_correct | A tool that highlights errors in user input to help improve English language skills | 43
louismullie/scalpel | A Ruby library that uses a simple rule-based approach to segment sentences into individual words or phrases | 51
6/tiny_segmenter | A Ruby port of a Japanese text tokenization algorithm | 21
lartpang/pysodmetrics | A library providing an implementation of various metrics for object segmentation and saliency detection in computer vision | 150
diasks2/word_count_analyzer | An analyzer tool to account for variations in word count calculations | 20
cslu-nlp/detectormorse | A tool for automatically detecting sentence boundaries in natural language text using machine learning and handcrafted features | 90
dcjones/proseg | An open-source software package for probabilistic cell segmentation in spatial transcriptomics | 46
juntang-zhuang/shelfnet | An implementation of a lightweight semantic segmentation model with real-time performance capabilities | 252
aalto-speech/morfessor | A tool for unsupervised and semi-supervised morphological segmentation in text data | 186