segtok

Sentence splitter

Provides tools for splitting text into sentences and words.

Segtok v2 is here: https://github.com/fnl/syntok

A rule-based sentence segmenter (splitter) and a word tokenizer that use orthographic features.
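To make "rule-based" and "orthographic features" concrete, here is a minimal standard-library sketch of the general idea: propose a sentence boundary wherever terminal punctuation is followed by whitespace and an uppercase letter or digit, then veto the boundary if the preceding word is a known abbreviation. This is not segtok's actual rule set or API. The names `ABBREVIATIONS`, `split_sentences`, and `tokenize` are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (illustration only).
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "etc"}

# Candidate boundary: terminal punctuation, then whitespace, then an
# uppercase letter or digit (a simple orthographic cue).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")

# Words (optionally joined by hyphens or apostrophes) or single symbols.
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_sentences(text):
    """Split text into sentences using the toy rules above."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        candidate = text[start:match.start()]
        last_word = candidate.split()[-1].rstrip(".").lower()
        if last_word in ABBREVIATIONS:
            continue  # e.g. "Dr. Smith": not a sentence boundary
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def tokenize(sentence):
    """Split one sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


if __name__ == "__main__":
    sample = "Dr. Smith arrived at 5 p.m. on Monday. He left early! Was it late?"
    for sentence in split_sentences(sample):
        print(tokenize(sentence))
```

A real segmenter needs far more rules than this (nested quotes, enumerations, abbreviations that can also end a sentence, and so on), which is the kind of case a dedicated library is meant to handle.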

GitHub

170 stars
11 watching
22 forks
Language: Python
Last commit: almost 3 years ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| c4n/pythonlexto | A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications. | 1 |
| binwang28/sbert-wk-sentence-embedding | A method to generate sentence embeddings from pre-trained language models. | 177 |
| neurosnap/sentences | A command line tool to split text into individual sentences. | 439 |
| smark-1/wagtailterms | Adds support for a glossary terms entity to Draftail in Wagtail. | 2 |
| atgreen/cl-text-splitter | A Common Lisp library for splitting text into manageable segments based on document structure and layout characteristics. | 7 |
| ju-bezdek/langchain-decorators | Provides syntactic sugar for writing custom LangChain prompts and chains, making it easier to write more Pythonic code. | 228 |
| lfcipriani/punkt-segmenter | An implementation of a sentence boundary detection algorithm in Ruby. | 92 |
| jonsafari/tok-tok | A fast and simple tokenizer for multiple languages. | 28 |
| facebookresearch/senteval | A tool for evaluating the quality of sentence embeddings as features in various downstream tasks. | 2,087 |
| pucktada/cutkum | A tool for segmenting Thai text into words using Recurrent Neural Networks in TensorFlow. | 154 |
| magicstack/magicpython | A Python syntax highlighter package for multiple text editors. | 1,413 |
| louismullie/scalpel | A Ruby library that uses a simple rule-based approach to segment sentences into individual words or phrases. | 51 |
| fox-it/dissect.eventlog | A Python module that parses Windows log file formats. | 6 |
| foxyseta/tree-sitter-prolog | Provides a Prolog grammar and parser for tree-sitter, enabling parsing of various Prolog formats. | 2 |
| johngiorgi/declutr | A tool for training and evaluating sentence embeddings using deep contrastive learning. | 379 |