segtok
Sentence splitter
Provides tools for splitting text into sentences and words
Segtok v2 is available as syntok: https://github.com/fnl/syntok. Segtok itself is a rule-based sentence segmenter (splitter) and word tokenizer that uses orthographic features.
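Segtok's own implementation is not reproduced here. The sketch below is a rough illustration of what a rule-based splitter driven by orthographic features can look like. It splits at `.`, `!` or `?` only when two cues agree. First, the word before a period must not be a known abbreviation. Second, the next character must be uppercase, a digit, or an opening quote or bracket. The following are all hypothetical, chosen for this example only, and are not segtok's API:

- the names `split_sentences` and `tokenize_words`
- the `ABBREVIATIONS` set
- both regular expressions

```python
import re
from typing import List

# Hypothetical, deliberately tiny abbreviation list (lowercase, final period
# removed); a practical splitter needs a much larger, language-specific list.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "etc", "vs", "fig", "e.g", "i.e"}

# Candidate boundary: terminal punctuation, optional closing quotes or
# brackets, then at least one whitespace character.
_CANDIDATE = re.compile(r'[.!?][")\]]*\s+')

# Word tokens (optionally joined by internal hyphens or apostrophes) or
# single punctuation characters.
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_sentences(text: str) -> List[str]:
    """Split text at . ! ? unless orthographic cues argue against a boundary."""
    sentences: List[str] = []
    start = 0
    for match in _CANDIDATE.finditer(text):
        words = text[start:match.start()].split()
        last_word = words[-1].lstrip('("').lower() if words else ""
        # Cue 1: a period directly after a known abbreviation is not a boundary.
        if text[match.start()] == "." and last_word in ABBREVIATIONS:
            continue
        # Cue 2: a new sentence should begin with an uppercase letter, a digit,
        # or an opening quote/bracket (or the text simply ends here).
        next_char = text[match.end():match.end() + 1]
        if next_char and not (next_char.isupper() or next_char.isdigit() or next_char in '"('):
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


def tokenize_words(sentence: str) -> List[str]:
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


if __name__ == "__main__":
    text = 'Dr. Smith arrived at 5 p.m. yesterday. He said: "It works!" Then he left.'
    for sentence in split_sentences(text):
        print(tokenize_words(sentence))
```

The cues are purely local. As a result, an abbreviation that genuinely ends a sentence (for example "etc. The") is not split. Handling such cases needs more rules or additional evidence.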
170 stars
11 watching
22 forks
Language: Python
Last commit: almost 3 years ago

Related projects:
Repository | Description | Stars |
---|---|---|
c4n/pythonlexto | A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications. | 1 |
binwang28/sbert-wk-sentence-embedding | A method to generate sentence embeddings from pre-trained language models | 177 |
neurosnap/sentences | A command-line tool to split text into individual sentences | 439 |
smark-1/wagtailterms | Adds a glossary-term entity to the Draftail rich-text editor in Wagtail | 2 |
atgreen/cl-text-splitter | A Common Lisp library for splitting text into manageable segments based on document structure and layout characteristics. | 7 |
ju-bezdek/langchain-decorators | Provides syntactic sugar for writing custom LangChain prompts and chains, making it easier to write more pythonic code. | 228 |
lfcipriani/punkt-segmenter | An implementation of a sentence boundary detection algorithm in Ruby. | 92 |
jonsafari/tok-tok | A fast and simple tokenizer for multiple languages | 28 |
facebookresearch/senteval | A tool for evaluating the quality of sentence embeddings as features in various downstream tasks. | 2,087 |
pucktada/cutkum | A tool for segmenting Thai text into words using Recurrent Neural Networks in TensorFlow. | 154 |
magicstack/magicpython | A Python syntax highlighter package for multiple text editors. | 1,413 |
louismullie/scalpel | A Ruby library that uses a simple rule-based approach to segment sentences into individual words or phrases. | 51 |
fox-it/dissect.eventlog | A Python module for parsing Windows event log file formats | 6 |
foxyseta/tree-sitter-prolog | Provides a Prolog grammar and parser for tree-sitter, enabling parsing of various Prolog formats. | 2 |
johngiorgi/declutr | A tool for training and evaluating sentence embeddings using deep contrastive learning | 379 |