segtok
Sentence splitter
Provides tools for splitting text into sentences and words
Segtok v2 is here: https://github.com/fnl/syntok
A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic features.
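To illustrate what "rule-based" and "orthographic features" can mean in practice, here is a minimal standard-library sketch. It is not segtok's implementation or API: the names `split_sentences` and `tokenize`, the abbreviation list, and the regular expressions are all assumptions made for this example. It treats terminal punctuation followed by whitespace and an uppercase letter (or an opening quote or parenthesis) as a sentence boundary, unless the preceding word is a known abbreviation.

```python
import re

# Illustrative sketch only; names and rules are hypothetical, not segtok's.
# Boundary: terminal punctuation, whitespace, then an uppercase letter,
# quote, or opening parenthesis (an orthographic cue).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'(])")

# A few abbreviations that should not end a sentence (assumed, incomplete).
_ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof.", "e.g.", "i.e.", "vs."}

# Words (allowing inner hyphens and apostrophes) or single punctuation marks.
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_sentences(text):
    """Split text into sentences using simple orthographic rules."""
    text = text.strip()
    if not text:
        return []
    sentences = []
    buffer = ""
    for chunk in _BOUNDARY.split(text):
        buffer = f"{buffer} {chunk}" if buffer else chunk
        # Keep accumulating when the candidate ends in a known abbreviation.
        if buffer.split()[-1] in _ABBREVIATIONS:
            continue
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


if __name__ == "__main__":
    for sentence in split_sentences("Dr. Smith arrived. He was late! Why?"):
        print(tokenize(sentence))
```

A real segmenter needs many more rules (numbers, initials, URLs, nested quotes), so treat this only as a picture of the general approach.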
171 stars
11 watching
22 forks
Language: Python
Last commit: about 3 years ago

Related projects:
| Description | Stars |
|---|---|
| A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications. | 1 |
| A method to generate sentence embeddings from pre-trained language models | 178 |
| A command line tool to split text into individual sentences | 441 |
| Adds support for glossary terms entity to Draftail in Wagtail | 4 |
| A Common Lisp library for splitting text into manageable segments based on document structure and layout characteristics. | 7 |
| Provides syntactic sugar for writing custom LangChain prompts and chains, making it easier to write more pythonic code. | 228 |
| A Ruby port of the NLTK algorithm to detect sentence boundaries in unstructured text | 92 |
| A fast and simple tokenizer for multiple languages | 28 |
| Tool for evaluating the quality of sentence embeddings as features in various downstream tasks. | 2,086 |
| A tool for segmenting Thai text into words using Recurrent Neural Networks in TensorFlow. | 154 |
| A Python syntax highlighter package for multiple text editors. | 1,414 |
| A Ruby library that uses a simple rule-based approach to segment sentences into individual words or phrases. | 51 |
| Provides parsers for Windows log file formats | 6 |
| Provides a Prolog grammar and parser for tree-sitter, enabling parsing of various Prolog formats. | 2 |
| A tool for training and evaluating sentence embeddings using deep contrastive learning | 380 |