morfessor 
 segmenter
 A tool for unsupervised and semi-supervised morphological segmentation in text data
186 stars
 23 watching
 29 forks
 
Language: Python 
Last commit: about 5 years ago
Linked from 1 awesome list
python, segmentation, subword-segmentation, subword-units
 Related projects:
| Repository | Description | Stars | 
|---|---|---|
|    |  An NLTK-based parser that provides morphological annotation for languages using KR-style annotations. | 4 | 
|    |  A rule-based sentence boundary detection gem that works across many languages | 559 | 
|    |  Automates the process of extracting parallel sentences from comparable corpora to aid in statistical machine translation | 127 | 
|    |  Provides tools for splitting text into sentences and words | 171 | 
|    |  Provides PyTorch implementations of various models and pipelines for semantic segmentation in deep learning. | 1,729 | 
|    |  A PyTorch implementation of semantic segmentation models with support for multiprocessing training and various backbones. | 1,347 | 
|    |  Monorepo implementing PyTorch-based neural network architecture for image segmentation | 1,787 | 
|    |  A Python wrapper around a Java library for segmenting Thai text into individual words | 3 | 
|    |  A tokenizer for segmenting words into morphological components | 27 | 
|    |  Lemmatization tool for natural language processing | 146 | 
|    |  A Ruby library providing sentence segmentation rules based on the SRX standard for English language text processing. | 18 | 
|    |  A Ruby port of the NLTK algorithm to detect sentence boundaries in unstructured text | 92 | 
|    |  A Japanese morphological analyzer that splits words into grammatical components and segments phrases for efficient text processing | 833 | 
|    |  A Ruby library that uses a simple rule-based approach to segment sentences into individual words or phrases. | 51 | 
|    |  A tool for automatically detecting sentence boundaries in natural language text using machine learning and handcrafted features. | 90 |