morfessor
segmenter
A tool for unsupervised and semi-supervised morphological segmentation of text data
Morfessor is a tool for unsupervised and semi-supervised morphological segmentation
186 stars
23 watching
29 forks
Language: Python
last commit: over 4 years ago
Linked from 1 awesome list
python, segmentation, subword-segmentation, subword-units
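Morfessor-style segmenters split a word into the sequence of morphs ("sub-word units") with the lowest total cost under a learned morph lexicon. The sketch below illustrates only that segmentation step, using Viterbi dynamic programming under a simple unigram morph model. It is not Morfessor's own code or API. `viterbi_segment` and `TOY_LEXICON` are hypothetical names, and the morph counts are made up for illustration. The real tool learns its lexicon from unannotated text, and its cost function includes terms this sketch omits.

```python
import math

# Hypothetical toy lexicon: morph -> corpus count (illustrative values only).
TOY_LEXICON = {"un": 5, "break": 3, "able": 4, "s": 6}


def viterbi_segment(word, morph_counts):
    """Split `word` into morphs, minimizing the summed negative log-probability.

    Each morph's probability is its count divided by the total count
    (a unigram model). Returns (morphs, cost). If the lexicon cannot
    cover the word, the word is returned unsplit with infinite cost.
    """
    total = sum(morph_counts.values())
    n = len(word)
    # best[i] = (lowest cost of segmenting word[:i], start index of its last morph)
    best = [(0.0, 0)] + [(math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(end):
            count = morph_counts.get(word[start:end])
            if count is None or best[start][0] == math.inf:
                continue  # unknown morph, or prefix not reachable
            cost = best[start][0] - math.log(count / total)
            if cost < best[end][0]:
                best[end] = (cost, start)
    if best[n][0] == math.inf:
        return [word], math.inf
    # Follow the back-pointers from the end of the word to recover the morphs.
    morphs, end = [], n
    while end > 0:
        start = best[end][1]
        morphs.append(word[start:end])
        end = start
    return morphs[::-1], best[n][0]
```

For example, `viterbi_segment("unbreakable", TOY_LEXICON)` returns `["un", "break", "able"]` together with that split's cost. Because a unigram model charges a cost for every morph, adding a frequent whole-word entry to the lexicon would make the unsplit form win. Morfessor's training objective exists to balance exactly that trade-off.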
Related projects:
Repository | Description | Stars |
---|---|---|
recski/hunparse | An NLTK-based parser that provides morphological annotation for languages using KR-style annotations. | 4 |
diasks2/pragmatic_segmenter | A rule-based sentence boundary detection gem that works across many languages. | 559 |
machinalis/yalign | Automates the extraction of parallel sentences from comparable corpora to aid statistical machine translation. | 127 |
fnl/segtok | Provides tools for splitting text into sentences and words. | 171 |
zijundeng/pytorch-semantic-segmentation | Provides PyTorch implementations of various models and pipelines for semantic segmentation in deep learning. | 1,729 |
hszhao/semseg | A PyTorch implementation of semantic segmentation models with support for multi-process training and various backbones. | 1,347 |
nvidia/semantic-segmentation | A monorepo implementing PyTorch-based neural network architectures for image segmentation. | 1,787 |
remixman/pythonlexto | A Python wrapper around a Java library for segmenting Thai text into individual words. | 3 |
amir-zeldes/rftokenizer | A tokenizer for segmenting words into morphological components. | 27 |
adbar/simplemma | A lemmatization tool for natural language processing. | 146 |
apohllo/srx-english | A Ruby library providing sentence segmentation rules based on the SRX standard for English text processing. | 18 |
lfcipriani/punkt-segmenter | A Ruby port of the NLTK algorithm for detecting sentence boundaries in unstructured text. | 92 |
ikawaha/kagome | A Japanese morphological analyzer that splits words into grammatical components and segments phrases for efficient text processing. | 833 |
louismullie/scalpel | A Ruby library that uses a simple rule-based approach to segment sentences into individual words or phrases. | 51 |
cslu-nlp/detectormorse | A tool for automatically detecting sentence boundaries in natural language text using machine learning and handcrafted features. | 90 |