cstlemma
Lemmatizer tool
A lemmatiser tool for multiple languages using affix rules and supervised learning from full-form dictionaries.
A lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and dozens of other languages. It uses affix rules, where an affix can be a prefix, infix, suffix or circumfix. The rules are obtained by supervised learning from a list of full form/lemma pairs.
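To make the idea of learning affix rules from a full form/lemma list concrete, here is a minimal Python sketch. It is a deliberate simplification and not cstlemma's actual algorithm or interface: it only learns suffix rules (no prefixes, infixes or circumfixes), and the function names `learn_suffix_rules` and `lemmatise`, as well as the tiny English training list, are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def learn_suffix_rules(pairs):
    """Derive suffix-rewrite rules from (full form, lemma) pairs.

    Simplifying assumption: only the ending of a word changes.
    """
    votes = defaultdict(Counter)
    for form, lemma in pairs:
        # Length of the longest common prefix of the form and its lemma.
        k = 0
        while k < min(len(form), len(lemma)) and form[k] == lemma[k]:
            k += 1
        if form[k:]:
            # Rule: replace the form's remaining suffix with the lemma's.
            votes[form[k:]][lemma[k:]] += 1
    # Keep the most frequently seen replacement for each suffix.
    return {suffix: c.most_common(1)[0][0] for suffix, c in votes.items()}


def lemmatise(word, rules):
    """Apply the rule with the longest matching suffix, if any."""
    for start in range(len(word)):  # longest suffix is tried first
        suffix = word[start:]
        if suffix in rules:
            return word[:start] + rules[suffix]
    return word  # no rule matched: leave the word unchanged


# Hypothetical training data: (full form, lemma) pairs.
training = [
    ("walked", "walk"),
    ("jumped", "jump"),
    ("cats", "cat"),
    ("ponies", "pony"),
]
rules = learn_suffix_rules(training)

print(lemmatise("talked", rules))
print(lemmatise("cities", rules))
print(lemmatise("tree", rules))
```

Trying the longest suffix first lets a specific rule such as `ies` to `y` take precedence over a more general one such as `s` to the empty string. Words that match no learned rule are returned unchanged.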
35 stars
6 watching
7 forks
Language: C++
Last commit: 4 months ago
Linked from 1 awesome list
Tags: affix, dutch, german, infix, lemmatiser, lemmatizer, prefix, suffix
Related projects:
Repository | Description | Stars |
---|---|---|
adbar/simplemma | Lemmatization tool for natural language processing | 145 |
sorenlind/lemmy | Lemmatizer for Danish and Swedish languages | 75 |
yohasebe/lemmatizer | A Ruby library that provides a lemmatizer for text in English. | 108 |
ixa-ehu/ixa-pipe-pos | Provides tools for part of speech tagging and lemmatization across multiple languages using machine learning models. | 17 |
justlucdewit/cod | A language and compiler that supports a unique concatenative stack-based programming paradigm. | 24 |
mnsignalprocessing/barelyml | A markup language that combines elements from Markdown and DokuWiki to display text with formatting and structure | 15 |
ldmt-muri/alignment-with-openfst | An implementation of a CRF autoencoder framework for aligning text data | 21 |
lemire/dictionary | An optimized C++ implementation of dictionary coding using SIMD instructions for efficient compression and decompression of large datasets | 103 |
ldmt-muri/kin-morph-fst | An analysis tool for breaking down words into their component parts in the Kinyarwanda language | 6 |
mingrammer/cfmt | A package providing contextual formatting functions with similar usage to the fmt package. | 103 |
liyuanlucasliu/lm-lstm-crf | A PyTorch-based tool for sequence labeling using a combination of CRF and LSTM models to capture label dependencies and leverage contextualized representations. | 846 |
lowresourcelanguages/hltdi-morphology | Provides morphological analysis tools for various languages, including verb and noun generation, based on archived web pages. | 5 |
lspitzner/brittany | A tool to format Haskell source code according to certain formatting rules | 692 |
divvun/omegat-hfst-tokenizer | Tool providing fst-based tokenization for natural language processing applications | 2 |
weavejester/cljfmt | A tool that detects and fixes formatting errors in Clojure code to improve readability and consistency. | 1,116 |