cstlemma

Lemmatizer tool

A lemmatiser tool for multiple languages using affix rules and supervised learning from full-form dictionaries.

A lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and dozens of other languages that uses affix rules (prefix, infix, suffix and circumfix). The rules are obtained by supervised learning from a list of full-form/lemma pairs.
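To illustrate what "learning affix rules from full-form/lemma pairs" means, here is a minimal sketch that handles suffixes only. It is a simplified illustration and not cstlemma's actual algorithm, which also covers prefixes, infixes and circumfixes. The names `SuffixRule`, `learnRule` and `SuffixLemmatiser` are hypothetical. Each training pair yields a rule by stripping the common prefix of form and lemma, for example ("cities", "city") gives "ies" → "y". At lookup time, the rule with the longest matching ending wins.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// HYPOTHETICAL names: SuffixRule, learnRule and SuffixLemmatiser are for
// illustration only; they are not part of cstlemma's code base.

// A suffix rule: replace the ending `from` of a word form with `to`.
struct SuffixRule {
    std::string from;
    std::string to;
};

// Derive a rule from one (full form, lemma) pair by removing the longest
// common prefix; the leftover tails of the two strings form the rule.
// e.g. ("cities", "city") -> {"ies", "y"}
SuffixRule learnRule(const std::string& form, const std::string& lemma) {
    std::size_t i = 0;
    while (i < form.size() && i < lemma.size() && form[i] == lemma[i]) {
        ++i;
    }
    return {form.substr(i), lemma.substr(i)};
}

class SuffixLemmatiser {
public:
    // Supervised training: count how often each ending is rewritten to
    // each replacement across the full-form/lemma list.
    void train(const std::vector<std::pair<std::string, std::string>>& pairs) {
        for (const auto& [form, lemma] : pairs) {
            SuffixRule rule = learnRule(form, lemma);
            ++counts_[rule.from][rule.to];
        }
    }

    // Apply the rule with the longest matching ending; among replacements
    // for that ending, use the most frequent one (ties: alphabetically first).
    // Words with no matching ending are returned unchanged.
    std::string lemmatise(const std::string& word) const {
        for (std::size_t len = word.size() + 1; len-- > 0;) {
            const std::string ending = word.substr(word.size() - len);
            auto it = counts_.find(ending);
            if (it == counts_.end()) {
                continue;
            }
            const std::string* best = nullptr;
            int bestCount = 0;
            for (const auto& [to, count] : it->second) {
                if (count > bestCount) {
                    best = &to;
                    bestCount = count;
                }
            }
            return word.substr(0, word.size() - len) + *best;
        }
        return word;
    }

private:
    // ending -> (replacement -> number of training pairs that produced it)
    std::map<std::string, std::map<std::string, int>> counts_;
};
```

After training on the pairs ("walked", "walk"), ("talked", "talk"), ("cats", "cat"), ("dogs", "dog"), ("cities", "city") and ("babies", "baby"), this sketch maps "jumped" to "jump" and "ponies" to "pony". Because "ies" is a longer match than "s", it takes precedence. The sketch also overgeneralises: for example, it turns "bed" into "b". A practical lemmatiser needs rules conditioned on more context than a bare ending.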

GitHub

35 stars
6 watching
7 forks
Language: C++
last commit: 4 months ago
Linked from 1 awesome list

Topics: affix, dutch, german, infix, lemmatiser, lemmatizer, prefix, suffix


Related projects:

adbar/simplemma: Lemmatization tool for natural language processing (145 stars)
sorenlind/lemmy: Lemmatizer for Danish and Swedish languages (75 stars)
yohasebe/lemmatizer: A Ruby library that provides a lemmatizer for text in English (108 stars)
ixa-ehu/ixa-pipe-pos: Provides tools for part of speech tagging and lemmatization across multiple languages using machine learning models (17 stars)
justlucdewit/cod: A language and compiler that supports a unique concatenative stack-based programming paradigm (24 stars)
mnsignalprocessing/barelyml: A markup language that combines elements from Markdown and DokuWiki to display text with formatting and structure (15 stars)
ldmt-muri/alignment-with-openfst: An implementation of a CRF autoencoder framework for aligning text data (21 stars)
lemire/dictionary: An optimized C++ implementation of dictionary coding using SIMD instructions for efficient compression and decompression of large datasets (103 stars)
ldmt-muri/kin-morph-fst: An analysis tool for breaking down words into their component parts in the Kinyarwanda language (6 stars)
mingrammer/cfmt: A package providing contextual formatting functions with similar usage to the fmt package (103 stars)
liyuanlucasliu/lm-lstm-crf: A PyTorch-based tool for sequence labeling using a combination of CRF and LSTM models to capture label dependencies and leverage contextualized representations (846 stars)
lowresourcelanguages/hltdi-morphology: Provides morphological analysis tools for various languages, including verb and noun generation, based on archived web pages (5 stars)
lspitzner/brittany: A tool to format Haskell source code according to certain formatting rules (692 stars)
divvun/omegat-hfst-tokenizer: Tool providing fst-based tokenization for natural language processing applications (2 stars)
weavejester/cljfmt: A tool that detects and fixes formatting errors in Clojure code to improve readability and consistency (1,116 stars)