cstlemma

Lemmatizer tool

A lemmatiser tool for multiple languages using affix rules and supervised learning from full-form dictionaries.

Lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and dozens of other languages. It uses affix rules (affixes being prefixes, infixes, suffixes and circumfixes), which are obtained by supervised learning from a list of full forms paired with their lemmas.

GitHub

36 stars
6 watching
7 forks
Language: C++
last commit: 8 months ago
Linked from 1 awesome list

affix, dutch, german, infix, lemmatiser, lemmatizer, prefix, suffix

Related projects:

adbar/simplemma: Lemmatization tool for natural language processing (146 stars)
sorenlind/lemmy: Lemmatizer for Danish and Swedish (76 stars)
yohasebe/lemmatizer: A Ruby library that provides a lemmatizer for English text (108 stars)
ixa-ehu/ixa-pipe-pos: Provides tools for part-of-speech tagging and lemmatization across multiple languages using machine learning models (18 stars)
mnsignalprocessing/barelyml: A markup language that combines elements of Markdown and DokuWiki to display text with formatting and structure (16 stars)
ldmt-muri/alignment-with-openfst: An implementation of the CRF autoencoder framework for tasks in natural language processing and machine translation (21 stars)
lemire/dictionary: An optimized C++ implementation of dictionary coding using SIMD instructions for efficient compression and decompression of large datasets (104 stars)
ldmt-muri/kin-morph-fst: An analysis tool for breaking down words into their component parts in the Kinyarwanda language (6 stars)
mingrammer/cfmt: A package providing contextual formatting functions with usage similar to the fmt package (103 stars)
liyuanlucasliu/lm-lstm-crf: A PyTorch-based tool for sequence labeling that combines CRF and LSTM models to capture label dependencies and leverage contextualized representations (846 stars)
lowresourcelanguages/hltdi-morphology: Provides morphological analysis tools for various languages, including verb and noun generation, based on archived web pages (5 stars)
lspitzner/brittany: A tool to format Haskell source code according to configurable formatting rules (691 stars)
divvun/omegat-hfst-tokenizer: Tool providing FST-based tokenization for natural language processing applications (2 stars)
weavejester/cljfmt: A tool that detects and fixes formatting errors in Clojure code to improve readability and consistency (1,123 stars)