cstlemma

Lemmatizer tool

A lemmatiser tool for multiple languages using affix rules and supervised learning from full-form dictionaries.

A lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and dozens of other languages. It uses affix rules, where an affix can be a prefix, infix, suffix or circumfix. The rules are obtained by supervised learning from a list of full form/lemma pairs.
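To make the idea of learning affix rules from full form/lemma pairs concrete, here is a minimal sketch in Python. It is not cstlemma's actual algorithm or API: it handles suffixes only (no prefixes, infixes or circumfixes), and the function names `learn_suffix_rules` and `lemmatize` as well as the tiny training list are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from os.path import commonprefix


def learn_suffix_rules(pairs):
    """Derive suffix rewrite rules from (full form, lemma) pairs.

    Simplifying assumption: only the word ending changes, so each pair
    yields one rule "replace form suffix with lemma suffix".
    """
    counts = defaultdict(Counter)
    for form, lemma in pairs:
        # Length of the shared stem, compared character by character.
        stem_len = len(commonprefix([form, lemma]))
        counts[form[stem_len:]][lemma[stem_len:]] += 1
    # When a suffix maps to several replacements, keep the most frequent.
    return {old: c.most_common(1)[0][0] for old, c in counts.items()}


def lemmatize(word, rules):
    """Apply the longest matching suffix rule; return the word unchanged otherwise."""
    for old in sorted(rules, key=len, reverse=True):
        if old and len(word) > len(old) and word.endswith(old):
            return word[: len(word) - len(old)] + rules[old]
    return word


# Hypothetical toy training data, not taken from any real dictionary.
training_pairs = [
    ("walked", "walk"),
    ("jumped", "jump"),
    ("cities", "city"),
    ("ponies", "pony"),
    ("cats", "cat"),
]
rules = learn_suffix_rules(training_pairs)

for word in ["talked", "parties", "dogs", "sheep"]:
    print(word, "->", lemmatize(word, rules))
```

Preferring the longest matching suffix lets a specific rule such as "ies" to "y" win over a general one such as "s" to "". A real affix-rule lemmatiser would also need rules for changes at the start or inside of a word, which this sketch leaves out.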

GitHub

36 stars

6 watching

7 forks

Language: C++

last commit: about 1 year ago

Linked from 1 awesome list

Tags: affix, dutch, german, infix, lemmatiser, lemmatizer, prefix, suffix

Backlinks from these awesome lists:

fnielsen/awesome-danish

Related projects:

Repository	Description	Stars
adbar/simplemma	Lemmatization tool for natural language processing	146
sorenlind/lemmy	Lemmatizer for Danish and Swedish languages	76
yohasebe/lemmatizer	A Ruby library that provides a lemmatizer for text in English.	108
ixa-ehu/ixa-pipe-pos	Provides tools for part of speech tagging and lemmatization across multiple languages using machine learning models.	18
mnsignalprocessing/barelyml	A markup language that combines elements from Markdown and DokuWiki to display text with formatting and structure	16
ldmt-muri/alignment-with-openfst	An implementation of the CRF autoencoder framework for tasks in natural language processing and machine translation	21
lemire/dictionary	An optimized C++ implementation of dictionary coding using SIMD instructions for efficient compression and decompression of large datasets	104
ldmt-muri/kin-morph-fst	An analysis tool for breaking down words into their component parts in the Kinyarwanda language	6
mingrammer/cfmt	A package providing contextual formatting functions with similar usage to the fmt package.	103
liyuanlucasliu/lm-lstm-crf	A PyTorch-based tool for sequence labeling using a combination of CRF and LSTM models to capture label dependencies and leverage contextualized representations.	846
lowresourcelanguages/hltdi-morphology	Provides morphological analysis tools for various languages, including verb and noun generation, based on archived web pages.	5
lspitzner/brittany	A tool to format Haskell source code according to certain formatting rules	691
divvun/omegat-hfst-tokenizer	Tool providing fst-based tokenization for natural language processing applications	2
weavejester/cljfmt	A tool that detects and fixes formatting errors in Clojure code to improve readability and consistency.	1,123