cstlemma
Lemmatizer tool
A lemmatiser tool for multiple languages using affix rules and supervised learning from full-form dictionaries.
A lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian, and dozens of other languages. It applies affix rules (an affix can be a prefix, infix, suffix, or circumfix), which are obtained by supervised learning from a list of full form and lemma pairs.
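To make the idea of learning affix rules from full form and lemma pairs concrete, here is a minimal C++ sketch. It is not cstlemma's actual algorithm or API: it handles suffixes only, derives one rewrite rule per training pair by stripping the shared stem, and applies the longest matching suffix. All names (`Pair`, `SuffixRules`, `learnRules`, `lemmatize`) are hypothetical and chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper types for this sketch; not taken from cstlemma.
using Pair = std::pair<std::string, std::string>;        // (full form, lemma)
using SuffixRules = std::map<std::string, std::string>;  // form suffix -> lemma suffix

// Length of the longest common prefix of a and b.
std::size_t commonPrefix(const std::string& a, const std::string& b) {
    const std::size_t n = std::min(a.size(), b.size());
    std::size_t i = 0;
    while (i < n && a[i] == b[i]) ++i;
    return i;
}

// Derive one suffix rewrite rule per training pair: drop the shared stem
// and keep the differing tails, e.g. ("cities", "city") gives "ies" -> "y".
SuffixRules learnRules(const std::vector<Pair>& pairs) {
    SuffixRules rules;
    for (const auto& p : pairs) {
        const std::size_t k = commonPrefix(p.first, p.second);
        const std::string from = p.first.substr(k);
        if (from.empty()) continue;             // an empty suffix would match every word
        rules.emplace(from, p.second.substr(k)); // the first rule seen for a suffix wins
    }
    return rules;
}

// Apply the rule with the longest matching suffix; unknown words are returned unchanged.
std::string lemmatize(const SuffixRules& rules, const std::string& word) {
    for (std::size_t start = 0; start < word.size(); ++start) {
        const auto it = rules.find(word.substr(start));
        if (it != rules.end()) return word.substr(0, start) + it->second;
    }
    return word;
}
```

Trained on the pairs ("walked", "walk"), ("cities", "city"), and ("running", "run"), this sketch maps the unseen forms "talked" to "talk" and "parties" to "party". A real affix-rule lemmatiser also has to handle prefixes, infixes, and circumfixes, and to resolve conflicts between competing rules, which this example deliberately leaves out.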
36 stars
6 watching
7 forks
Language: C++
Last commit: over 1 year ago
Linked from 1 awesome list
Topics: affix, dutch, german, infix, lemmatiser, lemmatizer, prefix, suffix
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Lemmatization tool for natural language processing | 146 |
| | Lemmatizer for Danish and Swedish languages | 76 |
| | A Ruby library that provides a lemmatizer for text in English. | 108 |
| | Provides tools for part of speech tagging and lemmatization across multiple languages using machine learning models. | 18 |
| | A markup language that combines elements from Markdown and DokuWiki to display text with formatting and structure | 16 |
| | An implementation of the CRF autoencoder framework for tasks in natural language processing and machine translation | 21 |
| | An optimized C++ implementation of dictionary coding using SIMD instructions for efficient compression and decompression of large datasets | 104 |
| | An analysis tool for breaking down words into their component parts in the Kinyarwanda language | 6 |
| | A package providing contextual formatting functions with similar usage to the fmt package. | 103 |
| | A PyTorch-based tool for sequence labeling using a combination of CRF and LSTM models to capture label dependencies and leverage contextualized representations. | 846 |
| | Provides morphological analysis tools for various languages, including verb and noun generation, based on archived web pages. | 5 |
| | A tool to format Haskell source code according to certain formatting rules | 691 |
| | Tool providing fst-based tokenization for natural language processing applications | 2 |
| | A tool that detects and fixes formatting errors in Clojure code to improve readability and consistency. | 1,123 |