cstlemma
Lemmatizer tool
A lemmatiser tool for multiple languages using affix rules and supervised learning from full-form dictionaries.
Lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and dozens of other languages. It uses affix rules (an affix can be a prefix, infix, suffix or circumfix), which are obtained by supervised learning from a list of full form/lemma pairs.
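To make the idea of learning affix rules from a full form/lemma list concrete, here is a minimal sketch in Python. It is not cstlemma's actual algorithm or API: it handles suffixes only (no prefixes, infixes or circumfixes), and the function names `learn_suffix_rules` and `lemmatise` are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def learn_suffix_rules(pairs):
    """Learn suffix rewrite rules from (full form, lemma) training pairs.

    Toy assumption: every difference between form and lemma is at the end
    of the word. Each pair yields one rule per context length, from the
    bare changed suffix up to the whole word.
    """
    counts = defaultdict(Counter)
    for form, lemma in pairs:
        k = common_prefix_len(form, lemma)
        for start in range(k, -1, -1):
            counts[form[start:]][lemma[start:]] += 1
    # For each observed suffix, keep its most frequent replacement.
    return {suffix: c.most_common(1)[0][0] for suffix, c in counts.items()}


def lemmatise(word, rules):
    """Apply the rule with the longest matching suffix; else return word."""
    for start in range(len(word) + 1):
        suffix = word[start:]
        if suffix in rules:
            return word[:start] + rules[suffix]
    return word
```

Trained on pairs such as `("walked", "walk")` and `("cats", "cat")`, the sketch generalises to unseen words like `jumped` or `birds` because the shortest learned rules (`ed` to nothing, `s` to nothing) still match, while longer rules take priority when more context is available.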
36 stars
6 watching
7 forks
Language: C++
last commit: 8 months ago
Linked from 1 awesome list
Topics: affix, dutch, german, infix, lemmatiser, lemmatizer, prefix, suffix
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Lemmatization tool for natural language processing | 146 |
| | Lemmatizer for Danish and Swedish languages | 76 |
| | A Ruby library that provides a lemmatizer for text in English. | 108 |
| | Provides tools for part of speech tagging and lemmatization across multiple languages using machine learning models. | 18 |
| | A markup language that combines elements from Markdown and DokuWiki to display text with formatting and structure | 16 |
| | An implementation of the CRF autoencoder framework for tasks in natural language processing and machine translation | 21 |
| | An optimized C++ implementation of dictionary coding using SIMD instructions for efficient compression and decompression of large datasets | 104 |
| | An analysis tool for breaking down words into their component parts in the Kinyarwanda language | 6 |
| | A package providing contextual formatting functions with similar usage to the fmt package. | 103 |
| | A PyTorch-based tool for sequence labeling using a combination of CRF and LSTM models to capture label dependencies and leverage contextualized representations. | 846 |
| | Provides morphological analysis tools for various languages, including verb and noun generation, based on archived web pages. | 5 |
| | A tool to format Haskell source code according to certain formatting rules | 691 |
| | Tool providing fst-based tokenization for natural language processing applications | 2 |
| | A tool that detects and fixes formatting errors in Clojure code to improve readability and consistency. | 1,123 |