OmegaT-hfst-tokenizer
Tokenizer
Tool providing finite-state transducer (FST) based tokenization for natural language processing applications
OmegaT-hfst-tokenizer provides FST-based tokenization in OmegaT
2 stars
7 watching
0 forks
Language: Java
Last commit: almost 5 years ago
Linked from 1 awesome list
finite-state-machine, lemmatizer, minority-language, morphological-analysis, natural-language, omegat
Related projects:
| Repository | Description | Stars |
|---|---|---|
|  | A tool for tokenizing raw text into words and sentences in multiple languages, including Hungarian. | 4 |
|  | An implementation of the Fast Fourier Transform algorithm in GDScript | 39 |
|  | A tokeniser for natural language text that separates words from punctuation and supports basic preprocessing steps such as case changing | 66 |
|  | Provides an implementation of JSON Web Tokens in Swift | 762 |
|  | A gem for extracting words from text with customizable tokenization rules | 31 |
|  | A fast and simple tokenizer for multiple languages | 28 |
|  | Mechanized proof of soundness for a type-theoretic foundation for languages like Scala | 155 |
|  | A fast Fourier transform library designed to accelerate multidimensional mathematical operations on GPUs | 1,562 |
|  | A tool for formatting D source code according to specific styles and conventions. | 204 |
|  | A Python wrapper around the Thai word segmentator LexTo, allowing developers to easily integrate it into their applications. | 1 |
|  | A modern formatting library with a fast and safe API for string formatting | 0 |
|  | A tokenizer based on dictionary and Bigram language models for text segmentation in Chinese | 21 |
|  | A tokenizer for segmenting words into morphological components | 27 |
|  | Provides a Rust library for tokenizing text with OpenAI models using tiktoken. | 266 |
|  | An implementation of online multi-label ranking boosting using VFDT as weak learners | 4 |