OmegaT-hfst-tokenizer
Tokenizer
OmegaT-hfst-tokenizer provides FST-based (finite-state transducer) tokenisation in OmegaT, for use in natural language processing applications.
2 stars
7 watching
0 forks
Language: Java
Last commit: over 4 years ago
Linked from 1 awesome list
Tags: finite-state-machine, lemmatizer, minority-language, morphological-analysis, natural-language, omegat
Related projects:
Repository | Description | Stars |
---|---|---|
zseder/huntoken | A tool for tokenizing raw text into words and sentences in multiple languages | 3 |
tavurth/godot-fft | An implementation of the Fast Fourier Transform algorithm in GDScript | 39 |
languagemachines/ucto | A tokeniser for natural language text that separates words from punctuation and supports basic preprocessing steps such as case changing | 65 |
kylef/jsonwebtoken.swift | Provides an implementation of JSON Web Tokens in Swift | 762 |
thisiscetin/textoken | A gem for extracting words from text with customizable tokenization rules | 31 |
jonsafari/tok-tok | A fast and simple tokenizer for multiple languages | 28 |
namin/dot | Mechanized proof of soundness for a type-theoretic foundation for languages like Scala | 154 |
dtolm/vkfft | A fast Fourier transform library designed to accelerate multidimensional mathematical operations on GPUs | 1,549 |
dlang-community/dfmt | A tool for formatting D source code according to specific styles and conventions | 204 |
c4n/pythonlexto | A Python wrapper around the Thai word segmenter LexTo, allowing developers to easily integrate it into their applications | 1 |
kassane/fmt | A modern formatting library with a fast and safe API for string formatting | 0 |
xujiajun/gotokenizer | A tokenizer based on dictionary and Bigram language models for text segmentation in Chinese | 21 |
amir-zeldes/rftokenizer | A tokenizer for segmenting words into morphological components | 27 |
zurawiki/tiktoken-rs | Provides a Rust library for tokenizing text with OpenAI models using tiktoken | 256 |
younghjung/onlinemlrboostingwithvfdt | An implementation of online multi-label ranking boosting using VFDT as weak learners | 4 |