OmegaT-hfst-tokenizer

Tokenizer

A tool providing FST-based tokenization for natural language processing applications.

OmegaT-hfst-tokenizer provides FST-based tokenization in OmegaT.

GitHub

2 stars
7 watching
0 forks
Language: Java
Last commit: over 4 years ago
Linked from 1 awesome list

Topics: finite-state-machine, lemmatizer, minority-language, morphological-analysis, natural-language, omegat

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| zseder/huntoken | A tool for tokenizing raw text into words and sentences in multiple languages | 3 |
| tavurth/godot-fft | An implementation of the Fast Fourier Transform algorithm in GDScript | 39 |
| languagemachines/ucto | A tokeniser for natural language text that separates words from punctuation and supports basic preprocessing steps such as case changing | 65 |
| kylef/jsonwebtoken.swift | Provides an implementation of JSON Web Tokens in Swift | 762 |
| thisiscetin/textoken | A gem for extracting words from text with customizable tokenization rules | 31 |
| jonsafari/tok-tok | A fast and simple tokenizer for multiple languages | 28 |
| namin/dot | Mechanized proof of soundness for a type-theoretic foundation for languages like Scala | 154 |
| dtolm/vkfft | A fast Fourier transform library designed to accelerate multidimensional mathematical operations on GPUs | 1,549 |
| dlang-community/dfmt | A tool for formatting D source code according to specific styles and conventions | 204 |
| c4n/pythonlexto | A Python wrapper around the Thai word segmentator LexTo, allowing developers to easily integrate it into their applications | 1 |
| kassane/fmt | A modern formatting library with a fast and safe API for string formatting | 0 |
| xujiajun/gotokenizer | A tokenizer based on dictionary and Bigram language models for text segmentation in Chinese | 21 |
| amir-zeldes/rftokenizer | A tokenizer for segmenting words into morphological components | 27 |
| zurawiki/tiktoken-rs | Provides a Rust library for tokenizing text with OpenAI models using tiktoken | 256 |
| younghjung/onlinemlrboostingwithvfdt | An implementation of online multi-label ranking boosting using VFDT as weak learners | 4 |