OmegaT-hfst-tokenizer

Tokenizer

Tool providing finite-state transducer (FST) based tokenization for natural language processing applications.

OmegaT-hfst-tokenizer provides FST-based tokenization in OmegaT, the computer-assisted translation tool.
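To make "FST-based tokenization" more concrete, the following minimal Java sketch shows one common idea behind it. The text is tokenized by finding the longest match against a lexicon that has been compiled into a finite-state automaton. Here the automaton is a simple trie-shaped acceptor, so multiword units such as "New York" come out as single tokens. This is not the plugin's code and does not use the HFST API. The class name `LongestMatchTokenizer`, the sample lexicon entries, and the letter/digit fallback rule are all hypothetical. A real HFST transducer would typically also output morphological analyses and could be a much larger, cyclic automaton.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, illustrative longest-match tokenizer over a trie-shaped
 * finite-state acceptor. Not the OmegaT-hfst-tokenizer code, not the HFST API.
 */
public class LongestMatchTokenizer {

    /** One automaton state: outgoing transitions on characters plus a final-state flag. */
    private static final class State {
        final Map<Character, State> next = new HashMap<>();
        boolean isFinal;
    }

    private final State start = new State();

    /** Adds a lexicon entry (it may contain spaces) as a path from the start state. */
    public void addEntry(String entry) {
        State s = start;
        for (char c : entry.toCharArray()) {
            s = s.next.computeIfAbsent(c, k -> new State());
        }
        s.isFinal = true;
    }

    /** Returns the end index of the longest lexicon match starting at pos, or -1 if none. */
    private int longestMatch(String text, int pos) {
        State s = start;
        int best = -1;
        for (int i = pos; i < text.length(); i++) {
            s = s.next.get(text.charAt(i));
            if (s == null) {
                break; // no transition: the automaton rejects any longer prefix
            }
            if (s.isFinal && isBoundary(text, i + 1)) {
                best = i + 1; // remember the longest accepted prefix so far
            }
        }
        return best;
    }

    /** A match may only end at the end of the text or before a non-letter/digit character. */
    private static boolean isBoundary(String text, int end) {
        return end == text.length() || !Character.isLetterOrDigit(text.charAt(end));
    }

    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
                continue;
            }
            int end = longestMatch(text, pos);
            if (end < 0) {
                // Fallback: a run of letters/digits is one token; any other char stands alone.
                end = pos + 1;
                if (Character.isLetterOrDigit(c)) {
                    while (end < text.length() && Character.isLetterOrDigit(text.charAt(end))) {
                        end++;
                    }
                }
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        LongestMatchTokenizer tok = new LongestMatchTokenizer();
        tok.addEntry("New York");
        tok.addEntry("e.g.");
        System.out.println(tok.tokenize("I moved to New York, e.g. last year."));
    }
}
```

With the two sample entries, `main` prints `[I, moved, to, New York, ,, e.g., last, year, .]`. The boundary check keeps `New York` from matching inside `New Yorker`, which falls back to the two tokens `New` and `Yorker`.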


2 stars

7 watching

0 forks

Language: Java

Last commit: over 6 years ago

Linked from 1 awesome list

Topics: finite-state-machine, lemmatizer, minority-language, morphological-analysis, natural-language, omegat

Backlinks from these awesome lists:

richardlitt/low-resource-languages

Related projects:

Repository	Description	Stars
zseder/huntoken	A tool for tokenizing raw text into words and sentences in multiple languages, including Hungarian	4
tavurth/godot-fft	An implementation of the Fast Fourier Transform algorithm in GDScript	39
languagemachines/ucto	A tokeniser for natural language text that separates words from punctuation and supports basic preprocessing steps such as case changing	66
kylef/jsonwebtoken.swift	An implementation of JSON Web Tokens in Swift	762
thisiscetin/textoken	A gem for extracting words from text with customizable tokenization rules	31
jonsafari/tok-tok	A fast and simple tokenizer for multiple languages	28
namin/dot	Mechanized proof of soundness for a type-theoretic foundation for languages like Scala	155
dtolm/vkfft	A fast Fourier transform library designed to accelerate multidimensional mathematical operations on GPUs	1,562
dlang-community/dfmt	A tool for formatting D source code according to specific styles and conventions	204
c4n/pythonlexto	A Python wrapper around LexTo, a Thai word segmenter, making it easy to integrate into applications	1
kassane/fmt	A modern formatting library with a fast and safe API for string formatting	0
xujiajun/gotokenizer	A tokenizer for Chinese text segmentation based on a dictionary and a bigram language model	21
amir-zeldes/rftokenizer	A tokenizer for segmenting words into morphological components	27
zurawiki/tiktoken-rs	A Rust library for tokenizing text for OpenAI models using tiktoken	266
younghjung/onlinemlrboostingwithvfdt	An implementation of online multi-label ranking boosting using VFDT as weak learners	4