Tokenizer

A high-performance tokenization library for Go, capable of parsing various data formats and syntaxes.

Tokenizer (lexer) for golang

GitHub: 98 stars · 2 watching · 6 forks
Language: Go
Last commit: 15 days ago
Linked from 2 awesome lists

Tags: golang, lexer, parse, parser, tokenizer, tokenizing
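
To give a sense of what driving this kind of library looks like, below is a minimal sketch based on the usage patterns in the repository's README: custom token keys are registered for literal strings, the input is parsed into a token stream, and the stream is walked token by token. The import path and the exact method names (New, DefineTokens, ParseString, CurrentToken, GoNext) are assumptions drawn from that README rather than a verified API reference.

    package main

    import (
        "fmt"

        "github.com/bzick/tokenizer" // assumed import path for this repository
    )

    // Application-defined token keys; any unique positive integers work.
    const (
        TComparison = iota + 1
        TMath
    )

    func main() {
        // Register literal strings under the custom token keys.
        lexer := tokenizer.New()
        lexer.DefineTokens(TComparison, []string{"<", "<=", "==", ">=", ">", "!="})
        lexer.DefineTokens(TMath, []string{"+", "-", "*", "/"})

        // Parse the input into a token stream and walk it token by token.
        stream := lexer.ParseString(`price >= 122.34 + fee`)
        defer stream.Close()

        for stream.IsValid() {
            t := stream.CurrentToken()
            fmt.Printf("key=%d value=%q\n", t.Key(), t.ValueString())
            stream.GoNext()
        }
    }

Working against a lazily evaluated stream instead of materializing a full token slice is the usual way a lexer like this stays fast and memory-bounded on large inputs, which is presumably where the high-performance claim comes from.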

Related projects:

Repository | Description | Stars
diasks2/pragmatic_tokenizer | A multilingual tokenizer to split strings into tokens, handling various language and formatting nuances | 90
xujiajun/gotokenizer | A tokenizer based on dictionary and Bigram language models for text segmentation in Chinese | 21
shonfeder/tokenize | A Prolog-based tokenization library for lexing text into common tokens | 11
denosaurs/tokenizer | A simple tokenizer library for parsing and analyzing text input in various formats | 17
jonsafari/tok-tok | A fast and simple tokenizer for multiple languages | 28
juliatext/wordtokenizers.jl | A set of high-performance tokenizers for natural language processing tasks | 96
arbox/tokenizer | A Ruby-based library for splitting written text into tokens for natural language processing tasks | 46
andrewrk/xml | A library that tokenizes XML data into smaller units for easier processing | 24
zseder/huntoken | A tool for tokenizing raw text into words and sentences in multiple languages | 3
abitdodgy/words_counted | A Ruby library that tokenizes input and provides various statistical measures about the tokens | 159
thisiscetin/textoken | A gem for extracting words from text with customizable tokenization rules | 31
neurosnap/sentences | A command-line tool to split text into individual sentences | 439
gorilla/css | A utility for parsing and breaking down CSS3 code into smaller components | 87
goccmack/gocc | A tool for generating lexers and parsers from a BNF file with semantic actions | 615
jirkamarsik/trainable-tokenizer | A tool for creating customizable tokenization rules for natural languages | 22