jieba
Chinese tokenizer
A comprehensive Python library for Chinese word segmentation and keyword extraction.
Jieba (结巴) Chinese word segmentation
33k stars
1k watching
7k forks
Language: Python
last commit: 7 months ago
Linked from 5 awesome lists
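For reference, a minimal usage sketch of the segmentation and keyword-extraction API (the sample sentence and the segmentation shown in comments follow the library's own README example; exact output can vary with dictionary version):

```python
# Minimal sketch: jieba word segmentation and TF-IDF keyword extraction.
import jieba
import jieba.analyse

text = "我来到北京清华大学"

# Accurate mode (default): cut() returns a generator of tokens.
print("/".join(jieba.cut(text)))               # e.g. 我/来到/北京/清华大学

# Full mode: emits every word the dictionary can form from the input.
print("/".join(jieba.cut(text, cut_all=True)))

# lcut() returns a plain list instead of a generator.
tokens = jieba.lcut(text)

# TF-IDF based keyword extraction, top 3 terms.
keywords = jieba.analyse.extract_tags(text, topK=3)
print(keywords)
```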
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | An Android implementation of the Chinese word segmentation algorithm jieba, optimized for fast initialization and tokenization | 153 |
| | A PHP module for Chinese text segmentation and word breaking | 1,331 |
| | Provides a Ruby port of the popular Chinese language processing library jieba | 8 |
| | A Google Colab setup for cracking hashes using multiple tools | 929 |
| | Automates real-time market data retrieval and storage from the Huobi exchange, publishing updates to Redis for use in backtesting and analysis | 39 |
| | A JavaScript library designed to simplify Georgian keyboard layout support | 57 |
| | Analyzes and prints useful information from IPA files used in iOS app development | 10 |
| | Generates stubs for interfaces in code completion tools | 60 |
| | A tool to compress and remove unnecessary migration history from a database schema | 1,499 |
| | Converts packet capture files to usable hashes for Hashcat or John the Ripper analysis | 2,039 |
| | Optimizes internationalization text files by reducing bundle size through code substitution | 14 |
| | Generates G-Code files for laser cutting solder paste stencils for KiCad PCBs | 16 |
| | Analyzes and dumps memory to extract sensitive information from running processes | 582 |
| | An enhancement tool for Ghidra's binary analysis capabilities | 289 |
| | A tool to scan and discover Homematic devices on a network | 6 |