jieba

Chinese tokenizer

A comprehensive Python library for Chinese text segmentation and word extraction.

Jieba Chinese word segmentation
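
As a rough illustration of the segmentation and extraction described above, here is a minimal sketch using jieba's `lcut` and `jieba.analyse.extract_tags` calls. The sample sentence, the variable names, and the per-character fallback used when jieba is not installed are assumptions made for this example, not part of the listing.

```python
# Minimal sketch: segment a Chinese sentence with jieba (pip install jieba).
text = "我来到北京清华大学"  # sample sentence chosen for illustration

try:
    import jieba
    import jieba.analyse

    # Default (accurate) mode; lcut returns a list of str tokens.
    tokens = jieba.lcut(text)
    # TF-IDF based keyword extraction, limited to the top 3 candidates.
    keywords = jieba.analyse.extract_tags(text, topK=3)
except ImportError:
    # Assumption for illustration only: if jieba is unavailable,
    # fall back to one token per character so the script still runs.
    tokens = list(text)
    keywords = []

print("/".join(tokens))
print(keywords)
```

In accurate mode the tokens partition the input, so joining them reproduces the original string. The exact token boundaries depend on the dictionary shipped with the installed version.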

GitHub

33k stars
1k watching
7k forks
Language: Python
last commit: 3 months ago
Linked from 5 awesome lists


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| 452896915/jieba-android | An Android implementation of the Chinese word segmentation algorithm jieba, optimized for fast initialization and tokenization | 152 |
| fukuball/jieba-php | A PHP module for Chinese text segmentation and word breaking | 1,323 |
| mimosa/jieba-jruby | Provides a Ruby port of the popular Chinese language processing library Jieba | 8 |
| mxrch/penglab | A Google Colab setup for cracking hashes using multiple tools | 925 |
| mmmaaaggg/ibats_huobifeeder_old | Automates real-time market data retrieval and storage from Huobi exchange, publishing updates to Redis for use in backtesting and analysis | 39 |
| ioseb/geokbd | A JavaScript library designed to simplify Georgian keyboard layout support | 58 |
| toshi0383/ipanema | Analyzes and prints useful information from IPA files used in iOS app development | 10 |
| edolphin-ydf/goimpl.nvim | Generates stubs for interfaces in code completion tools | 57 |
| jalkoby/squasher | A tool to compress and remove unnecessary migration history from database schema | 1,496 |
| zerbea/hcxtools | Converts packet capture files to usable hashes for Hashcat or John the Ripper analysis | 2,014 |
| hustcc/babel-plugin-optimize-i18n | Optimizes internationalization text files by reducing bundle size through code substitution | 14 |
| ma-ha/kicad-laser-stencil-plugin | Generates G-Code files for laser cutting solder paste stencils in KiCAD PCBs | 16 |
| rek7/mxtract | Analyzes and dumps memory to extract sensitive information from running processes | 582 |
| reb311ion/replica | An enhancement tool for Ghidra's binary analysis capabilities | 287 |
| hobbyquaker/hm-discover | A tool to scan and discover Homematic devices on a network | 6 |