jieba
Chinese tokenizer
A comprehensive Python library for Chinese word segmentation and keyword extraction.
结巴中文分词 (Jieba Chinese word segmentation)
33k stars
1k watching
7k forks
Language: Python
last commit: 3 months ago
Linked from 5 awesome lists
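jieba's own documentation describes its core segmentation step in three parts. It scans a prefix dictionary to build a directed acyclic graph (DAG) of every candidate word in the sentence. It then uses dynamic programming to pick the path with the highest probability, based on word frequencies. Finally, it falls back to an HMM for words that are not in the dictionary. The block below is a minimal, simplified sketch of only the DAG and dynamic-programming part, not jieba's actual code. The toy dictionary `TOY_FREQ` and its counts are hypothetical, `build_dag` and `max_prob_cut` are illustrative names, and the HMM step is omitted.

```python
import math

# Hypothetical toy dictionary (word -> frequency). The real library ships a
# much larger dictionary; these counts are made up purely for illustration.
TOY_FREQ = {
    "我": 50, "来到": 20, "来": 30, "到": 30,
    "北京": 40, "北": 10, "京": 5,
    "清华": 15, "清华大学": 25, "清": 5, "华": 5,
    "大学": 30, "大": 20, "学": 20,
}
TOTAL = sum(TOY_FREQ.values())


def build_dag(sentence, freq):
    """Map each start index to the end indices of dictionary words starting there."""
    dag = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in freq]
        # If no dictionary word starts here, keep the single character as a fallback.
        dag[i] = ends or [i]
    return dag


def max_prob_cut(sentence, freq=TOY_FREQ, total=TOTAL):
    """Return the segmentation whose word log-probabilities sum to the maximum."""
    n = len(sentence)
    dag = build_dag(sentence, freq)
    log_total = math.log(total)
    # route[i] = (best log-probability of sentence[i:], end index of its first word);
    # filled right to left so route[j + 1] is always known when needed.
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            # Unknown words get frequency 1 so log() stays defined.
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    # Walk the best route from the start to recover the words.
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(max_prob_cut("我来到北京清华大学"))
```

With the toy counts above, the sketch splits 我来到北京清华大学 ("I came to Tsinghua University in Beijing") into 我 / 来到 / 北京 / 清华大学. With the library itself installed, the usual call is `jieba.lcut(text)`, which returns a list of words. The sketch trades the real dictionary and HMM for a handful of entries so that the dynamic-programming step is easy to follow.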
Related projects:
Repository | Description | Stars |
---|---|---|
452896915/jieba-android | An Android implementation of jieba, the Chinese word segmentation library, optimized for fast initialization and tokenization | 152 |
fukuball/jieba-php | A PHP module for Chinese text segmentation and word breaking | 1,323 |
mimosa/jieba-jruby | Provides a Ruby port of the popular Chinese language processing library Jieba | 8 |
mxrch/penglab | A Google Colab setup for cracking hashes using multiple tools | 925 |
mmmaaaggg/ibats_huobifeeder_old | Automates real-time market data retrieval and storage from the Huobi exchange, publishing updates to Redis for use in backtesting and analysis. | 39 |
ioseb/geokbd | A JavaScript library designed to simplify Georgian keyboard layout support | 58 |
toshi0383/ipanema | Analyzes and prints useful information from IPA files used in iOS app development. | 10 |
edolphin-ydf/goimpl.nvim | Generates stubs for interfaces in code completion tools | 57 |
jalkoby/squasher | A tool that squashes old database migrations, removing unnecessary migration history from the database schema | 1,496 |
zerbea/hcxtools | Converts packet capture files to usable hashes for Hashcat or John the Ripper analysis. | 2,014 |
hustcc/babel-plugin-optimize-i18n | Optimizes internationalization text files by reducing bundle size through code substitution | 14 |
ma-ha/kicad-laser-stencil-plugin | Generates G-code files for laser-cutting solder paste stencils from KiCad PCB designs. | 16 |
rek7/mxtract | Analyzes and dumps memory to extract sensitive information from running processes | 582 |
reb311ion/replica | An enhancement tool for Ghidra's binary analysis capabilities | 287 |
hobbyquaker/hm-discover | A tool to scan and discover Homematic devices on a network. | 6 |