jieba
Chinese tokenizer
A comprehensive Python library for Chinese word segmentation and keyword extraction.
Jieba (结巴) Chinese word segmentation
33k stars
1k watching
7k forks
Language: Python
last commit: 7 months ago
Linked from 5 awesome lists
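For reference, a minimal usage sketch of the segmentation and keyword-extraction API (the sample sentence and the segmentation shown in comments follow the library's own README example; exact output can vary with dictionary version):

```python
# Minimal sketch: jieba word segmentation and TF-IDF keyword extraction.
import jieba
import jieba.analyse

text = "我来到北京清华大学"

# Accurate mode (default): cut() returns a generator of tokens.
print("/".join(jieba.cut(text)))               # e.g. 我/来到/北京/清华大学

# Full mode: emits every word the dictionary can form from the input.
print("/".join(jieba.cut(text, cut_all=True)))

# lcut() returns a plain list instead of a generator.
tokens = jieba.lcut(text)

# TF-IDF based keyword extraction, top 3 terms.
keywords = jieba.analyse.extract_tags(text, topK=3)
print(keywords)
```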
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | An Android implementation of the Chinese word segmentation algorithm jieba, optimized for fast initialization and tokenization | 153 |
| | A PHP module for Chinese text segmentation and word breaking | 1,331 |
| | Provides a Ruby port of the popular Chinese language processing library jieba | 8 |
| | A Google Colab setup for cracking hashes using multiple tools | 929 |
| | Automates real-time market data retrieval and storage from the Huobi exchange, publishing updates to Redis for use in backtesting and analysis | 39 |
| | A JavaScript library designed to simplify Georgian keyboard layout support | 57 |
| | Analyzes and prints useful information from IPA files used in iOS app development | 10 |
| | Generates stubs for interfaces in code completion tools | 60 |
| | A tool to compress and remove unnecessary migration history from a database schema | 1,499 |
| | Converts packet capture files to usable hashes for Hashcat or John the Ripper analysis | 2,039 |
| | Optimizes internationalization text files by reducing bundle size through code substitution | 14 |
| | Generates G-Code files for laser cutting solder paste stencils for KiCad PCBs | 16 |
| | Analyzes and dumps memory to extract sensitive information from running processes | 582 |
| | An enhancement tool for Ghidra's binary analysis capabilities | 289 |
| | A tool to scan and discover Homematic devices on a network | 6 |