ZEN

Chinese text encoder

A pre-trained BERT-based Chinese text encoder with enhanced N-gram representations

GitHub

643 stars
22 watching
104 forks
Language: Python
Last commit: over 2 years ago

Related projects:

Repository | Description | Stars
zminghua/sentencoding | A software package providing tools to encode and process text data using a specific neural network architecture. | 16
soloice/chinese-character-recognition | Demonstrates how to build and train a convolutional neural network (CNN) to recognize Chinese characters. | 200
cluebenchmark/cluepretrainedmodels | Provides pre-trained models for Chinese language tasks with improved performance and smaller model sizes compared to existing models. | 804
yxuansu/tacl | Improves pre-trained language models by encouraging an isotropic and discriminative distribution of token representations. | 92
taosir/cnn_handwritten_chinese_recognition | A Python-based web application that recognizes handwritten Chinese characters using a convolutional neural network (CNN); users write on an online writing board and receive recognition results. | 508
zhangxiann/skip-gram | A Python implementation of a neural network model for learning word embeddings from text data. | 6
xujiajun/gotokenizer | A tokenizer based on dictionary and bigram language models for Chinese text segmentation. | 21
lonepatient/nezha_chinese_pytorch | An implementation of a Chinese language model using PyTorch and the transformer architecture. | 262
bootphon/phonemizer | Converts text to phonetic transcriptions in multiple languages using various backends and algorithms. | 1,231
lxneng/xpinyin | A Python library for translating Chinese characters to pinyin. | 823
sy-xuan/pink | Enables multi-modal language models to understand and generate text about visual content using referential comprehension. | 76
zhegan27/convsent | Trains an autoencoder to learn generic sentence representations using convolutional neural networks. | 34
brightmart/xlnet_zh | Trains a large Chinese language model on massive data and provides a pre-trained model for downstream tasks. | 230
zhuiyitechnology/wobert | A pre-trained Chinese language model that uses word embeddings and is designed to process Chinese text. | 458
ymcui/chinese-xlnet | Provides pre-trained models for Chinese natural language processing tasks using the XLNet architecture. | 1,653