ZEN

Chinese text encoder

A pre-trained, BERT-based Chinese text encoder enhanced by N-gram representations
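The N-gram enhancement in the description can be illustrated with a deliberately simplified, hypothetical sketch. It finds multi-character n-grams from a lexicon in the input, records which characters each n-gram covers in a matching matrix, and adds each matched n-gram's vector to the vectors of the characters it covers. The lexicon, the function names (`match_ngrams`, `matching_matrix`, `enhance_characters`) and the toy one-dimensional vectors are assumptions for illustration. This is not code from the ZEN repository, and a real model would produce character and n-gram representations with trained encoders rather than fixed numbers.

```python
from typing import List, Set, Tuple

Match = Tuple[int, int, str]  # (start, end, n-gram), end is exclusive


def match_ngrams(text: str, lexicon: Set[str], max_n: int = 4) -> List[Match]:
    """Return every lexicon n-gram of length 2..max_n found in text."""
    matches: List[Match] = []
    for n in range(2, max_n + 1):
        for start in range(len(text) - n + 1):
            gram = text[start:start + n]
            if gram in lexicon:
                matches.append((start, start + n, gram))
    return matches


def matching_matrix(text_len: int, matches: List[Match]) -> List[List[int]]:
    """M[i][k] is 1 if character i lies inside matched n-gram k, else 0."""
    return [[1 if s <= i < e else 0 for (s, e, _) in matches]
            for i in range(text_len)]


def enhance_characters(char_vecs: List[List[float]],
                       ngram_vecs: List[List[float]],
                       matrix: List[List[int]]) -> List[List[float]]:
    """Add the vector of every covering n-gram to each character vector."""
    enhanced = []
    for i, vec in enumerate(char_vecs):
        new_vec = list(vec)  # copy so the inputs are left untouched
        for k, covers in enumerate(matrix[i]):
            if covers:
                for d in range(len(new_vec)):
                    new_vec[d] += ngram_vecs[k][d]
        enhanced.append(new_vec)
    return enhanced


if __name__ == "__main__":
    text = "北京大学生"
    lexicon = {"北京", "大学", "大学生", "北京大学"}  # hypothetical lexicon
    matches = match_ngrams(text, lexicon)
    matrix = matching_matrix(len(text), matches)
    # Toy 1-d vectors: characters start at 0, each n-gram has a distinct value.
    chars = [[0.0] for _ in text]
    ngrams = [[1.0], [10.0], [100.0], [1000.0]]
    print(matches)
    print(enhance_characters(chars, ngrams, matrix))
```

In this toy run, 大 and 学 each receive contributions from three overlapping n-grams (大学, 大学生, 北京大学), so every plausible word the character belongs to informs its representation without committing the model to a single segmentation.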


645 stars

22 watching

104 forks

Language: Python

Last commit: about 3 years ago

Related projects:

Repository	Description	Stars
zminghua/sentencoding	A software package providing tools to encode and process text data using a specific neural network architecture.	16
soloice/chinese-character-recognition	This project demonstrates how to build and train a convolutional neural network (CNN) to recognize Chinese characters.	200
cluebenchmark/cluepretrainedmodels	Provides pre-trained models for Chinese language tasks with improved performance and smaller model sizes compared to existing models.	806
yxuansu/tacl	Improves pre-trained language models by encouraging an isotropic and discriminative distribution of token representations.	92
taosir/cnn_handwritten_chinese_recognition	A Python-based web application that recognizes handwritten Chinese characters using a Convolutional Neural Network (CNN), allowing users to input text via an online writing board and receive recognition results.	511
zhangxiann/skip-gram	A Python implementation of a neural network model for learning word embeddings from text data	6
xujiajun/gotokenizer	A tokenizer based on dictionary and Bigram language models for text segmentation in Chinese	21
lonepatient/nezha_chinese_pytorch	A PyTorch implementation of NEZHA, a Transformer-based Chinese pre-trained language model.	262
bootphon/phonemizer	Converts text to phonetic transcriptions in multiple languages using various backends and algorithms	1,249
lxneng/xpinyin	A Python library for translating Chinese characters to pinyin	826
sy-xuan/pink	This project enables multi-modal language models to understand and generate text about visual content using referential comprehension.	79
zhegan27/convsent	Trains an autoencoder to learn generic sentence representations using convolutional neural networks	34
brightmart/xlnet_zh	Trains a large Chinese XLNet model on a large-scale corpus and releases the pre-trained model for downstream tasks.	230
zhuiyitechnology/wobert	A word-based Chinese BERT model, initialized from existing pre-trained models and further trained on large-scale text data.	460
ymcui/chinese-xlnet	Provides pre-trained models for Chinese natural language processing tasks using the XLNet architecture	1,652