JWE

Chinese word embedding trainer

JWE trains and evaluates embeddings for Chinese words, characters, and fine-grained subcharacter components.

Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components


99 stars
9 watching
33 forks
Language: C
Last commit: over 5 years ago

Related projects:

Repository | Description | Stars
dalinvip/cw2vec | A framework for learning Chinese word embeddings with stroke n-gram information | 274
ray1007/gwe | An implementation of a word embedding method that uses character glyphs to improve processing of traditional Chinese | 30
cluebenchmark/cluepretrainedmodels | Pre-trained models for Chinese language tasks, with improved performance and smaller model sizes than existing models | 804
hkust-knowcomp/r-net | A TensorFlow implementation of R-Net, a machine reading comprehension model | 578
vzhong/embeddings | Fast and efficient word embeddings for natural language processing | 223
cluebenchmark/cluecorpus2020 | A large-scale pre-training corpus for Chinese language models | 925
leonard-xu/cwe | Improves word embeddings by considering the internal character structure of Chinese words | 299
cluebenchmark/electra | Trains and evaluates a Chinese language model using adversarial training on a large corpus | 140
jwieting/charagram | A tool for training and using character n-gram based word and sentence embeddings in natural language processing | 125
arleyguolei/wx-words-pk | Tools and components for building Chinese input methods, focusing on character prediction and suggestion algorithms | 886
hanzhenlei767/nlp_learn | A collection of NLP code snippets and notes on various models and techniques, including pre-trained language models and Chinese text processing methods | 25
jwieting/paragram-word | Trains word embeddings from a paraphrase database to represent semantic relationships between words | 30
malllabiisc/wordgcn | A deep learning model that generates word embeddings by predicting words from their dependency context | 290
zhezhaoa/ngram2vec | A toolkit for learning high-quality word and text representations from n-gram co-occurrence statistics | 846
hassygo/charngram2vec | A re-implementation of character n-gram embeddings for pre-training in natural language processing tasks | 23