Chinese-Word-Vectors

Word vectors

Provides pre-trained Chinese word vectors, trained with different representations, context features, and corpora, for downstream tasks in natural language processing.

100+ Chinese Word Vectors (over a hundred pre-trained Chinese word vectors)

12k stars
285 watching
2k forks
Language: Python
last commit: about 1 year ago
chinese, chinese-word-segmentation, embedding, embeddings, vectors-trained, word-embeddings

Related projects:

Repository | Description | Stars
dalinvip/cw2vec | A software framework for learning Chinese word embeddings with stroke n-gram information | 274
zhezhaoa/ngram2vec | A toolkit for learning high-quality word and text representations from ngram co-occurrence statistics | 846
cluebenchmark/cluepretrainedmodels | Provides pre-trained models for Chinese language tasks with improved performance and smaller model sizes compared to existing models | 804
vzhong/embeddings | Provides fast and efficient word embeddings for natural language processing | 223
hkust-knowcomp/jwe | Trains and evaluates word embeddings for Chinese words, characters, and fine-grained subcharacter components | 99
chengyuegongr/frequency-agnostic | Improves word embeddings by using adversarial training to make them less dependent on word frequencies | 118
uhh-lt/sensegram | Tools and techniques for analyzing word meanings from word embeddings | 212
brightmart/text_classification | An NLP project offering various text classification models and techniques for deep learning exploration | 7,861
cluebenchmark/cluecorpus2020 | A large-scale pre-training corpus for Chinese language models | 925
hit-scir/chinese-mixtral-8x7b | An implementation of a large language model for Chinese text processing, based on a MoE (Mixture of Experts) architecture and incorporating a vast vocabulary | 641
plasticityai/magnitude | A fast and efficient utility package for utilizing vector embeddings in machine learning models | 1,627
kyubyong/wordvectors | Provides pre-trained word vectors for multiple languages to facilitate NLP tasks | 2,215
malllabiisc/wordgcn | A deep learning model that generates word embeddings by predicting words based on their dependency context | 290
xiaoqijiao/coling2018 | Provides training and testing code for a CNN-based sentence embedding model | 2
dccuchile/spanish-word-embeddings | A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods | 356