Chinese-Word-Vectors

Word vectors

Provides pre-trained vectors with various properties for downstream tasks in natural language processing

100+ Chinese Word Vectors (over a hundred pre-trained Chinese word vectors)
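
As a rough illustration of how pre-trained vectors like these are typically consumed downstream, here is a minimal sketch. It assumes the vectors are stored in the common word2vec text layout (an optional "count dimension" header line, then one token followed by its float values per line); this listing does not confirm the file format. The names `load_text_vectors` and `cosine` are hypothetical helpers, not part of the project, and the three sample rows are made-up toy values.

```python
import io
import math


def load_text_vectors(lines, limit=None):
    """Parse word2vec-style text rows into a {token: [floats]} dict.

    Assumed layout (not confirmed by the listing): an optional header
    line "count dim", then "token v1 v2 ... vN" on each following line.
    """
    vectors = {}
    dim = None
    for i, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue
        # Treat a first line of exactly two integers as the header.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            dim = int(parts[1])
            continue
        word, values = parts[0], parts[1:]
        if dim is None:
            dim = len(values)
        if len(values) != dim:
            continue  # skip rows that do not match the expected dimension
        vectors[word] = [float(v) for v in values]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data standing in for a real vector file (values are invented).
sample = io.StringIO("3 2\n北京 1.0 0.0\n上海 0.8 0.6\n苹果 0.0 1.0\n")
vecs = load_text_vectors(sample)
similarity = cosine(vecs["北京"], vecs["上海"])
```

For a real file, the same function could be fed an open handle, for example `open(path, encoding="utf-8")` with `path` pointing at a decompressed download, and `limit` can cap how many rows are read into memory.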

GitHub

12k stars

285 watching

2k forks

Language: Python

last commit: over 2 years ago

chinese, chinese-word-segmentation, embedding, embeddings, vectors-trained, word-embeddings

Related projects:

Repository	Description	Stars
dalinvip/cw2vec	A software framework for learning Chinese word embeddings with stroke n-gram information	274
zhezhaoa/ngram2vec	A toolkit for learning high-quality word and text representations from ngram co-occurrence statistics	848
cluebenchmark/cluepretrainedmodels	Provides pre-trained models for Chinese language tasks with improved performance and smaller model sizes compared to existing models.	806
vzhong/embeddings	Provides fast and efficient word embeddings for natural language processing.	223
hkust-knowcomp/jwe	Trains and evaluates word embeddings for Chinese words, characters, and fine-grained subcharacter components.	99
chengyuegongr/frequency-agnostic	Improves word embeddings by training with adversarial objectives	118
uhh-lt/sensegram	Tools and techniques for analyzing word meanings from word embeddings	212
brightmart/text_classification	An NLP project offering various text classification models and techniques for deep learning exploration	7,881
cluebenchmark/cluecorpus2020	A large-scale Chinese corpus for pre-training language models.	927
hit-scir/chinese-mixtral-8x7b	An implementation of a large language model for Chinese text processing, built on a Mixture-of-Experts (MoE) architecture and incorporating a large vocabulary.	645
plasticityai/magnitude	A fast and efficient utility package for utilizing vector embeddings in machine learning models	1,635
kyubyong/wordvectors	Provides pre-trained word vectors for multiple languages to facilitate NLP tasks	2,216
malllabiisc/wordgcn	A deep learning model that generates word embeddings by predicting words based on their dependency context	291
xiaoqijiao/coling2018	Provides training and testing code for a CNN-based sentence embedding model	2
dccuchile/spanish-word-embeddings	A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods.	354