fastText_multilingual

Multilingual word embeddings

A repository providing word vectors for 78 languages, aligned into a single shared vector space using an SVD-based method.

Multilingual word vectors in 78 languages
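
The SVD-based alignment mentioned above can be illustrated with a minimal sketch of the orthogonal Procrustes solution. It assumes you already have two matrices of paired word vectors (row i of each is a translation pair from a bilingual dictionary). The function name `learn_alignment` and the synthetic data are hypothetical and for illustration only; this is not necessarily the repository's own code or API.

```python
import numpy as np

def learn_alignment(source_vecs, target_vecs):
    """Return the orthogonal matrix W minimising ||source_vecs @ W - target_vecs||_F.

    Rows are assumed to be paired: source_vecs[i] and target_vecs[i]
    are the vectors of a word and its translation.
    """
    # SVD of the cross-covariance between the two sets of vectors.
    u, _, vt = np.linalg.svd(source_vecs.T @ target_vecs)
    # The product of the singular-vector matrices is the optimal orthogonal map.
    return u @ vt

# Synthetic check: hide a known rotation and recover it.
rng = np.random.default_rng(0)
dim = 5
target = rng.standard_normal((20, dim))
true_rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
source = target @ true_rotation.T  # so that source @ true_rotation == target

W = learn_alignment(source, target)
aligned = source @ W  # source vectors mapped into the target space
```

Because W is orthogonal, the mapping preserves vector norms and cosine similarities within the source language, so only the relative orientation of the two spaces changes.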

1k stars

55 watching

121 forks

Language: Jupyter Notebook

last commit: over 2 years ago

Linked from 4 awesome lists

Topics: distributed-representations, machine-learning, machine-translation, natural-language-processing, nlp, word-vectors

Related projects:

Repository	Description	Stars
benathi/multisense-prob-fasttext	An implementation of a probabilistic FastText model for multi-sense word embeddings	148
botcenter/spanishwordembeddings	This project generates Spanish word embeddings using fastText on large corpora.	9
kyubyong/wordvectors	Provides pre-trained word vectors for multiple languages to facilitate NLP tasks	2,216
talschuster/crosslingualcontextualemb	Enables alignment of word embeddings across multiple languages to facilitate cross-lingual text analysis and machine learning tasks	99
eleutherai/polyglot	Large language models designed to perform well in multiple languages and address performance issues with current multilingual models.	476
bigredt/vico	Multi-sense word embeddings learned from visual cooccurrences	25
juliatext/embeddings.jl	Provides access to pre-trained word embeddings for NLP tasks.	81
bheinzerling/bpemb	A collection of pre-trained subword embeddings in 275 languages, useful for natural language processing tasks.	1,189
galuhsahid/indonesian-word-embedding	Demonstrates word embedding in Indonesian language using pre-trained Word2vec models	20
hit-scir/elmoformanylangs	Provides pre-trained ELMo representations for multiple languages to improve NLP tasks.	1,462
dccuchile/spanish-word-embeddings	A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods.	354
uw-madison-lee-lab/cobsat	Provides a benchmarking framework and dataset for evaluating the performance of large language models in text-to-image tasks	30
neulab/pangea	An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts	92
untra/polyglot	A plugin for Jekyll blogs that enables support for multiple languages and internationalization.	425
microsoft/unicoder	This repository provides pre-trained models and code for understanding and generation tasks in multiple languages.	89