fastText_multilingual

Multilingual word embeddings

A repository providing aligned multilingual word vectors for 78 languages using the SVD method.
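
As a rough illustration of what an SVD-based alignment can look like, the sketch below solves the orthogonal Procrustes problem with NumPy: given vectors for translation pairs in two languages, it finds the orthogonal matrix that best maps one space onto the other. That "the SVD method" refers to this technique is an assumption here, and the function name `learn_alignment`, the synthetic data, and the dimensions are hypothetical, chosen only for the example; the repository's own code may differ.

```python
import numpy as np


def learn_alignment(source, target):
    """Return the orthogonal matrix W minimising ||source @ W - target||_F.

    source, target: arrays of shape (n_pairs, dim); row i of each holds the
    vectors of one translation pair (hypothetical input layout).
    """
    # Orthogonal Procrustes: take the SVD of source^T target and drop the
    # singular values, which leaves the closest orthogonal map.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Synthetic check: build a target space that is an exact rotation of the source.
rng = np.random.default_rng(0)
dim, n_pairs = 5, 50
source = rng.normal(size=(n_pairs, dim))
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
target = source @ true_rotation

W = learn_alignment(source, target)
aligned = source @ W  # source vectors expressed in the target space
```

Because `W` is orthogonal, applying it preserves lengths and cosine similarities within the source language, so monolingual nearest-neighbour structure is unchanged after alignment.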

Multilingual word vectors in 78 languages

GitHub

1k stars
55 watching
121 forks
Language: Jupyter Notebook
Last commit: over 1 year ago
Linked from 4 awesome lists

distributed-representations, machine-learning, machine-translation, natural-language-processing, nlp, word-vectors

Related projects:

Repository | Description | Stars
benathi/multisense-prob-fasttext | An implementation of a probabilistic FastText model for multi-sense word embeddings | 149
botcenter/spanishwordembeddings | This project generates Spanish word embeddings using fastText on large corpora. | 9
kyubyong/wordvectors | Provides pre-trained word vectors for multiple languages to facilitate NLP tasks | 2,215
talschuster/crosslingualcontextualemb | Enables alignment of word embeddings across multiple languages to facilitate cross-lingual text analysis and machine learning tasks | 98
eleutherai/polyglot | Large language models designed to perform well in multiple languages and address performance issues with current multilingual models. | 475
bigredt/vico | Multi-sense word embeddings learned from visual cooccurrences | 25
juliatext/embeddings.jl | Provides access to pre-trained word embeddings for NLP tasks. | 81
bheinzerling/bpemb | A collection of pre-trained subword embeddings in 275 languages, useful for natural language processing tasks. | 1,184
galuhsahid/indonesian-word-embedding | Demonstrates word embedding in Indonesian language using pre-trained Word2vec models | 20
hit-scir/elmoformanylangs | Provides pre-trained ELMo representations for multiple languages to improve NLP tasks. | 1,463
dccuchile/spanish-word-embeddings | A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods. | 356
uw-madison-lee-lab/cobsat | Provides a benchmarking framework and dataset for evaluating the performance of large language models in text-to-image tasks | 28
neulab/pangea | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts | 91
untra/polyglot | A plugin for Jekyll blogs that enables support for multiple languages and internationalization. | 417
microsoft/unicoder | This repository provides pre-trained models and code for understanding and generation tasks in multiple languages. | 88