bpemb

Subword embeddings

A collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE), for use in natural language processing tasks.
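As a rough illustration of what Byte-Pair Encoding does, the sketch below learns merge rules from a tiny word list and uses them to split a word into subword units. It is a toy written for this page, not the project's own code: the function names `learn_bpe`, `merge_pair`, and `segment` are hypothetical, the tie-breaking rule is an arbitrary choice, and a real BPE model is trained on a large corpus with a dedicated tokenizer library.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each distinct word starts as a tuple of single characters with a frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # (an assumption made here only to keep the result deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split `word` into subword units by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    rules = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
    print(rules)
    print(segment("lowest", rules))
```

In an embedding collection of this kind, each learned subword unit would then be mapped to a pre-trained vector, so rare or unseen words can still be represented through their pieces.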

1k stars
28 watching
101 forks
Language: Python
Last commit: 2 months ago
Linked from 2 awesome lists

Tags: embeddings, multilingual, natural-language-processing, nlp, subword-embeddings

Related projects:

| Repository | Description | Stars |
|---|---|---|
| vzhong/embeddings | Provides fast and efficient word embeddings for natural language processing. | 223 |
| botcenter/spanishwordembeddings | This project generates Spanish word embeddings using fastText on large corpora. | 9 |
| rguthrie3/morphologicalpriorsforwordembeddings | A project implementing a method to incorporate morphological information into word embeddings using a neural network model. | 52 |
| embeddings-benchmark/mteb | Provides tools and benchmarks for evaluating text embedding models. | 1,992 |
| hit-scir/elmoformanylangs | Provides pre-trained ELMo representations for multiple languages to improve NLP tasks. | 1,463 |
| binwang28/sbert-wk-sentence-embedding | A method to generate sentence embeddings from pre-trained language models. | 177 |
| ermlab/polish-word-embeddings-review | An evaluation framework for Polish word embeddings prepared by various research groups using analogy tasks. | 4 |
| jwieting/paragram-word | Trains word embeddings from a paraphrase database to represent semantic relationships between words. | 30 |
| dsv77/hashembedding | Software component providing efficient word representation using hash embeddings. | 42 |
| clhchtcjj/bine | This repository provides an implementation of a bipartite network embedding algorithm for collaborative filtering and link prediction tasks. | 227 |
| dccuchile/spanish-word-embeddings | A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods. | 356 |
| harsh19/spine | Transforms existing word embeddings into more interpretable ones by applying a novel extension of k-sparse autoencoder with stricter sparsity constraints. | 52 |
| kudkudak/word-embeddings-benchmarks | Provides methods for evaluating word embeddings on various benchmarks. | 437 |
| zhezhaoa/ngram2vec | A toolkit for learning high-quality word and text representations from ngram co-occurrence statistics. | 846 |
| commonsense/conceptnet-numberbatch | A pre-trained word embedding model informed by a large-scale knowledge graph, providing a nuanced representation of word meanings. | 1,296 |