bpemb

Subword embeddings

A collection of pre-trained subword embeddings in 275 languages, useful for natural language processing tasks.

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
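
As a rough illustration of the Byte-Pair Encoding idea behind these subword units, here is a minimal toy sketch. It is not BPEmb's own implementation: the function names `learn_bpe`, `apply_merge`, and `segment`, the tiny word list, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts out as a sequence of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties go to the lexicographically smallest pair
        # (an arbitrary choice made here only to keep the example deterministic).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[apply_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subword units by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus, for illustration only.
words = ["low", "low", "lower", "lowest", "newest", "newest"]
merges = learn_bpe(words, 2)
print(merges)                     # [('l', 'o'), ('lo', 'w')]
print(segment("lowest", merges))  # ['low', 'e', 's', 't']
```

In a subword embedding model, each unit produced by a segmentation like this would be mapped to a vector, so that rare or unseen words can still be represented through their more frequent parts.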

GitHub

1k stars

28 watching

101 forks

Language: Python

last commit: 10 months ago

Linked from 2 awesome lists

embeddings, multilingual, natural-language-processing, nlp, subword-embeddings

nlp.h-its.org/bpemb

Related projects:

Repository	Description	Stars
vzhong/embeddings	Provides fast and efficient word embeddings for natural language processing.	223
botcenter/spanishwordembeddings	This project generates Spanish word embeddings using fastText on large corpora.	9
rguthrie3/morphologicalpriorsforwordembeddings	A project implementing a method to incorporate morphological information into word embeddings using a neural network model.	52
embeddings-benchmark/mteb	Provides tools and benchmarks for evaluating text embedding models.	2,021
hit-scir/elmoformanylangs	Provides pre-trained ELMo representations for multiple languages to improve NLP tasks.	1,462
binwang28/sbert-wk-sentence-embedding	A method to generate sentence embeddings from pre-trained language models.	178
ermlab/polish-word-embeddings-review	An evaluation framework for Polish word embeddings prepared by various research groups using analogy tasks.	4
jwieting/paragram-word	Trains word embeddings from a paraphrase database to represent semantic relationships between words.	30
dsv77/hashembedding	A software component providing efficient word representation using hash embeddings.	42
clhchtcjj/bine	This repository provides an implementation of a bipartite network embedding algorithm for collaborative filtering and link prediction tasks.	227
dccuchile/spanish-word-embeddings	A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods.	354
harsh19/spine	Transforms existing word embeddings into more interpretable ones by applying a novel extension of k-sparse autoencoder with stricter sparsity constraints.	52
kudkudak/word-embeddings-benchmarks	Provides methods for evaluating word embeddings on various benchmarks.	437
zhezhaoa/ngram2vec	A toolkit for learning high-quality word and text representations from ngram co-occurrence statistics.	848
commonsense/conceptnet-numberbatch	A pre-trained word embedding model informed by a large-scale knowledge graph, providing a nuanced representation of word meanings.	1,296