bpemb
Subword embeddings
A collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE), for use in natural language processing tasks.
1k stars
28 watching
101 forks
Language: Python
Last commit: 2 months ago
Linked from 2 awesome lists
Tags: embeddings, multilingual, natural-language-processing, nlp, subword-embeddings
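The description above mentions Byte-Pair Encoding without showing what a subword segmentation looks like. The following is a minimal, self-contained sketch of the general BPE idea on a toy word list. It is not the code used by this project: the function names `learn_bpe`, `apply_merge`, and `segment`, the toy corpus, and the number of merges are all assumptions made for illustration.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Start from single characters; count how often each word occurs.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(apply_merge(symbols, best))] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split `word` into subwords by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


if __name__ == "__main__":
    # Hypothetical toy corpus, chosen only to make frequent suffixes visible.
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    rules = learn_bpe(corpus, num_merges=10)
    for word in ["lowest", "newer"]:
        print(word, "->", segment(word, rules))
```

In a subword embedding collection of this kind, each resulting subword would then be looked up in a pre-trained embedding table, so that words never seen during training can still be represented through their pieces.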
Related projects:
Repository | Description | Stars |
---|---|---|
vzhong/embeddings | Provides fast and efficient word embeddings for natural language processing. | 223 |
botcenter/spanishwordembeddings | This project generates Spanish word embeddings using fastText on large corpora. | 9 |
rguthrie3/morphologicalpriorsforwordembeddings | Implements a method for incorporating morphological information into word embeddings using a neural network model. | 52 |
embeddings-benchmark/mteb | Provides tools and benchmarks for evaluating text embedding models. | 1,992 |
hit-scir/elmoformanylangs | Provides pre-trained ELMo representations for multiple languages to improve NLP tasks. | 1,463 |
binwang28/sbert-wk-sentence-embedding | A method for generating sentence embeddings from pre-trained language models. | 177 |
ermlab/polish-word-embeddings-review | An evaluation framework for Polish word embeddings prepared by various research groups using analogy tasks. | 4 |
jwieting/paragram-word | Trains word embeddings from a paraphrase database to represent semantic relationships between words. | 30 |
dsv77/hashembedding | A software component that provides efficient word representations using hash embeddings. | 42 |
clhchtcjj/bine | This repository provides an implementation of a bipartite network embedding algorithm for collaborative filtering and link prediction tasks. | 227 |
dccuchile/spanish-word-embeddings | A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods. | 356 |
harsh19/spine | Transforms existing word embeddings into more interpretable ones by applying an extension of the k-sparse autoencoder with stricter sparsity constraints. | 52 |
kudkudak/word-embeddings-benchmarks | Provides methods for evaluating word embeddings on various benchmarks. | 437 |
zhezhaoa/ngram2vec | A toolkit for learning word and text representations from n-gram co-occurrence statistics. | 846 |
commonsense/conceptnet-numberbatch | A pre-trained word embedding model informed by a large-scale knowledge graph, providing a nuanced representation of word meanings. | 1,296 |