bpemb
Subword embeddings
A collection of pre-trained subword embeddings in 275 languages, useful for natural language processing tasks.
Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
1k stars
28 watching
101 forks
Language: Python
last commit: 5 months ago
Linked from 2 awesome lists
embeddingsmultilingualnatural-language-processingnlpsubword-embeddings
Related projects:
Repository | Description | Stars |
---|---|---|
| Provides fast and efficient word embeddings for natural language processing. | 223 |
| This project generates Spanish word embeddings using fastText on large corpora. | 9 |
| A project implementing a method to incorporate morphological information into word embeddings using a neural network model | 52 |
| Provides tools and benchmarks for evaluating text embedding models | 2,021 |
| Provides pre-trained ELMo representations for multiple languages to improve NLP tasks. | 1,462 |
| A method to generate sentence embeddings from pre-trained language models | 178 |
| An evaluation framework for Polish word embeddings prepared by various research groups using analogy tasks. | 4 |
| Trains word embeddings from a paraphrase database to represent semantic relationships between words. | 30 |
| Software component providing efficient word representation using hash embeddings | 42 |
| This repository provides an implementation of a bipartite network embedding algorithm for collaborative filtering and link prediction tasks. | 227 |
| A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods. | 354 |
| Transforms existing word embeddings into more interpretable ones by applying a novel extension of k-sparse autoencoder with stricter sparsity constraints | 52 |
| Provides methods for evaluating word embeddings on various benchmarks | 437 |
| A toolkit for learning high-quality word and text representations from ngram co-occurrence statistics | 848 |
| A pre-trained word embedding model informed by a large-scale knowledge graph, providing a nuanced representation of word meanings. | 1,296 |