mteb

Text Embedding Benchmark

Provides tools and benchmarks for evaluating text embedding models

MTEB: Massive Text Embedding Benchmark

GitHub

2k stars

15 watching

285 forks

Language: Jupyter Notebook

last commit: over 1 year ago

Linked from 1 awesome list

benchmark, bitext-mining, clustering, information-retrieval, multilingual-nlp, neural-search, reranking, retrieval, sbert, semantic-search, sentence-transformers, sgpt, sts, text-classification, text-embedding

Repository: embeddings-benchmark/mteb

arxiv.org/abs/2210.07316

Backlinks from these awesome lists:

ethicalml/awesome-production-machine-learning

Related projects:

Repository	Description	Stars
ermlab/polish-word-embeddings-review	An evaluation framework for Polish word embeddings prepared by various research groups using analogy tasks.	4
mnqu/pte	An implementation of the Predictive Text Embedding model for learning word representations from large-scale heterogeneous text networks.	96
kudkudak/word-embeddings-benchmarks	Provides methods for evaluating word embeddings on various benchmarks	437
nlprinceton/text_embedding	A utility class for generating and evaluating document representations using word embeddings.	54
botcenter/spanishwordembeddings	This project generates Spanish word embeddings using fastText on large corpora.	9
bheinzerling/bpemb	A collection of pre-trained subword embeddings in 275 languages, useful for natural language processing tasks.	1,189
vzhong/embeddings	Provides fast and efficient word embeddings for natural language processing.	223
aifeg/benchlmm	An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models	84
krisselden/ember-macro-benchmark	An Ember application benchmarking tool to measure the effects of small changes on web applications.	25
damo-nlp-sg/m3exam	A benchmark for evaluating large language models in multiple languages and formats	93
ncbi-nlp/biosentvec	Pre-trained word and sentence embeddings for biomedical text analysis	578
rguthrie3/morphologicalpriorsforwordembeddings	A project implementing a method to incorporate morphological information into word embeddings using a neural network model	52
alexandres/lexvec	An implementation of a word embedding model that uses character n-grams and achieves state-of-the-art results in multiple NLP tasks	803
bencheeorg/benchee	A tool for benchmarking Elixir code and comparing performance statistics	1,422
binwang28/sbert-wk-sentence-embedding	A method to generate sentence embeddings from pre-trained language models	178