betterprompt

Prompt evaluator

An API for evaluating the quality of text prompts used with Large Language Models (LLMs), based on perplexity estimation.

Test suite for LLM prompts
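
The underlying technique is straightforward: score a prompt by the perplexity a language model assigns to it, on the assumption that lower perplexity correlates with a clearer, more "natural" prompt. The sketch below illustrates that idea with GPT-2 via Hugging Face transformers; the model choice and the `prompt_perplexity` helper are assumptions for illustration only, not betterprompt's actual API.

```python
# Minimal sketch of perplexity-based prompt scoring (illustrative only;
# the model choice and helper name are assumptions, not betterprompt's API).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def prompt_perplexity(prompt: str) -> float:
    """Return the perplexity GPT-2 assigns to `prompt` (lower is better)."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing the input as its own labels yields the mean negative
        # log-likelihood per token; exponentiating gives perplexity.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

candidates = [
    "Translate the following sentence into French:",
    "french translate sentence do now:",
]
for prompt in candidates:
    print(f"{prompt!r}: perplexity = {prompt_perplexity(prompt):.2f}")
```

In practice, a score like this can be used to rank alternative phrasings of the same prompt and keep the lowest-perplexity variant.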

GitHub

38 stars
3 watching
4 forks
Language: Python
Last commit: 6 months ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| vaibkumr/prompt-optimizer | A tool to reduce the complexity of text prompts to minimize API costs and model computations. | 241 |
| mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning. | 20 |
| openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests. | 1,939 |
| pkunlp-icler/pca-eval | An open-source benchmark and evaluation tool for assessing multimodal large language models' performance in embodied decision-making tasks. | 100 |
| ruixiangcui/agieval | Evaluates foundation models on human-centric tasks with diverse exams and question types. | 708 |
| rlancemartin/auto-evaluator | An evaluation tool for question-answering systems using large language models and natural language processing techniques. | 1,063 |
| open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets. | 1,343 |
| emrekavur/chaos-evaluation | Evaluates segmentation performance in medical imaging using multiple metrics. | 57 |
| princeton-nlp/charxiv | An evaluation suite for assessing chart understanding in multimodal large language models. | 75 |
| milvlg/prophet | An implementation of a two-stage framework that prompts large language models with answer heuristics for knowledge-based visual question answering tasks. | 267 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55 |
| chenllliang/mmevalpro | A benchmarking framework for evaluating Large Multimodal Models by providing rigorous metrics and an efficient evaluation pipeline. | 22 |
| obss/jury | A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation. | 188 |
| dfki-nlp/gevalm | Evaluates German transformer language models with syntactic agreement tests. | 7 |
| declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction-tuning methods. | 528 |