betterprompt
Prompt evaluator
An API for evaluating the quality of text prompts used with Large Language Models (LLMs), based on perplexity estimation.
Test suite for LLM prompts.
38 stars
3 watching
4 forks
Language: Python
Last commit: 6 months ago
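The core idea is to score a prompt by the perplexity a causal language model assigns to it, on the assumption that lower perplexity indicates text the model handles more naturally. The snippet below is a minimal illustrative sketch of that idea using Hugging Face transformers and GPT-2; it is not betterprompt's actual API, and the `prompt_perplexity` helper and model choice are assumptions made for the example.

```python
# Illustrative sketch of perplexity-based prompt scoring (not betterprompt's API).
# Lower perplexity suggests the prompt reads as more "natural" to the model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_name = "gpt2"  # any causal LM works; gpt2 is small enough to run locally
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

def prompt_perplexity(prompt: str) -> float:
    """Return the model's perplexity for a prompt (lower is typically better)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean cross-entropy loss
        outputs = model(**inputs, labels=inputs["input_ids"])
    return float(torch.exp(outputs.loss))

if __name__ == "__main__":
    candidates = [
        "Summarize the following article in three bullet points:",
        "article summarize bullet three points following the in:",
    ]
    for prompt in candidates:
        print(f"{prompt_perplexity(prompt):8.2f}  {prompt}")
```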

Related projects:

Repository | Description | Stars |
---|---|---|
vaibkumr/prompt-optimizer | A tool to reduce the complexity of text prompts to minimize API costs and model computations. | 241 |
mshukor/evalign-icl | Evaluates and improves large multimodal models through in-context learning. | 20 |
openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests. | 1,939 |
pkunlp-icler/pca-eval | An open-source benchmark and evaluation tool for assessing multimodal large language models' performance in embodied decision-making tasks. | 100 |
ruixiangcui/agieval | Evaluates foundation models on human-centric tasks with diverse exams and question types. | 708 |
rlancemartin/auto-evaluator | An evaluation tool for question-answering systems using large language models and natural language processing techniques. | 1,063 |
open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets. | 1,343 |
emrekavur/chaos-evaluation | Evaluates segmentation performance in medical imaging using multiple metrics. | 57 |
princeton-nlp/charxiv | An evaluation suite for assessing chart understanding in multimodal large language models. | 75 |
milvlg/prophet | A two-stage framework that prompts large language models with answer heuristics for knowledge-based visual question answering tasks. | 267 |
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks. | 55 |
chenllliang/mmevalpro | A benchmarking framework for evaluating Large Multimodal Models with rigorous metrics and an efficient evaluation pipeline. | 22 |
obss/jury | A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation. | 188 |
dfki-nlp/gevalm | Evaluates German transformer language models with syntactic agreement tests. | 7 |
declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction-tuning methods. | 528 |