jailbreak-evaluation

Control evaluation

Evaluates language model attempts to determine their control and trustworthiness

The jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation.

GitHub

20 stars

0 watching

3 forks

Language: Python

last commit: 11 months ago

Linked from 1 awesome list

Screenshot of controllability/jailbreak-evaluation website

arxiv.org/abs/2404.06407

Backlinks from these awesome lists:

corca-ai/awesome-llm-security

Related projects:

Repository	Description	Stars
sony/pyieoe	Develops an interpretable evaluation procedure for off-policy evaluation (OPE) methods to quantify their sensitivity to hyper-parameter choices and/or evaluation policy choices.	31
freedomintelligence/mllm-bench	Evaluates and compares the performance of multimodal large language models on various tasks	56
psycoy/mixeval	An evaluation suite and dynamic data release platform for large language models	230
tonicai/tonic_validate	A framework for evaluating and monitoring the quality of large language model outputs in Retrieval Augmented Generation applications.	271
expyriment/expyriment	A Python library designed to support the development of timing-critical experiments in cognitive science and neuroscience.	115
django-behave/django-behave	Provides a way to run Behavior-Driven Development tests in Django applications	197
openai/simple-evals	Evaluates language models using standardized benchmarks and prompting techniques.	2,059
lartpang/pysodevaltoolkit	A comprehensive Python toolbox for evaluating salient object detection and camouflaged object detection tasks	168
declare-lab/instruct-eval	An evaluation framework for large language models trained with instruction tuning methods	535
princeton-nlp/charxiv	An evaluation suite for assessing chart understanding in multimodal large language models.	85
behave/behave-django	A BDD testing framework for Django applications	205
ys-zong/vlguard	Improves safety and helpfulness of large language models by fine-tuning them using safety-critical tasks	47
allenai/olmo-eval	A framework for evaluating language models on NLP tasks	326
ukgovernmentbeis/inspect_ai	A framework for evaluating large language models	669
cisco-open/inclusive-language	Tools and resources for identifying biased language in code and content.	21