jailbreak-evaluation

Control evaluation

Evaluates language model attempts to determine their control and trustworthiness

The jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation.

GitHub

20 stars
0 watching
3 forks
Language: Python
last commit: 4 months ago
Linked from 1 awesome list


Backlinks from these awesome lists:

Related projects:

Repository Description Stars
sony/pyieoe Develops an interpretable evaluation procedure for off-policy evaluation (OPE) methods to quantify their sensitivity to hyper-parameter choices and/or evaluation policy choices. 31
freedomintelligence/mllm-bench Evaluates and compares the performance of multimodal large language models on various tasks 56
psycoy/mixeval An evaluation suite and dynamic data release platform for large language models 230
tonicai/tonic_validate A framework for evaluating and monitoring the quality of large language model outputs in Retrieval Augmented Generation applications. 271
expyriment/expyriment A Python library designed to support the development of timing-critical experiments in cognitive science and neuroscience. 115
django-behave/django-behave Provides a way to run Behavior-Driven Development tests in Django applications 197
openai/simple-evals Evaluates language models using standardized benchmarks and prompting techniques. 2,059
lartpang/pysodevaltoolkit A comprehensive Python toolbox for evaluating salient object detection and camouflaged object detection tasks 168
declare-lab/instruct-eval An evaluation framework for large language models trained with instruction tuning methods 535
princeton-nlp/charxiv An evaluation suite for assessing chart understanding in multimodal large language models. 85
behave/behave-django A BDD testing framework for Django applications 205
ys-zong/vlguard Improves safety and helpfulness of large language models by fine-tuning them using safety-critical tasks 47
allenai/olmo-eval A framework for evaluating language models on NLP tasks 326
ukgovernmentbeis/inspect_ai A framework for evaluating large language models 669
cisco-open/inclusive-language Tools and resources for identifying biased language in code and content. 21