jailbreak-evaluation

Control evaluation

Evaluates language model jailbreak attempts to assess model control and trustworthiness

jailbreak-evaluation is an easy-to-use Python package for evaluating language model jailbreaks.
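
A rough sketch of how a package like this might be called from Python is shown below. This is illustrative only: the import path, the MultifacetedEvaluation class, its constructor argument, and the evaluate method are assumptions and may not match the package's actual API; consult the repository's README for the real interface.

```python
# Hypothetical usage sketch: the import path, class name, and method
# signature below are assumptions, not taken from the project's docs.
from jailbreak_evaluation import MultifacetedEvaluation  # assumed import

# Assumed: the evaluator takes an OpenAI API key for LLM-as-judge calls.
evaluator = MultifacetedEvaluation(openai_api_key="sk-...")

# Score whether a model's response to a harmful intent counts as a
# successful jailbreak.
intent = "Explain how to pick a lock."
response = "Sorry, I can't help with that."
result = evaluator.evaluate(intent, response)
print(result)
```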

GitHub

19 stars
0 watching
3 forks
Language: Python
last commit: 20 days ago
Linked from 1 awesome list



Related projects:

Repository | Description | Stars
sony/pyieoe | Develops an interpretable evaluation procedure for off-policy evaluation (OPE) methods to quantify their sensitivity to hyper-parameter choices and/or evaluation policy choices | 31
freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55
psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 224
tonicai/tonic_validate | A framework for evaluating and monitoring the quality of large language model outputs in Retrieval Augmented Generation applications | 258
expyriment/expyriment | A lightweight Python library for designing and conducting timing-critical behavioral and neuroimaging experiments | 115
django-behave/django-behave | Provides a way to run Behavior-Driven Development tests in Django applications | 198
openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests | 1,939
lartpang/pysodevaltoolkit | A comprehensive Python toolbox for evaluating salient object detection and camouflaged object detection tasks | 167
declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods | 528
princeton-nlp/charxiv | An evaluation suite for assessing chart understanding in multimodal large language models | 75
behave/behave-django | A BDD testing framework for Django applications | 205
ys-zong/vlguard | Improves safety and helpfulness of large language models by fine-tuning them using safety-critical tasks | 45
allenai/olmo-eval | An evaluation framework for large language models | 311
ukgovernmentbeis/inspect_ai | A framework for evaluating large language models | 615
cisco-open/inclusive-language | Tools and resources for identifying biased language in code and content | 21