vllm-safety-benchmark
A safety test suite for vision language models
A benchmark for evaluating the safety and robustness of vision language models against adversarial attacks.
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
72 stars
4 watching
3 forks
Language: Python
Last commit: about 1 year ago
Topics: adversarial-attacks, benchmark, datasets, llm, multimodal-llm, robustness, safety, vision-language-model
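As a rough illustration of the kind of evaluation such a benchmark performs, the sketch below queries a vision LLM with adversarial (image, prompt) pairs and reports how often the model refuses. This is a hypothetical, minimal harness, not the repository's actual API: `evaluate_safety`, `is_refusal`, the `model_fn` wrapper, and the refusal-keyword heuristic are all illustrative assumptions.

```python
from pathlib import Path
from typing import Callable, Iterable, Tuple

# Naive keyword heuristic for spotting a refusal in the model's reply
# (illustrative only; real benchmarks use more careful judging).
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")


def is_refusal(response: str) -> bool:
    """Return True if the response looks like the model declined the request."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def evaluate_safety(
    model_fn: Callable[[Path, str], str],     # user-supplied wrapper: (image_path, prompt) -> response
    samples: Iterable[Tuple[Path, str]],      # adversarial image/prompt pairs
) -> float:
    """Query the model on every sample and return the overall refusal rate."""
    total = 0
    refused = 0
    for image_path, prompt in samples:
        response = model_fn(image_path, prompt)
        refused += is_refusal(response)
        total += 1
    return refused / max(total, 1)
```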
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models | 84 |
| | An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models | 19 |
| | Improves the safety and helpfulness of large language models by fine-tuning them on safety-critical tasks | 47 |
| | A toolkit to detect and protect against vulnerabilities in large language models | 122 |
| | A set of tools and guidelines for assessing the security vulnerabilities of language models in AI applications | 28 |
| | A unified benchmark for safe reinforcement learning algorithms and environments | 410 |
| | A toolkit for assessing trustworthiness in large language models | 491 |
| | Evaluates and benchmarks the robustness of deep learning models to various corruptions and perturbations in computer vision tasks | 1,030 |
| | A large language model designed to process and generate visual information | 956 |
| | A PyTorch implementation of an enhanced vision language model | 93 |
| | A multimodal LLM designed to handle text-rich visual questions | 270 |
| | A PyTorch implementation of an encoder-free vision-language model that can be fine-tuned for various tasks and modalities | 246 |
| | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge | 1,923 |
| | A PyTorch toolbox supporting research on domain adaptation, generalization, and semi-supervised learning in computer vision | 1,236 |
| | A benchmark for evaluating large language models' ability to process multimodal input | 322 |