FoolyourVLLMs

Attack framework

An attack framework that manipulates the outputs of large language models and vision-language models by permuting multiple-choice answer options

[ICML 2024] Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations

GitHub: 14 stars, 1 watching, 2 forks
Language: Python
Last commit: about 1 year ago
Tags: adversarial-attacks, llms, mcq, vision-and-language
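
The attack described in the paper rests on a simple observation: reordering the answer options of a multiple-choice question can flip a model's prediction. The sketch below is not the repository's code; it is a minimal illustration of that permutation attack under assumed names, where `query_model` is a hypothetical placeholder for whatever LLM or VLM API is actually queried and the prompt format is illustrative.

```python
# Minimal sketch (not the repository's implementation): enumerate answer-option
# permutations for a multiple-choice question and check whether a model's
# prediction stays correct under every ordering.
from itertools import permutations

LETTERS = "ABCD"

def build_prompt(question: str, options: list[str]) -> str:
    # Render the question with lettered options (illustrative format).
    lines = [question] + [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)

def query_model(prompt: str) -> str:
    # Hypothetical placeholder: replace with a real model call.
    # Here it always answers "A", mimicking the kind of position bias
    # that adversarial permutations exploit.
    return "A"

def is_permutation_robust(question: str, options: list[str], correct: str) -> bool:
    """Return True only if the model picks `correct` under every option ordering."""
    for perm in permutations(options):
        letter = query_model(build_prompt(question, list(perm)))
        predicted = perm[LETTERS.index(letter)]
        if predicted != correct:
            return False  # found an adversarial permutation
    return True

if __name__ == "__main__":
    q = "Which planet is known as the Red Planet?"
    opts = ["Venus", "Mars", "Jupiter", "Saturn"]
    print(is_permutation_robust(q, opts, correct="Mars"))
```

With the biased placeholder model, the check fails as soon as a permutation moves the correct answer out of position A, which is the behaviour the attack is designed to surface.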

Related projects:

Repository | Description | Stars
yunqing-me/attackvlm | An adversarial attack framework on large vision-language models | 161
ys-zong/vlguard | Improves the safety and helpfulness of vision large language models by fine-tuning them on safety-critical tasks | 45
ys-zong/vl-icl | A benchmarking suite for multimodal in-context learning models | 28
hfzhang31/a3fl | A framework for attacking federated learning systems with adaptive backdoor attacks | 22
ethz-spylab/rlhf_trojan_competition | Detecting backdoors in language models to prevent malicious AI usage | 107
yuxie11/r2d2 | A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese | 157
jeremy313/fl-wbc | A defense mechanism against model poisoning attacks in federated learning | 37
junyizhu-ai/r-gap | A tool for demonstrating and analyzing gradient-based attacks on private data in machine learning models | 34
zjunlp/knowlm | A framework for training and utilizing large language models with knowledge augmentation capabilities | 1,239
weisong-ucr/mab-malware | An open-source reinforcement learning framework for generating adversarial examples against malware classification models | 40
lhfowl/robbing_the_fed | Allows an attacker to obtain user data directly from federated learning gradient updates by modifying the shared model architecture | 23
yiyangzhou/lure | Analyzes and mitigates object hallucination in large vision-language models to improve their accuracy and reliability | 134
kaiyuanzh/flip | A framework for defending against backdoor attacks in federated learning systems | 44
yuliang-liu/monkey | A toolkit for building conversational AI models that can process image and text inputs | 1,825
jind11/textfooler | A tool for generating adversarial examples to attack text classification and inference models | 494