A Provable Energy-Guided Test-Time Defense Boosting Adversarial Robustness of Large Vision-Language Models

Abstract

Despite rapid progress in multimodal models and Large Vision-Language Models (LVLMs), these models remain highly susceptible to adversarial perturbations, raising serious concerns about their reliability in real-world use. While adversarial training has become the leading paradigm for building models that are robust to adversarial attacks, Test-Time Transformations (TTT) have emerged as a promising strategy for boosting robustness at inference. In light of this, we propose Energy-Guided Test-Time Transformation (ET3), a lightweight, training-free defense that improves robustness by minimizing the energy of the input samples. Our method is grounded in a theoretical analysis proving that, under reasonable assumptions, the transformation leads to correct classification. Extensive experiments show that ET3 provides a strong defense for classifiers and for zero-shot classification with CLIP, and that it also boosts the robustness of LVLMs on tasks such as Image Captioning and Visual Question Answering. Code is available at https://github.com/OmnAI-Lab/Energy-Guided-Test-Time-Defense.
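The abstract only says that ET3 lowers the energy of the input before it reaches the model. The exact energy function, optimizer, and constraints are in the paper and the linked repository, not on this page. As a rough illustration of the general idea, the sketch below makes several assumptions. It uses the common free-energy reading of a classifier, E(x) = -logsumexp_y f_y(x), with a toy linear classifier f(x) = Wx + b, and runs projected gradient descent on the input inside an L-infinity ball of radius eps and the valid range [0, 1]. All function names and hyperparameters (`energy_guided_transform`, `steps`, `lr`, `eps`) are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def logsumexp(z):
    """Numerically stable log(sum(exp(z)))."""
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))


def energy(x, W, b):
    """Assumed free energy of a linear classifier: E(x) = -logsumexp(W @ x + b)."""
    return -logsumexp(W @ x + b)


def energy_grad(x, W, b):
    """Analytic gradient of E(x) w.r.t. x for the linear model: -W^T softmax(W @ x + b)."""
    z = W @ x + b
    p = np.exp(z - np.max(z))
    p /= p.sum()
    return -W.T @ p


def energy_guided_transform(x0, W, b, steps=10, lr=0.1, eps=0.1):
    """Hypothetical test-time transform: projected gradient descent on E(x).

    Each step moves the input against the energy gradient, then projects it
    back onto the box {|x - x0|_inf <= eps} intersected with [0, 1].
    """
    lo = np.maximum(x0 - eps, 0.0)  # lower bound of the feasible box
    hi = np.minimum(x0 + eps, 1.0)  # upper bound of the feasible box
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * energy_grad(x, W, b)  # descend the energy
        x = np.clip(x, lo, hi)             # project onto the feasible box
    return x
```

A small usage example on a toy 3-class, 4-feature model. Because the step size is below 1/L for this W, each projected step cannot increase the energy:

```python
W = np.array([[1.0, -0.5, 0.2, 0.0],
              [0.3, 0.8, -0.4, 0.1],
              [-0.6, 0.2, 0.5, 0.9]])
b = np.zeros(3)
x0 = np.full(4, 0.5)
x = energy_guided_transform(x0, W, b)
print(energy(x0, W, b), energy(x, W, b))  # the second value is lower
```

Keeping the transformed input close to the original (the eps ball) is one plausible way to purify the input without changing its content. The paper may constrain or schedule the update differently.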

Publication
IEEE/CVF Computer Vision and Pattern Recognition (CVPR)
Mirza Mujtaba Hussain
PhD Student

Hi there! 👋 I’m Hussain, a Ph.D. student at Sapienza University. Currently I’m diving into Adversarial Machine Learning and Explainable AI to find practical solutions for real-world challenges. My goal is to use AI to make a positive impact on our society.

Antonio D'Orazio
PhD Student

Hi! 👋 I’m Antonio, a Ph.D. student at Sapienza University. I am fascinated by the world of Computer Graphics, and my research interests span from solving inverse problems in the Graphics domain to finding explainable representations using Neuro-Symbolic AI.

Iacopo Masi
Associate Professor (PI)

My research interests include computer vision, biometrics, and AI.