We question the current evaluation practices for diffusion-based purification methods. Diffusion-based purification methods aim to remove adversarial effects from an input data point at test time. The approach has gained increasing attention as an alternative to adversarial training because it disentangles training from testing: the defense operates on inputs at inference time rather than requiring adversarially augmented training. Well-known white-box attacks are often employed to measure the robustness of purification. However, it is unknown whether these attacks are the most effective against diffusion-based purification, since they are often tailored to adversarial training. We analyze the current practices and provide a new guideline for measuring the robustness of purification methods against adversarial attacks. Based on our analysis, we further propose a new purification strategy that achieves competitive results against state-of-the-art adversarial training approaches.
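To make the purification idea concrete, below is a minimal sketch of the common diffusion-purification recipe: noise the adversarial input part-way along the forward diffusion process, then run the reverse (denoising) process back to a clean estimate. This is a generic illustration under assumed choices, not the authors' proposed strategy; `ToyDenoiser`, `purify`, `t_star`, and the linear noise schedule are all illustrative assumptions standing in for a real pretrained noise-prediction network.

```python
import torch

# Hypothetical stand-in for a pretrained diffusion denoiser; a real
# implementation would load trained noise-prediction network weights.
class ToyDenoiser(torch.nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.net = torch.nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, t):
        # Predicts the noise component of x at diffusion step t.
        return self.net(x)

@torch.no_grad()
def purify(x_adv, denoiser, t_star=100, T=1000):
    """Generic diffusion purification sketch: diffuse the input forward
    for t_star steps, then denoise back to step 0. The injected noise
    is intended to wash out the adversarial perturbation while the
    reverse process recovers the underlying clean signal."""
    betas = torch.linspace(1e-4, 0.02, T)       # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
    abar = alpha_bars[t_star - 1]
    x = abar.sqrt() * x_adv + (1 - abar).sqrt() * torch.randn_like(x_adv)

    # Reverse process (DDPM-style ancestral sampling) from t_star to 0.
    for t in reversed(range(t_star)):
        eps_hat = denoiser(x, t)
        coef = (1 - alphas[t]) / (1 - alpha_bars[t]).sqrt()
        x = (x - coef * eps_hat) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x

# Usage: purify a perturbed image before passing it to a classifier.
x_adv = torch.rand(1, 3, 32, 32)
x_clean = purify(x_adv, ToyDenoiser(), t_star=100)
```

The choice of `t_star` trades off how much adversarial noise is destroyed against how much of the original image content survives; evaluating robustness hinges on attacking this full noise-and-denoise pipeline rather than the classifier alone.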
Authors: Minjong Lee, Dongwoo Kim