TY - JOUR
T1 - Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks
T2 - 39th International Conference on Machine Learning, ICML 2022
AU - Lin, Weiran
AU - Lucas, Keane
AU - Bauer, Lujo
AU - Reiter, Michael K.
AU - Sharif, Mahmood
N1 - Publisher Copyright:
Copyright © 2022 by the author(s)
PY - 2022
Y1 - 2022
AB - We propose new, more efficient targeted white-box attacks against deep neural networks. Our attacks better align with the attacker's goal: (1) tricking a model to assign higher probability to the target class than to any other class, while (2) staying within an ε-distance of the attacked input. First, we demonstrate a loss function that explicitly encodes (1) and show that Auto-PGD finds more attacks with it. Second, we propose a new attack method, Constrained Gradient Descent (CGD), using a refinement of our loss function that captures both (1) and (2). CGD seeks to satisfy both attacker objectives, misclassification and bounded ℓp-norm, in a principled manner, as part of the optimization, instead of via ad hoc post-processing techniques (e.g., projection or clipping). We show that CGD is more successful on CIFAR10 (0.9-4.2%) and ImageNet (8.6-13.6%) than state-of-the-art attacks while consuming less time (11.4-18.8%). Statistical tests confirm that our attack outperforms others against leading defenses on different datasets and values of ε.
UR - http://www.scopus.com/inward/record.url?scp=85163123264&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85163123264
SN - 2640-3498
VL - 162
SP - 13405
EP - 13430
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 17 July 2022 through 23 July 2022
ER -