Abstract
We propose new, more efficient targeted white-box attacks against deep neural networks. Our attacks better align with the attacker's goal: (1) tricking a model into assigning higher probability to the target class than to any other class, while (2) staying within an ε-distance of the attacked input. First, we demonstrate a loss function that explicitly encodes (1) and show that Auto-PGD finds more attacks with it. Second, we propose a new attack method, Constrained Gradient Descent (CGD), using a refinement of our loss function that captures both (1) and (2). CGD seeks to satisfy both attacker objectives (misclassification and a bounded ℓp-norm perturbation) in a principled manner, as part of the optimization, instead of via ad hoc post-processing techniques (e.g., projection or clipping). We show that CGD is more successful on CIFAR10 (0.9-4.2%) and ImageNet (8.6-13.6%) than state-of-the-art attacks while consuming less time (11.4-18.8%). Statistical tests confirm that our attack outperforms others against leading defenses on different datasets and values of ε.
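The abstract does not give the exact CGD update rule, but the two ideas it describes (a loss that is satisfied exactly when the target class outranks every other class, and an ℓp budget enforced inside the optimization rather than by projection or clipping) can be illustrated with a minimal, hypothetical PyTorch sketch. The margin term, the hinge penalty on the ℓ∞ excess, and all names and hyperparameters (`targeted_attack_sketch`, `lam`, `steps`, `lr`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch, not the paper's CGD implementation: a targeted attack
# whose loss couples (1) a margin term that is minimized exactly when the
# target logit exceeds every other logit with (2) a hinge penalty on the part
# of the perturbation that leaves the l-infinity ball of radius eps, so the
# norm bound is handled inside the optimization rather than by clipping.
import torch
import torch.nn.functional as F


def targeted_attack_sketch(model, x, target, eps=8 / 255,
                           steps=100, lr=0.01, lam=10.0):
    """x: image batch in [0, 1]; target: int target class (illustrative)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(x + delta)
        target_mask = F.one_hot(
            torch.tensor(target, device=logits.device), logits.size(1)
        ).bool()
        # Margin: best non-target logit minus target logit (negative = success).
        best_other = logits.masked_fill(target_mask, float("-inf")).max(dim=1).values
        margin = best_other - logits[:, target]
        # Soft l-infinity constraint: penalize only the excess beyond eps.
        excess = F.relu(delta.abs().amax(dim=(1, 2, 3)) - eps)
        loss = (margin + lam * excess).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach().clamp(0, 1)
```

Folding the ε-budget into the loss means every gradient step accounts for the constraint, which is the contrast the abstract draws with ad hoc projection or clipping after the fact.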
Original language | English |
---|---|
Pages (from-to) | 13405-13430 |
Number of pages | 26 |
Journal | Proceedings of Machine Learning Research |
Volume | 162 |
State | Published - 2022 |
Event | 39th International Conference on Machine Learning, ICML 2022, Baltimore, United States, 17-23 Jul 2022 |
Funding
Funders | Funder number |
---|---|
Blavatnik Family Foundation | |
Department of Defense | |
Maof prize for excellent young faculty | |
NSF | |
National Security Agency | H9823018D0008 |
National Science Foundation | 2113345, 2112562, 1801391 |
U.S. Department of Defense | FA8702-15-D-0002 |