Abstract
Deep models, while extremely versatile and accurate, are vulnerable to adversarial attacks: slight perturbations that are imperceptible to humans can completely flip the prediction of a deep model. Many attack and defense mechanisms have been proposed, yet a satisfying solution remains largely elusive. In this work, we give strong evidence that, during training, deep models maximize the minimum margin in order to achieve high accuracy, but at the same time decrease the average margin, thereby hurting robustness. Our empirical results highlight an intrinsic trade-off between accuracy and robustness in current deep model training. We also show that this trade-off can be broken by robust training methods.
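The abstract contrasts the minimum margin (relevant to accuracy on the hardest example) with the average margin (a rough indicator of typical robustness). The following minimal sketch, not the paper's code, illustrates these two statistics using the common "logit margin" (true-class score minus the best competing score) as a stand-in for whatever margin notion the paper uses; the toy data and function names are assumptions for illustration only.

```python
import numpy as np

def logit_margins(logits, labels):
    """Per-example margin: score of the true class minus the best other score."""
    idx = np.arange(len(labels))
    true_scores = logits[idx, labels]
    masked = logits.copy()
    masked[idx, labels] = -np.inf          # exclude the true class
    best_other = masked.max(axis=1)
    return true_scores - best_other

# Toy data: 5 examples, 3 classes (random scores stand in for model outputs).
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))
labels = rng.integers(0, 3, size=5)

m = logit_margins(logits, labels)
# A positive minimum margin means even the hardest example is classified
# correctly (accuracy), while the average margin reflects how far typical
# examples sit from the decision boundary (robustness).
print("min margin:", m.min(), "average margin:", m.mean())
```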
Type
Publication
NeurIPS Workshop on Machine Learning with Guarantees