PhD Seminar • Cryptography, Security, and Privacy (CrySP) • DAD Knows Best: Mitigating Adversarial Examples via Deep Adversarial Distillation

Please note: This PhD seminar will take place online.

Andre Kassis, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Urs Hengartner

Numerous defenses against adversarial attacks have been proposed. Yet, to date, no viable defense exists, as advanced and adaptive attacks continue to emerge and defeat existing defenses. Inspired by adversarial examples themselves, we explore a novel strategy that turns the very property they exploit, the sensitivity of neural networks to minor input modifications, into a defense against adversarial attacks.

Our method exploits the ability of adversarial examples to influence neural networks’ decisions through minor modifications. Specifically, we employ such perturbations as a distillation mechanism that extracts structural information from user-provided inputs. Our defense operates on these signatures, which are extracted in the form of adversarial perturbations. When the signatures are incorporated into a secret vector known only to the model owner, they complete a partial fingerprint of their class that is embedded in this vector. Samples of different classes, even if perturbed adversarially, are structurally different and, when distilled, cannot complete the partial fingerprint, thereby failing to prove that they belong to the target class. This property leaves the defense highly robust to adversarial attacks as long as the secret vector remains hidden. The attacker cannot perform useful optimizations even when adaptive attacks are mounted and the defense is otherwise fully white-box.
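The starting observation above, that minor modifications can change a model's decision, can be illustrated with a toy example. The following is a minimal sketch, not the defense presented in this talk: it applies a standard FGSM-style sign-gradient step to a hand-written logistic-regression classifier. The weights, the input, and the names `predict` and `fgsm_perturbation` are all hypothetical and chosen only for illustration.

```python
import math

def predict(weights, bias, x):
    """Return P(class 1) for a toy logistic-regression model."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def fgsm_perturbation(weights, label, epsilon):
    """FGSM-style step for logistic regression.

    The gradient of the cross-entropy loss with respect to the input is
    (p - label) * w, so its sign is -sign(w) when label == 1 and
    +sign(w) when label == 0. Each feature moves by at most epsilon.
    """
    direction = -1 if label == 1 else 1
    return [direction * epsilon * sign(w) for w in weights]

# Hypothetical toy model and input (assumptions for illustration only).
weights = [0.5, -0.25, 0.75, -0.5]
bias = 0.0
x = [0.5, 0.25, 0.25, 0.5]
epsilon = 0.1

# The clean input is classified as class 1 (probability above 0.5).
clean_prob = predict(weights, bias, x)
label = 1 if clean_prob > 0.5 else 0

# Add the small perturbation and classify again.
delta = fgsm_perturbation(weights, label, epsilon)
x_adv = [xi + di for xi, di in zip(x, delta)]
adv_prob = predict(weights, bias, x_adv)

print(f"clean P(class 1) = {clean_prob:.3f}")
print(f"adversarial P(class 1) = {adv_prob:.3f}")
```

In this toy setting, no feature changes by more than `epsilon`, yet the predicted class flips. The perturbation `delta` depends on both the model and the input, which is the kind of input-dependent signal the abstract describes using as a signature; how the actual defense extracts and verifies such signatures is not specified here.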

We introduce and formalize our novel deep adversarial distillation strategy and propose a concrete design to realize it in the form of a generator-discriminator network that is generic and easily applicable to any task or architecture. We provide empirical evaluations demonstrating the robustness of our design and its resistance to several state-of-the-art adversarial attacks when used to protect various models designed for tasks from different domains. Our evaluations include experiments with a variety of adaptive attacks, demonstrating the defense’s ability to detect these as well.