# Generating provably robust adversarial examples

#### Dimitar I. Dimitrov

#### April 21, 2022

On the image above we show an image of the digit $0$ from MNIST ($x_\text{orig}$) and a region around it in red that depicts the set of geometrically perturbed images for which we expect a given neural network to be robust.
Further, in green we depict a subregion where the neural network is not robust. Traditionally, in order to assess the robustness of the network one uses adversarial attacks to generate the examples $x_1$ and $x_2$.
While robustness can be assessed that way, the information that the whole green region is adversarial is lost. This in turn might lead to never-seen-before network behaviour in the future.
One advantage of the classical approach of assessing robustness, however, is that generating $x_1$ and $x_2$ is fast. In contrast, generating the green region is computationally infeasible.
In this work, we present an algorithm called **PARADE** that exploits classical adversarial attacks to generate as large as possible regions that are provably adversarial. Similarly to the green region in the figure, these regions summarize many indvidual advarsarial attacks while also being practical to compute.
We call them provably robust adversarial examples.

### Algorithm overview

There are three main steps to **PARADE**. First, we generate an initial box region that might contain non-adversarial points using off-the-shelf adversarial attacks.
We refer to this region as the overapproximation box $\mathcal{O}$. Then, we use a black-box verifier to shrink this overapproximation box to a smaller box that provably contains only adversarial points. We call this region the underapproximation box $\mathcal{U}$.
Finally, we use $\mathcal{O}$ and $\mathcal{U}$ to generate a polyhedral region $\mathcal{U}\subseteq\mathcal{P}\subseteq\mathcal{O}$ that we also prove only contains adversarial points using the same black-box verifier. Both $\mathcal{U}$ and $\mathcal{P}$ fit our definition of
provably robust adversarial examples but differ in terms of shape and precision. To this end, the generation of $\mathcal{P}$ is only an optional way to make our provably robust adversarial examples more precise. Next, we present the **PARADE** steps in details.

### Generating the overapproximation box $\mathcal{O}$

To generate the overapproximation box $\mathcal{O}$, we sample attacks from an adversarial attack algorithm, such as **PGD**. Then, we fit a box around them. The process is illustateted in the animation above.
We note that depending on the success of the attack algorithm, a small part of the ground truth adversarial region $\mathcal{T}$ might be excluded from $\mathcal{O}$.

### Generating the underapproximation box $\mathcal{U}$

We aim to generate the underapproximation box $\mathcal{U}$ in a way that it can be proven to only contain adversarial examples while also being as large as possible. Due to the complexity of this objective, we do this heuristically. In particular, we start by initializing $\mathcal{U}$ to the overapproximation box $\mathcal{O}$. At each iteration $i$, we execute a black-box verification procedure. If the procedure verifies that the box from the previous iteration, $\mathcal{U}_{i-1}$, contains only adversarial examples, we return it. Otherwise, we obtain from the verifier a linear contraint, which can be added to $\mathcal{U}_{i-1}$ in order to make it provably robust. Unfortunately, the constraint is usually too conservative, as the black-box verifier relies on overapproximation of the set of possible network outputs. To this end, we relax the constraint by bias-adjusting it. We make sure that we cannot relax the constraint too much, such that it becomes meaningless. We use the bias-adjusted contraint to shrink $\mathcal{U}_{i-1}$ such that the constraint is not violated but the smallest possible amount of volume is lost. The procedure is repeated until the verification succeeds. The process is depicted in the animation above.

### Generating the polyhedral region $\mathcal{P}$

Finally, we present the optional step of generating polyhedral provably robust adversarial example $\mathcal{P}$ from the box provably robust adversarial example $\mathcal{U}$.
The additional flexibility of the polyhedral shape allows for larger regions $\mathcal{P}$ compared to $\mathcal{U}$ in exchange for computational complexity. As generating polyhedral regions is hard, we again do this heuristically.
Starting with the overapproximation box $\mathcal{O}$, we iteratively add linear containts to it until we arrive at a polyhedron $\mathcal{P}$ that can be proved to only contain adversarial examples by the black-box verifier.
Similarly to the generation process of $\mathcal{U}$, we use the verification at iteration $i$ to generate linear contraints.
Unlike the generation process of $\mathcal{U}$, we use not only linear constraints from the final verification objective but also linear constraints that make the *ReLU* activation neurons in the network decided.
Unfortunately, the resulting constraints might generate polyhedron $\mathcal{P}$ that is smaller than $\mathcal{O}$. To prevenet that, we leverage the fact that $\mathcal{U}$ is itself provably robust and we bias-adjust the constraints in such a way that they do not remove volume from $\mathcal{U}$.
The procedure is concludes when the verifier succeeds. We outline the procedure in the animation above.

### Experiments

We experimented with two different types of provably robust adversarial examples - robust to pixel intensity changes ($\ell_\infty$ changes) and to geometric changes. We show the pixel intensity experiment below:

Network | $\epsilon$ | PARADE Box # Regions |
PARADE Box Time |
PARADE Box # Attacks |
PARADE Poly # Regions |
PARADE Poly Time |
PARADE Poly # Attacks |
---|---|---|---|---|---|---|---|

MNIST 8x200 |
0.045 | 53/53 | 114s | $10^{121}$ | 53/53 | 1556s | $10^{121} < \cdot < 10^{191}$ |

MNIST ConvSmall |
0.12 | 32/32 | 74s | $10^{494}$ | 32/32 | 141s | $10^{494} < \cdot < 10^{561}$ |

MNIST ConvBig |
0.05 | 28/29 | 880s | $10^{137}$ | 28/29 | 5636s | $10^{137} < \cdot < 10^{173}$ |

CIFAR-10 ConvSmall |
0.006 | 44/44 | 113s | $10^{486}$ | 44/44 | 264s | $10^{486} < \cdot < 10^{543}$ |

CIFAR-10 ConvBig |
0.008 | 36/36 | 404s | $10^{573}$ | 36/36 | 610s | $10^{573} < \cdot < 10^{654}$ |

We note **PARADE** is highly effective - it generates regions successfully for all but $1$ image for which the classical adversarial attacks succeeded. Further, the regions generated contain a very large set of adversarial examples that are infeasible to generate individually.
Finally, we note that the polyhedral adversarial examples take more time to generate but contain more examples. Calculating the exact number of concrete attacks within the polyhedral regions is computationally hard so instead we approximate the number as precisely as possible from above and below using boxes.

Next, we show the results for adversarial examples provably robust to geometric changes:

Network | Transform | PARADE Box # Regions |
PARADE Box Time |
PARADE Box # Attacks |
---|---|---|---|---|

MNIST ConvSmall |
Rotate + Scale + Shear | 51/54 | 774s | $10^{96} < \cdot < 10^{195}$ |

MNIST ConvSmall |
Scale + Translate2D | 51/56 | 521s | $10^{71} < \cdot < 10^{160}$ |

MNIST ConvSmall |
Scale + Rotate + Brightness | 40/48 | 370s | $10^{70} < \cdot < 10^{455}$ |

MNIST ConvBig |
Rotate + Scale + Shear | 44/50 | 835s | $10^{77} < \cdot < 10^{205}$ |

MNIST ConvBig |
Scale + Translate2D | 42/46 | 441s | $10^{64} < \cdot < 10^{174}$ |

MNIST ConvBig |
Scale + Rotate + Brightness | 46/52 | 537s | $10^{119} < \cdot < 10^{545}$ |

CIFAR-10 ConvSmall |
Rotate + Scale + Shear | 29/29 | 1369s | $10^{599} < \cdot < 10^{1173}$ |

CIFAR-10 ConvSmall |
Scale + Translate2D | 32/32 | 954s | $10^{66} < \cdot < 10^{174}$ |

CIFAR-10 ConvSmall |
Scale + Rotate + Brightness | 21/25 | 1481s | $10^{513} < \cdot < 10^{2187}$ |

We see that again **PARADE** is capable of generating examples for most images where classical adversarial attacks succeeded. We note that we use *DeepG* for verification.
Since DeepG generates image polyhedra, we have to approximate the number of concrete attacks similarly to **PARADE Poly** above. We also note that DeepG is more computationally expensive resulting is longer runtime for our algorithm, as well.

### Visualizing PARADE regions

Above we visualize some of the provably robust adversarial examples generated by **PARADE** for both pixel and geometric transformations. Each pixel’s color represents the number of possible values that pixel can have within our box regions.

### Summary

We introduced and motivated the concept of provably robust adversarial examples. We further showed an outline of our algorithm, **PARADE**, that generates such examples in the shape of boxes or polyhedra.
Emperically we demonstrated that regions produced by **PARADE** can summarize very large number of individual adversarial examples making them an useful tool to asses the robustness of neural networks.
We hope that we piqued your interest in our work. For further details and experiments, please refer to our *ICLR 2022 paper*.

### Acknowledgments

I would like to thank all of my collaborators for contributing to this paper. In particular, I want to thank *Gagandeep Singh*, who supervised me on the project and is now professor at UIUC, for his help and mentorship.