The U.S. PETs Prize Challenge is a contest aimed at discovering innovative solutions to two critical issues where privacy plays a crucial role.

The first challenge, Financial Crime Prevention, focuses on enhancing collaboration between banks and SWIFT in detecting fraudulent transactions without disclosing private information among the parties involved. The proposed solutions encompass a range of advanced and highly tailored cryptographic protocols designed to let the participants jointly flag fraudulent transactions.

The second challenge, Pandemic Response, seeks to improve the prediction of infection probabilities for individuals, a highly relevant issue that raises numerous privacy concerns when data needs to be shared across governments or jurisdictions. The solutions presented in this challenge employ a wide array of techniques, ranging from slightly modified standard federated learning setups to highly tailored probabilistic models for pandemic forecasting.

As a Red Team, we were tasked with evaluating various solutions from each track within a limited timeframe. To effectively analyze and attack all of them, we adopted a multi-step plan illustrated in the figure below:

First, we dedicated a substantial amount of time to understanding each task and solution individually, extracting the key pieces of information relevant to Red Teaming. In particular, we investigated the privacy and utility requirements of the tasks and the supplied data, as well as the parties involved and their interactions. Further, we examined in detail the privacy assumptions and claims of the proposed solutions, as well as their theoretical and software techniques. Although some of these aspects are closely linked to privacy, we did not concentrate solely on privacy. We believe that Red Teams should adopt a more comprehensive perspective on the entire problem, since all solution components are vital to the end-to-end process. Moreover, several submission issues, even when not directly related to privacy, can emerge naturally during a thorough analysis of the solution and may give rise to additional privacy concerns.

Second, we leveraged the in-depth understanding gained in the first step to identify potential attack vectors: issues discovered within the given solution that could lead to privacy concerns or related problems. Some of these attack vectors are fairly generic, such as our newest attack TabLeak that will be presented at ICML’23, and should therefore be incorporated into any proper “standard” evaluation of a given solution. We consider testing the success of generic attacks a critical subcomponent of an effective Red Teaming report. We provide several examples of attack vectors in the figure above, but it is important to note that most of them stem from discrepancies between the components identified in step 1, or from an unsoundness in one of those components. In fact, we were surprised by how many solutions were already flawed due to misunderstandings of the precise requirements or of the interactions between parties.

In step 3, we attack the given solutions, identifying four primary categories of uncovered issues that should be included in a Red Team report. Privacy breaches are the most critical set of issues, but not the sole focus of our Red Teaming report. We argue that a comprehensive analysis of the privacy-utility trade-off is equally vital to include, as trivially private solutions are easy to obtain if no utility is required. Indeed, some of our reports centered on this trade-off: we demonstrated that while the implemented solutions were (relatively) secure, either they behaved so similarly to random models that this security was to be expected, or simplifications of the proposed mechanisms resulted in a superior privacy-utility trade-off. In addition to these essential components, we also report the conceptual and theoretical flaws identified in the second step, even when they did not directly lead to privacy attacks. We observed that addressing these flaws could either enhance the performance of the given solution or lead to undesirable consequences for privacy.

Finally, we consolidate all uncovered issues into a single report that includes recommendations for correcting the discovered vulnerabilities. This report should be precise and provide actionable suggestions for patches addressing the identified issues, or, where a privacy issue is inherent to the solution, recommend against using the system in practice altogether. It is also crucial to acknowledge when a solution is simply effective: the primary objective of a Red Team should not be to dismantle a system, but to rigorously evaluate it under stressful conditions and pinpoint problems. If breaking the system becomes the sole focus, we run into the same issue that prevents Blue Teams from conducting this analysis themselves: bias that skews the reported results, with little regard for where the presented numbers come from.

The above explanation should already convey why Red Teaming for PETs is crucial, but we would like to emphasize the point further. A Red Teaming report provides a comprehensive evaluation of the solution across various dimensions, with privacy being the primary focus. A Red Team can assess the solution’s performance objectively, which is harder for Blue Teams, who directly benefit from a successful solution. Moreover, a Red Team’s task is often inherently more difficult than the Blue Team’s due to the interactions between all the critical components identified earlier. Finally, it is worth noting that we managed to significantly compromise all the systems we evaluated, demonstrating that even when solutions are deemed good enough to progress to the final phase of a prestigious PETs challenge, issues can still arise and persist.

Machine learning algorithms have set the state of the art on most tasks where large amounts of training data are available. While the improvements brought by these algorithms are impressive, their application to settings involving private data remains limited due to the privacy concerns posed by the large centralized datasets required for training. Recently, the federated learning framework has emerged as a promising alternative to collecting and training on centralized datasets.
In federated learning, multiple data owners (clients) collaboratively optimize a loss function $l$ w.r.t. the parameters $\theta$ of a global model $h$, each on their own dataset $\mathcal{D}_i$, **without** sharing the data in $\mathcal{D}_i$ with the other participants:

\[\begin{equation*} \min_{\theta} \sum_{i} \mathbb{E}_{(x, y) \sim \mathcal{D}_i} \left[ l(h_\theta(x), y) \right]. \end{equation*}\]

To this end, the optimization is carried out in communication rounds. In particular, given the global parameters $\theta_t$ at round $t$, multiple clients compute model updates $g$ on their own data and share them with a central server, which aggregates them into the new global parameters $\theta_{t+1}$. After several communication rounds, the model parameters converge to an optimum. One common instantiation of this generic framework is the FedSGD algorithm, where updates are given by the gradient of $l$ w.r.t. $\theta_t$ on a single batch $\{(x^b_i, y^b_i)\}\sim \mathcal{D}_i$ of client data of size $B$:

\[\begin{equation*} g(\theta_t,\mathcal{D}_i) = \frac{1}{B} \sum_{b=1}^B \nabla_\theta \left[ l(h_{\theta_t}(x^b_i), y^b_i) \right]. \end{equation*}\]

In theory, federated learning allows for improved data privacy, as the client data never leaves the individual clients. Unfortunately, several recent works have shown that the updates $g$ computed by common federated algorithms such as FedSGD can be used by a malicious server during the aggregation phase to approximately reconstruct the client’s data. So far, prior work has focused on exposing this issue in the image domain, where strong image priors help the reconstruction. In this work, we show that such approximate reconstruction is also possible in the text domain, where federated learning is commonly applied.
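To make the round structure concrete, here is a minimal FedSGD sketch for a toy linear model (the setup and all names are ours, not taken from any FL framework):

```python
import numpy as np

def client_update(theta, X, y):
    """FedSGD client update: average gradient of the halved squared loss
    0.5 * (h_theta(x) - y)^2 of a linear model h_theta(x) = theta @ x."""
    residuals = X @ theta - y                     # dl/dh for each batch element
    return (X * residuals[:, None]).mean(axis=0)  # average gradient over the batch

def server_round(theta, client_batches, lr=0.2):
    """One communication round: aggregate client updates, take a global step."""
    grads = [client_update(theta, X, y) for X, y in client_batches]
    return theta - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])
# Two clients, each holding a private batch; only gradients are ever shared.
batches = []
for _ in range(2):
    X = rng.normal(size=(20, 3))
    batches.append((X, X @ true_theta))
theta = np.zeros(3)
for t in range(300):
    theta = server_round(theta, batches)
print(np.round(theta, 3))  # converges to true_theta
```

Note that the server only ever sees gradients; as the attacks below show, this alone is not sufficient for privacy.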

In order to obtain approximate input reconstructions $\{\tilde{x}^b_i\}$ from the FedSGD update of some client $i$, as described above, prior works typically solve the following optimization problem at some communication round $t$:

\[\begin{equation} \min_{\{\tilde{x}^b_i\}} \mathcal{L}_{rec}\left( \frac{1}{B} \sum_{b=1}^B \nabla_\theta l(h_{\theta_t}(\tilde{x}^b_i), y^b_i), g(\theta_t, \mathcal{D}_i) \right) + \alpha_{rec}\,R(\{\tilde{x}^b_i\}), \end{equation}\]

where $\mathcal{L}_{rec}$ is a distance metric (e.g., $L_1$, $L_2$, or cosine) that measures the gradient reconstruction error, $R(\{\tilde{x}^b_i\})$ is a domain-specific prior (e.g., Total Variation (TV) in the image domain) that assesses the quality of the reconstructed inputs, and $\alpha_{rec}$ is a hyperparameter balancing the two. Note that both $\theta_t$ and $g(\theta_t, \mathcal{D}_i)$ are known to the malicious server: the former is computed by it, and the latter is sent to it by client $i$ at the end of the round. The batch labels $\{y^b_i\}$ can often be obtained by the server using dedicated label reconstruction attacks, which are beyond the scope of this blog post, or simply guessed by running the reconstruction with all possible labels due to their discrete nature, so throughout the post we focus only on reconstructing $\{\tilde{x}^b_i\}$. In our previous blog post, we showed that solving the optimization problem above is equivalent to finding the Bayesian optimal adversary in this setting.
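For a single linear layer with a bias, this leakage can even be seen in closed form, a well-known special case: the gradient w.r.t. the bias equals the residual, so dividing the weight gradient by it recovers the input exactly. A minimal sketch (the toy model and all names are ours):

```python
import numpy as np

# Toy linear model h(x) = theta @ x + b trained with the halved squared loss.
theta = np.array([0.5, -1.0, 2.0, 0.1, -0.4])
b = 0.3
x_secret = np.array([1.0, 2.0, -1.0, 0.5, 3.0])  # the client's private input (B = 1)
y = 1.0                                          # label, assumed known to the attacker

# FedSGD update shared by the client: gradients w.r.t. weights and bias.
residual = theta @ x_secret + b - y
g_theta = residual * x_secret                    # dl/dtheta
g_b = residual                                   # dl/db

# The malicious server recovers the input exactly: g_theta / g_b = x_secret.
x_rec = g_theta / g_b
print(np.allclose(x_rec, x_secret))  # True
```

For deeper networks no such closed form exists, which is why the optimization problem above is solved numerically instead.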

In the image domain, this optimization problem is typically solved using gradient descent on a batch of randomly initialized images with an image-specific prior $R$. In the next section, we first discuss why such a solution is not well suited to language data, and then present our method, **LAMP**, which combines a text-specific prior with a new way of solving the optimization problem that alternates discrete and continuous optimization steps, yielding our state-of-the-art gradient leakage framework for text.

In this work, we focus on transformer-based models $h_\theta$, as they are the state of the art for modeling text across various language tasks. As these models operate on continuous vectors, they typically assume a fixed vocabulary of size $V$ and embed each word into a different vector in $\mathbb{R}^d$. For a sequence of $n$ words, we refer to the individual words as $t_1,\ldots,t_n$ and to their corresponding embeddings as $x_1,\ldots,x_n$.

In order to solve the gradient leakage optimization problem from the previous section, we choose to optimize directly over the embeddings $x_i$, as they, similarly to images, are represented by continuous values. However, uniquely to the text domain, only a finite subset of the vectors in $\mathbb{R}^d$ are valid word embeddings. Therefore, once we obtain the reconstructed embeddings $\tilde{x}_i$, we select for each of them the vocabulary token with the highest cosine similarity, creating a reconstruction of the word sequence $\tilde{t}_1,\ldots,\tilde{t}_n$.
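The projection step can be sketched as follows (the toy vocabulary and all numbers are ours):

```python
import numpy as np

def project_to_vocab(x_rec, emb_matrix):
    """Map each reconstructed embedding to the id of the vocabulary token
    whose embedding has the highest cosine similarity with it."""
    emb_n = emb_matrix / np.linalg.norm(emb_matrix, axis=1, keepdims=True)
    x_n = x_rec / np.linalg.norm(x_rec, axis=1, keepdims=True)
    return np.argmax(x_n @ emb_n.T, axis=1)  # one token id per position

# Toy vocabulary of V = 4 tokens embedded in d = 3 dimensions (made-up numbers).
vocab = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])
# Noisy reconstructed embeddings of tokens 2 and 3.
x_tilde = np.array([[0.1, -0.05, 0.9],
                    [0.8, 0.7, 0.1]])
print(project_to_vocab(x_tilde, vocab))  # [2 3]
```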

An additional issue specific to the text domain, and in particular to the transformer architecture, is that transformer outputs depend on word order only through positional embeddings. Therefore, the gradient reconstruction loss $\mathcal{L}_{rec}$ is not as affected by wrongly reconstructed word order as by wrongly reconstructed word embeddings themselves. In practice, this results in the continuous optimization often getting stuck in local minima caused by an embedding that reconstructs the correct word at the wrong position. Such local minima are hard to escape with continuous optimization alone. To this end, we introduce a discrete optimization step that periodically reorders the sentence, allowing the optimization to escape these local minima. The discrete step first proposes several word order changes, such as swapping the positions of two words or moving a sentence prefix to the end of the sentence. The proposed order changes are then scored based on a combination of the gradient reconstruction loss $\mathcal{L}_{rec}$ and the perplexity $\mathcal{L}_{lm}$ of the sentence, computed by an auxiliary language model such as GPT-2 on the projected words $\tilde{t}_i$:

\[\begin{equation*} \mathcal{L}_{rec} + \alpha_{lm}\,\mathcal{L}_{lm}, \end{equation*}\]

where $\alpha_{lm}$ is a hyperparameter balancing the two parts. The resulting end-to-end alternating optimization is shown in the image below, where green boxes show the discrete optimization steps and blue boxes show the continuous gradient descent steps of the gradient leakage objective presented in the previous section.
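A highly simplified sketch of such a discrete step, with a toy scoring function standing in for the combination of $\mathcal{L}_{rec}$ and $\mathcal{L}_{lm}$ (lower is better), might look as follows:

```python
def discrete_step(tokens, score):
    """One discrete reordering step: propose candidate word orders
    (adjacent swaps and prefix rotations) and keep the best-scoring one;
    `score` stands in for the combined reconstruction/perplexity loss."""
    candidates = [list(tokens)]
    for i in range(len(tokens) - 1):       # swap two adjacent positions
        c = list(tokens)
        c[i], c[i + 1] = c[i + 1], c[i]
        candidates.append(c)
    for k in range(1, len(tokens)):        # move a prefix to the end
        candidates.append(list(tokens[k:]) + list(tokens[:k]))
    return min(candidates, key=score)

# Toy scorer standing in for L_rec + alpha_lm * L_lm (lower is better):
reference = ["the", "cat", "sat"]
score = lambda cand: sum(a != b for a, b in zip(cand, reference))
print(discrete_step(["cat", "the", "sat"], score))  # ['the', 'cat', 'sat']
```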

Finally, similarly to the image domain, we introduce a new text-specific prior that improves our reconstructions. It is motivated by the empirical observation that during optimization, the embedding vectors $\tilde{x}_i$ often grow in length even when their direction changes little. We therefore regularize the average vector length of the embeddings in a sequence $\tilde{x}_i$ to be close to the average embedding length $l_e$ in the vocabulary:

\[\begin{equation} R(\{\tilde{x}_i\}) = \left(\frac{1}{n}\sum_{i=1}^n \| \tilde{x}_i \|_2 - l_e\right)^2. \end{equation}\]

This keeps the embeddings in the correct range of values, which in turn results in a more stable and accurate reconstruction of the embeddings $\tilde{x}_i$.
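The prior is cheap to evaluate; a minimal numeric sketch (the names are ours):

```python
import numpy as np

def length_prior(x_tilde, l_e):
    """R({x_i}): squared deviation of the average embedding L2 norm in the
    sequence from the vocabulary's average embedding length l_e."""
    return (np.linalg.norm(x_tilde, axis=1).mean() - l_e) ** 2

x = np.array([[3.0, 4.0], [0.0, 2.0]])  # norms 5 and 2, average 3.5
print(length_prior(x, l_e=3.0))         # (3.5 - 3.0)^2 = 0.25
```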

We evaluated LAMP on several standard sentiment classification datasets and on architectures based on the BERT family of language models. As is typically the case with language models, we assume our models are pretrained to make word predictions on large text corpora and that federated learning is only used to fine-tune them on the classification task at hand. We consider two versions of LAMP: one where $\mathcal{L}_{rec}$ is a weighted sum of the L1 and L2 distances (denoted $\text{LAMP}_{\text{L1}+\text{L2}}$), and one where $\mathcal{L}_{rec}$ is the cosine similarity (denoted $\text{LAMP}_{\cos}$). We compare them to the state-of-the-art attacks TAG, based on the same L1+L2 distance, and DLG, based on the L2 distance alone. We evaluate the methods in terms of the Rouge-1 metric (R1), which measures the percentage of correctly reconstructed words, and the Rouge-2 metric (R2), which measures the percentage of correctly reconstructed bigrams. One can interpret R2 as a proxy for how well the word order of the sentence has been reconstructed. We present a subset of the results from our paper on the CoLA dataset with a batch size of 1 below:

Method | $\text{TinyBERT}_6$ R1 | $\text{TinyBERT}_6$ R2 | $\text{BERT}_{BASE}$ R1 | $\text{BERT}_{BASE}$ R2 | $\text{BERT}_{LARGE}$ R1 | $\text{BERT}_{LARGE}$ R2 |
---|---|---|---|---|---|---|
DLG | 37.7 | 3.0 | 59.3 | 7.7 | 82.7 | 10.5 |
TAG | 43.9 | 3.8 | 78.9 | 10.2 | 82.9 | 14.6 |
$\text{LAMP}_{\cos}$ | 93.9 | 59.3 | 89.6 | 51.9 | 92.0 | 56.0 |
$\text{LAMP}_{\text{L1}+\text{L2}}$ | 94.5 | 52.1 | 87.5 | 47.5 | 91.2 | 47.8 |

We see that $\text{LAMP}_{\cos}$ consistently recovers more words than the alternatives, with $\text{LAMP}_{\text{L1}+\text{L2}}$ close behind. Further, LAMP recovers the sentence order substantially better. It is worth noting that the improvement over the baselines in both R1 and R2 is most pronounced on the smallest model, $\text{TinyBERT}_6$, where recovery is the hardest. Additionally, we experimented with recovering text when the batch size is larger than 1. We are the first to present results in this setting, shown below for the CoLA dataset:

Method | B=1 R1 | B=1 R2 | B=2 R1 | B=2 R2 | B=4 R1 | B=4 R2 |
---|---|---|---|---|---|---|
DLG | 59.3 | 7.7 | 49.7 | 5.7 | 37.6 | 1.7 |
TAG | 78.9 | 10.2 | 68.8 | 7.6 | 56.2 | 6.7 |
$\text{LAMP}_{\cos}$ | 89.6 | 51.9 | 74.4 | 29.5 | 55.2 | 14.5 |
$\text{LAMP}_{\text{L1}+\text{L2}}$ | 87.5 | 47.5 | 78.0 | 31.4 | 66.2 | 21.8 |

We see that despite the lower reconstruction quality, even a batch size of 4 still leaks a substantial amount of data. Further, we observe that for bigger batch sizes $\text{LAMP}_{\text{L1}+\text{L2}}$ performs better than $\text{LAMP}_{\cos}$. Both LAMP variants, however, substantially improve upon the baselines.
Finally, we show an example sentence reconstruction from LAMP and TAG on multiple datasets below:
Here, yellow signifies a single correctly reconstructed word and green signifies a tuple of correctly recovered words. We see that LAMP recovers the word order drastically better, often even reconstructing it perfectly. In addition, LAMP recovers more individual words. This qualitatively confirms the effectiveness of our attack.

In this blog post, we introduced LAMP, a new framework for leaking text data from gradient updates in federated learning. Our key ideas are alternating continuous and discrete optimization steps and introducing an auxiliary language model, used in the discrete part of our optimization to judge how well a piece of text is reconstructed. Thanks to these elements, our attack produces substantially better text reconstructions than the state-of-the-art attacks, both quantitatively and qualitatively. We thus show that many practical federated learning systems working with text are vulnerable, and that better mitigations should be adopted. For more details, please see our NeurIPS 2022 paper.

We focus on the common *ML as a service* scenario, a two-party setting where a client offloads intensive computation (commonly NN inference) to the server.
The client data is of a sensitive nature in many applications (e.g., financial, judicial), which has motivated work in the field of *privacy-preserving NN inference*, aiming to build methods that perform the computation without the server learning the client data.
One of the most common techniques for this is *fully-homomorphic encryption* (FHE), which is rapidly becoming more practical.

Orthogonal to privacy, a long line of work focuses on enabling *NN inference with reliability guarantees*.
For example, in a loan prediction setting, augmenting predictions with *fairness* guarantees is in the interest of both parties, as it increases trust in the system and may be essential to ensure regulatory compliance.
Some of the latest works in this direction are LASSI and FARE, focusing on two aspects of the fairness problem.
Another common example is *robustness*, where for example, a medical image analysis system should be able to prove to clients that the diagnosis is robust to naturally-occurring measurement errors (see e.g., our latest work SABR).

While the problems of privacy-preserving and reliable inference are both well established, there has been no prior attempt to consolidate the work in these two fields. Thus, service providers who offer reliability guarantees currently have no simple way to transition their service to a privacy-preserving setting, a requirement that is becoming increasingly relevant. This is the problem we address with Phoenix, a system that supports both client data privacy and reliability guarantees. To this end, we lift the key building blocks of randomized smoothing, a technique for augmenting predictions with reliability guarantees, to the popular RNS-CKKS FHE scheme. The key challenges that Phoenix overcomes stem from the FHE scheme's missing native support for control flow and for evaluating non-polynomial functions.

We now recall randomized smoothing at a high level.

As the service provider, we receive an input $x$ (in the illustration above, a cat image), and we aim to return a prediction $y$ (for some classification task) augmented with a reliability guarantee, for a property such as robustness.
We duplicate the input $n$ times, add independently sampled Gaussian noise to each copy, and perform batched NN inference to obtain the logit vectors, i.e., unnormalized probabilities.
Next, we apply the Argmax function to transform logits to predictions, and aggregate those predictions across $n$ samples to get the *counts*, indicating how many times each output class was predicted.
Finally, we perform a statistical test on the counts, which, if successful, produces a probabilistic reliability guarantee, ensuring that the prediction $y$ is robust with known high probability.
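Putting the steps above together in the clear (no FHE), a simplified sketch of the smoothed prediction might look as follows; the full certification procedure of Cohen et al., which Phoenix builds on, additionally splits samples between class selection and estimation and certifies a robustness radius, all of which we omit here:

```python
import numpy as np
from scipy.stats import binomtest

def smoothed_predict(logits_fn, x, n=1000, sigma=0.25, alpha=0.001, seed=0):
    """Randomized smoothing prediction: noisy copies -> batched inference ->
    Argmax -> counts -> one-sided binomial test on the top class."""
    rng = np.random.default_rng(seed)
    noisy = x[None, :] + sigma * rng.normal(size=(n, x.shape[0]))  # n noisy copies
    preds = np.argmax(logits_fn(noisy), axis=1)                    # Argmax per copy
    counts = np.bincount(preds)                                    # votes per class
    top = int(np.argmax(counts))
    # One-sided binomial test: is the top class a majority w.h.p.?
    p = binomtest(int(counts[top]), n, p=0.5, alternative="greater").pvalue
    return (top, p) if p <= alpha else ("abstain", p)

# Toy 2-class "network": class 0 iff the first input coordinate is negative.
logits_fn = lambda X: np.stack([-X[:, 0], X[:, 0]], axis=1)
top, p = smoothed_predict(logits_fn, np.array([-2.0, 0.0]))
print(top)  # 0
```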

The key question is how this procedure needs to change if we attempt to execute it while protecting client data privacy, i.e., if the data is encrypted using FHE by the client. The key steps are illustrated below.

For the batched NN inference (dashed line) we directly utilize prior work, which offers efficient algorithms for the RNS-CKKS scheme. Further, the addition of noise is simple, as the noise can be directly added as a plaintext due to the homomorphic property. However, computing Argmax is a key challenge due to the difficulty of computing non-polynomial functions—we elaborate on this shortly. In the aggregation step we combine several methods from prior work with scheme-specific optimizations, and develop a novel heuristic for randomized smoothing, necessary to obtain a computationally feasible procedure. Finally, we perform a rewrite of the one-sided binomial test applied to counts to make it FHE-suitable. The output of Phoenix is a single ciphertext, which when decrypted with the secret key of the client, reveals both the prediction and the computed reliability guarantee. We next discuss the Argmax approximation in more detail, and refer to our paper for details regarding all other steps.

To efficiently approximate Argmax, we build on the recent work of Cheon et al. (ASIACRYPT ‘20), which proposes *SgnHE*, a sign function approximation for FHE composed of low-degree polynomials, illustrated below.
Our approximate Argmax is built on several applications of *SgnHE* (see the paper for the full algorithm).
Most importantly, in our case it is crucial to have guarantees on the approximation quality of *SgnHE*—otherwise, the randomized smoothing reliability guarantee returned to the clients may in some cases be invalid, fundamentally compromising the protocol.

The *SgnHE* function is parametrized such that for desired parameters $a,b \in \mathbb{R}$, we can obtain an $(a,b)$-close approximation, meaning that for inputs $x \in [a, 1]$ the output is guaranteed to be in $[1 - 2^{-b}, 1]$ (similarly for the negative case).
However, as the server cannot directly observe intermediate values due to encryption, it is hard to ensure that this precondition is satisfied for the logit values, which are the input to the Argmax and to the first of the sign function applications it uses.
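To give intuition for such sign approximations: SgnHE composes low-degree odd polynomials that map $[-1, 1]$ to itself and push values towards $\pm 1$. As a simplified stand-in (not the exact SgnHE polynomials), repeatedly applying $f(x) = \frac{3}{2}x - \frac{1}{2}x^3$ behaves similarly:

```python
def sgn_approx(x, depth=12):
    """Iterated low-degree odd polynomial f(x) = 1.5*x - 0.5*x**3: it maps
    [-1, 1] to itself and pushes every nonzero input towards +1 or -1,
    giving a polynomial (and hence FHE-friendly) sign approximation."""
    for _ in range(depth):
        x = 1.5 * x - 0.5 * x ** 3
    return x

# Inputs further from 0 (a larger "a" in (a, b)-closeness) need fewer
# compositions to get within 2^-b of +/-1:
for v in (0.05, 0.3, -0.7):
    print(v, sgn_approx(v))
```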

To overcome this we impose two *conditions* on the logit vectors, constraining the range and differences of their values, allowing us to appropriately rescale them and reason about the approximation quality.
As we can not prove for an arbitrary NN that such conditions on logits will always hold (e.g., be in some range), we use confidence intervals and a finite sample to upper bound the condition violation probability.
When reporting the guarantee to the client, the computed condition-violation probability (approximation error) is added to the usual error probability of randomized smoothing as a probabilistic procedure (algorithmic error).
The resulting value represents the total error probability of our guarantee, maintaining the behavior of the non-private case.
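As an illustration of this error accounting (the function and all numbers are ours), the Clopper-Pearson upper bound on the condition-violation probability can be computed from a finite sample and added to the algorithmic error:

```python
from scipy.stats import beta

def violation_upper_bound(violations, n, alpha=0.01):
    """Clopper-Pearson upper confidence bound (level 1 - alpha) on the
    probability that the logit conditions are violated, estimated from
    `violations` observed violations among n finite samples."""
    return float(beta.ppf(1 - alpha, violations + 1, n - violations))

p_cond = violation_upper_bound(0, 1000)  # approximation error (made-up sample)
p_rs = 0.001                             # algorithmic error (example value)
print(p_cond + p_rs)                     # total error probability of the guarantee
```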

In our extensive experiments across multiple scenarios we observe values for the total error probability of around 1%, confirming that our procedure leads to viable high-probability guarantees.
We further observe non-restrictive latencies and communication costs, and high *consistency*: the results obtained with the FHE version of randomized smoothing are identical to those of the non-private baseline in almost 100% of cases, confirming that transitioning a service to FHE using Phoenix does not sacrifice the key metrics.
Our Microsoft SEAL implementation is publicly available on GitHub.

We believe Phoenix is an important first step towards merging the worlds of reliable and privacy-preserving machine learning. For more details of the Argmax approximation, omitted parts of the protocol, as well as detailed experimental results including microbenchmarks, please refer to our paper.

We attempt to explain the phenomenon where most state-of-the-art methods for certified training based on convex relaxations (such as FastIBP or the latest breakthrough SABR) focus on the loose interval propagation (IBP/Box), while intuitively, tighter relaxations (i.e., the ones that more tightly overapproximate the non-linearities in the network) should lead to better results. This was already observed in many prior works, which proposed several hypotheses. However, the paradox was never investigated in a principled way.

We start by proposing a way to quantify tightness (see the paper for details) and thoroughly reproducing the paradox: across a wide range of settings, tighter relaxations consistently lead to lower certified robustness (in %) than the loose IBP relaxation. An example is shown in the following table, which groups equivalent methods from prior work (below we refer to each group by the name in bold):

Relaxation | Tightness | Certified (%) |
---|---|---|
**IBP** / Box | 0.73 | 86.8 |
**hBox** / Symbolic Intervals | 1.76 | 83.7 |
**CROWN** / DeepPoly | 3.36 | 70.2 |
**DeepZ** / CAP / FastLin / Neurify | 3.00 | 69.8 |
**CROWN-IBP (R)** | 2.15 | 75.4 |

Our key observation is that there are other latent properties of relaxations, besides tightness, that affect success when relaxations are used in a training procedure.
More concretely, each of the tighter relaxations has either unfavorable *continuity* (i.e., the corresponding loss function is discontinuous with respect to network weights) or unfavorable *sensitivity* (i.e., the corresponding loss function is highly sensitive to small perturbations of network weights), both preventing successful optimization. By observing all three properties jointly, we can more easily interpret the seemingly counterintuitive results.

The plot below shows the relaxation of the ReLU non-linearity used by CROWN, for the example input range defined by $l=-5$ and $u=8$. By reducing $u$ (using the bottom slider), we can directly observe the discontinuity of CROWN, when its heuristic choice of the lower linear bound changes at $|l|=|u|$. Using the same plot we can observe the discontinuities of hBox at $l=0$. These observations imply discontinuities in the loss when a relaxation is used in training, which we further empirically observe in real scenarios. Finally, we can use the plot below to observe that no discontinuities can be found for IBP and DeepZ—a formal proof of their continuity in the general case is given in the paper.
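The discontinuity is easy to see in code. The sketch below implements (our reading of) the CROWN/DeepPoly heuristic, which picks the lower-bound slope $\lambda \in \{0, 1\}$ for an unstable ReLU by comparing $|l|$ and $|u|$, and shows the abrupt flip at $|l| = |u|$:

```python
def crown_lower_slope(l, u):
    """Heuristic lower-bound slope for an unstable ReLU with input range
    [l, u], l < 0 < u, as used by CROWN/DeepPoly (our reading): the linear
    lower bound lambda * x with lambda = 1 if u >= -l, else 0, chosen to
    minimize the area of the relaxation."""
    return 1.0 if u >= -l else 0.0

# The choice flips abruptly at |l| = |u|, so the relaxation (and thus the
# certified training loss built on it) is discontinuous in the bounds:
for u in (5.01, 5.0, 4.99):
    print(u, crown_lower_slope(-5.0, u))  # slope jumps from 1.0 to 0.0
```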

While the sensitivity of the loss functions is harder to illustrate on a toy example as above, our derivation (Section 4.3 of the paper) demonstrates that the bounds used by CROWN, CROWN-IBP (R) and DeepZ lead to certified training losses highly sensitive to small changes in network weights, while the losses of IBP and hBox are not sensitive and induce more favorable loss landscapes. With these observations, we expand the table shown earlier to include all three relaxation properties: tightness, continuity and sensitivity. This illustrates that attempts to use tighter relaxations in certified training have introduced unfavorable properties of the loss, which resulted in the failure to outperform the continuous and non-sensitive IBP.

Relaxation | Tightness | Continuity | Sensitivity | Certified (%) |
---|---|---|---|---|
**IBP** / Box | 0.73 | $\checkmark$ | $\checkmark$ | 86.8 |
**hBox** / Symbolic Intervals | 1.76 | $\times$ | $\checkmark$ | 83.7 |
**CROWN** / DeepPoly | 3.36 | $\times$ | $\times$ | 70.2 |
**DeepZ** / CAP / FastLin / Neurify | 3.00 | $\checkmark$ | $\times$ | 69.8 |
**CROWN-IBP (R)** | 2.15 | $\times$ | $\times$ | 75.4 |

A natural question that arises is the one of improving unfavorable properties of relaxations to make them more suitable for certified training. In the paper we systematically investigate modifications of existing relaxations and find that designing a relaxation with all favorable properties is difficult, as the properties induce complex tradeoffs that depend on the setting. Still, such relaxations may exist, and future work might be able to utilize our findings to discover them.

A more promising approach seems to be the use of existing convex relaxations with modified training procedures designed to best utilize the benefits of each relaxation. Recent successful examples of this approach include COLT, which includes the counterexample search in training, CROWN-IBP, which heuristically combines the losses of two relaxations in training, and two recent methods which focus on IBP, aiming to improve its training procedure via better initialization and regularization (FastIBP) or the propagation of smaller input regions in training (SABR).

Finally, it is worth noting that there are other promising approaches for neural network certification that do not use convex relaxations and are thus not affected by tradeoffs between relaxation properties. Examples in this direction include Randomized Smoothing, offering high-probability robustness certificates, and custom certification-friendly model architectures such as $l_\infty$-distance nets. While not affected by limitations of convex relaxations, these approaches introduce other challenges such as optimization difficulties and additional inference-time work.

There are three main steps to **PARADE**. First, we generate an initial box region that might contain non-adversarial points using off-the-shelf adversarial attacks.
We refer to this region as the overapproximation box $\mathcal{O}$. Then, we use a black-box verifier to shrink this overapproximation box to a smaller box that provably contains only adversarial points. We call this region the underapproximation box $\mathcal{U}$.
Finally, we use $\mathcal{O}$ and $\mathcal{U}$ to generate a polyhedral region $\mathcal{U}\subseteq\mathcal{P}\subseteq\mathcal{O}$ that we also prove only contains adversarial points using the same black-box verifier. Both $\mathcal{U}$ and $\mathcal{P}$ fit our definition of
provably robust adversarial examples, but differ in shape and precision. Thus, generating $\mathcal{P}$ is an optional step that makes our provably robust adversarial examples more precise. Next, we present the **PARADE** steps in detail.

To generate the overapproximation box $\mathcal{O}$, we sample attacks from an adversarial attack algorithm, such as **PGD**, and fit a box around them. The process is illustrated in the animation above.
We note that depending on the success of the attack algorithm, a small part of the ground truth adversarial region $\mathcal{T}$ might be excluded from $\mathcal{O}$.

We aim to generate the underapproximation box $\mathcal{U}$ so that it can be proven to contain only adversarial examples while being as large as possible. Due to the complexity of this objective, we proceed heuristically. In particular, we initialize $\mathcal{U}$ to the overapproximation box $\mathcal{O}$. At each iteration $i$, we execute a black-box verification procedure. If the procedure verifies that the box from the previous iteration, $\mathcal{U}_{i-1}$, contains only adversarial examples, we return it. Otherwise, we obtain from the verifier a linear constraint which, when added to $\mathcal{U}_{i-1}$, makes it provably robust. Unfortunately, this constraint is usually too conservative, as the black-box verifier relies on overapproximations of the set of possible network outputs. To this end, we relax the constraint by bias-adjusting it, while ensuring that it is not relaxed so much that it becomes meaningless. We then use the bias-adjusted constraint to shrink $\mathcal{U}_{i-1}$ such that the constraint is satisfied while the smallest possible amount of volume is lost. The procedure is repeated until the verification succeeds, and is depicted in the animation above.
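In highly simplified 1-D form (the toy verifier and all names are ours; real PARADE works with multi-dimensional boxes and a neural network verifier), the shrinking loop looks as follows:

```python
def toy_verifier(lo, hi):
    """Stand-in black-box verifier for a 1-D 'network' f(x) = x with the
    adversarial condition f(x) > 0.5: being an overapproximation, it only
    certifies boxes with lo >= 0.6, and otherwise returns the (conservative)
    linear constraint x >= 0.6."""
    return (True, None) if lo >= 0.6 else (False, 0.6)

def shrink_to_robust(lo, hi, step=0.05):
    """Shrink the box until the verifier certifies it, applying each
    returned constraint only partially (our stand-in for bias-adjusting)."""
    while True:
        ok, bound = toy_verifier(lo, hi)
        if ok:
            return lo, hi
        lo = min(bound, lo + step)  # move towards the constraint, not onto it

print(shrink_to_robust(0.3, 1.0))  # (0.6, 1.0)
```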

Finally, we present the optional step of generating a polyhedral provably robust adversarial example $\mathcal{P}$ from the box provably robust adversarial example $\mathcal{U}$.
The additional flexibility of the polyhedral shape allows for larger regions $\mathcal{P}$ compared to $\mathcal{U}$, at the cost of increased computational complexity. As generating polyhedral regions is hard, we again proceed heuristically.
Starting with the overapproximation box $\mathcal{O}$, we iteratively add linear constraints to it until we arrive at a polyhedron $\mathcal{P}$ that can be proven to only contain adversarial examples by the black-box verifier.
Similarly to the generation process of $\mathcal{U}$, we use the verification at iteration $i$ to generate linear constraints.
Unlike the generation process of $\mathcal{U}$, we use not only linear constraints from the final verification objective but also linear constraints that make the *ReLU* activation neurons in the network decided.
Unfortunately, the resulting constraints might produce a polyhedron $\mathcal{P}$ that is smaller than $\mathcal{U}$. To prevent that, we leverage the fact that $\mathcal{U}$ is itself provably robust and bias-adjust the constraints in such a way that they do not remove volume from $\mathcal{U}$.
The procedure concludes when the verifier succeeds. We outline the procedure in the animation above.

We experimented with two types of provably robust adversarial examples: examples robust to pixel intensity changes ($\ell_\infty$ changes) and examples robust to geometric changes. We show the pixel intensity experiment below:

| Network | $\epsilon$ | PARADE Box # Regions | PARADE Box Time | PARADE Box # Attacks | PARADE Poly # Regions | PARADE Poly Time | PARADE Poly # Attacks |
|---|---|---|---|---|---|---|---|
| MNIST 8x200 | 0.045 | 53/53 | 114s | $10^{121}$ | 53/53 | 1556s | $10^{121} < \cdot < 10^{191}$ |
| MNIST ConvSmall | 0.12 | 32/32 | 74s | $10^{494}$ | 32/32 | 141s | $10^{494} < \cdot < 10^{561}$ |
| MNIST ConvBig | 0.05 | 28/29 | 880s | $10^{137}$ | 28/29 | 5636s | $10^{137} < \cdot < 10^{173}$ |
| CIFAR-10 ConvSmall | 0.006 | 44/44 | 113s | $10^{486}$ | 44/44 | 264s | $10^{486} < \cdot < 10^{543}$ |
| CIFAR-10 ConvBig | 0.008 | 36/36 | 404s | $10^{573}$ | 36/36 | 610s | $10^{573} < \cdot < 10^{654}$ |

We note that **PARADE** is highly effective: it successfully generates regions for all but one image on which the classical adversarial attacks succeeded. Further, the generated regions contain a very large set of adversarial examples that would be infeasible to generate individually.
Finally, we note that the polyhedral adversarial examples take more time to generate but contain more examples. Calculating the exact number of concrete attacks within the polyhedral regions is computationally hard, so we instead approximate it as precisely as possible from above and below using boxes.
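For box regions, the attack counts are exact: with 8-bit images, each pixel interval contains a small number of representable intensities, and the total count is their product. A small sketch (our own illustration; pixel values and granularity are toy choices), working in log10 to avoid overflow:

```python
import math

def log10_num_attacks(lo, hi, levels=255):
    """log10 of the number of 8-bit images inside a pixel-wise box [lo, hi].
    Each pixel contributes the number of representable intensities in its
    interval; the total is the product over pixels, returned in log10."""
    total = 0.0
    for l, h in zip(lo, hi):
        count = math.floor(h * levels) - math.ceil(l * levels) + 1
        total += math.log10(max(count, 1))
    return total

# A box where 100 pixels each admit 11 of the 256 intensity levels:
lo = [0.0] * 100
hi = [0.04] * 100
val = log10_num_attacks(lo, hi)  # 100 * log10(11), roughly 104.1
```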

Next, we show the results for adversarial examples provably robust to geometric changes:

| Network | Transform | PARADE Box # Regions | PARADE Box Time | PARADE Box # Attacks |
|---|---|---|---|---|
| MNIST ConvSmall | Rotate + Scale + Shear | 51/54 | 774s | $10^{96} < \cdot < 10^{195}$ |
| MNIST ConvSmall | Scale + Translate2D | 51/56 | 521s | $10^{71} < \cdot < 10^{160}$ |
| MNIST ConvSmall | Scale + Rotate + Brightness | 40/48 | 370s | $10^{70} < \cdot < 10^{455}$ |
| MNIST ConvBig | Rotate + Scale + Shear | 44/50 | 835s | $10^{77} < \cdot < 10^{205}$ |
| MNIST ConvBig | Scale + Translate2D | 42/46 | 441s | $10^{64} < \cdot < 10^{174}$ |
| MNIST ConvBig | Scale + Rotate + Brightness | 46/52 | 537s | $10^{119} < \cdot < 10^{545}$ |
| CIFAR-10 ConvSmall | Rotate + Scale + Shear | 29/29 | 1369s | $10^{599} < \cdot < 10^{1173}$ |
| CIFAR-10 ConvSmall | Scale + Translate2D | 32/32 | 954s | $10^{66} < \cdot < 10^{174}$ |
| CIFAR-10 ConvSmall | Scale + Rotate + Brightness | 21/25 | 1481s | $10^{513} < \cdot < 10^{2187}$ |

We see that, again, **PARADE** is capable of generating examples for most images where classical adversarial attacks succeeded. We note that we use *DeepG* for verification.
Since DeepG generates image polyhedra, we have to approximate the number of concrete attacks similarly to **PARADE Poly** above. We also note that DeepG is more computationally expensive, resulting in longer runtimes for our algorithm as well.

Above we visualize some of the provably robust adversarial examples generated by **PARADE** for both pixel and geometric transformations. Each pixel’s color represents the number of possible values that pixel can have within our box regions.

We introduced and motivated the concept of provably robust adversarial examples. We further showed an outline of our algorithm, **PARADE**, that generates such examples in the shape of boxes or polyhedra.
Empirically, we demonstrated that the regions produced by **PARADE** can summarize a very large number of individual adversarial examples, making them a useful tool to assess the robustness of neural networks.
We hope that we piqued your interest in our work. For further details and experiments, please refer to our *ICLR 2022 paper*.

I would like to thank all of my collaborators for contributing to this paper. In particular, I want to thank *Gagandeep Singh*, who supervised me on the project and is now a professor at UIUC, for his help and mentorship.

In a nutshell, the neural network verification problem can be stated as follows:

*Given a network and an input, prove that all points in a small region around that input are classified correctly, i.e., that no adversarial example exists.*

To formalize this a bit, we consider a network $f: \mathcal{X} \to \mathcal{Y}$, an input region $\mathcal{D} \subseteq \mathcal{X}$, and a linear property $\mathcal{P}\subseteq \mathcal{Y}$ over the output neurons $y\in\mathcal{Y}$, and we try to prove that

\[f(x) \in \mathcal{P}, \quad \forall x \in \mathcal{D}.\]

For the sake of explanation, we consider a fully connected $L$-layer network with ReLU activations but note that we can handle all common architectures. We denote the weights and biases of neurons in the $i$-th layer as $W^{(i)}$ and $b^{(i)}$ and define the neural network as

\[f(x) := \hat{z}^{(L)}(x), \qquad \hat{z}^{(i)}(x) := W^{(i)}z^{(i-1)}(x) + b^{(i)}, \qquad z^{(i)}(x) := \max(0, \hat{z}^{(i)}(x)),\]

where $z^{(0)}(x) = x$ denotes the input, $\hat{z}$ the pre-activation values, and $z$ the post-activation values. For readability, we omit the dependency of intermediate activations on $x$ from here on.

Let $\mathcal{D}$ be an $\ell_\infty$ ball around an input point $x_0$ of radius $\epsilon$: \(\mathcal{D}_\epsilon(x_0) = \left\{ x \in \mathcal{X} \mid \lVert x - x_0\rVert _{\infty} \leq \epsilon \right\}.\)

Since we can encode any linear property over output neurons into an additional affine layer, we can simplify the general formulation of $f(x) \in \mathcal{P}$ to $f(x) > {0}$. Any such property can now be verified by proving that a lower bound to the following optimization problem is greater than $0$:
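As an illustration of this encoding (a sketch with our own toy numbers), the property "the target class scores highest" becomes an affine layer computing the margins $y_t - y_j$; the property holds iff all outputs of this layer are positive:

```python
import numpy as np

def margin_layer(num_classes, target):
    """Affine layer encoding the property 'target class wins':
    row after row, it computes y_target - y_j for every j != target."""
    W = np.zeros((num_classes - 1, num_classes))
    row = 0
    for j in range(num_classes):
        if j == target:
            continue
        W[row, target] = 1.0
        W[row, j] = -1.0
        row += 1
    b = np.zeros(num_classes - 1)
    return W, b

logits = np.array([1.0, 3.5, 0.2])   # network output y
W, b = margin_layer(num_classes=3, target=1)
margins = W @ logits + b             # [y_1 - y_0, y_1 - y_2]
```

Verifying the original property then amounts to proving that this extended network outputs strictly positive values on the whole input region.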

\[\begin{align*} \min_{x \in \mathcal{D}_\epsilon(x_0)} \qquad &f(x) = \hat{z}^{(L)} \tag{1} \\ s.t. \quad & \hat{z}^{(i)} = W^{(i)}z^{(i-1)} + b^{(i)}\\ & z^{(i)} = \max({0}, \hat{z}^{(i)})\\ \end{align*}\]

Recently, the *Branch-and-Bound* (**BaB**) approach, first described for this task in *Branch and Bound for Piecewise Linear Neural Network Verification*, has been popularized. At a high level, it is based on splitting the hard optimization problem of Eq. 1 into multiple easier subproblems by adding additional constraints until we can show the desired bound of $f(x) > 0$ on them.

The high-level motivation is the following: the optimization problem in Eq. 1 would be efficiently solvable if not for the non-linearity of the ReLU function. Since a ReLU function is piecewise linear and composed of only two linear regions, we can make a case distinction between a single ReLU node being “active” (input $\geq 0$) or inactive (input $< 0$) and prove the property on the resulting cases where the ReLU behaves linearly.

In the limit where all ReLU nodes are split, the verification problem becomes fully linear and can be solved efficiently. However, the number of subproblems to be solved in the resulting Branch-and-Bound tree is exponential in the number of ReLU neurons on which we split. Therefore, splitting all ReLU nodes is computationally intractable for all interesting verification problems. To tackle this problem, we prune this Branch-and-Bound tree using the insight that we do not have to split a subproblem further, once we find a lower bound that is $>0$.

In pseudo-code, the Branch-and-Bound algorithm looks as follows:
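Since the listing is shown as a figure, here is a simplified, runnable rendition of the generic loop (our own toy instantiation: **bound()** uses a Lipschitz-based lower bound and **branch()** bisects a 1-d input interval; MN-BaB instantiates both very differently):

```python
def verify(lo, hi, f, lipschitz, max_subproblems=10_000):
    """Generic Branch-and-Bound: try to prove f(x) > 0 for all x in [lo, hi]."""
    queue = [(lo, hi)]
    solved = 0
    while queue:
        l, u = queue.pop()
        solved += 1
        if solved > max_subproblems:
            return False                         # give up (timeout)
        mid, rad = (l + u) / 2, (u - l) / 2
        lower_bound = f(mid) - lipschitz * rad   # bound(): sound lower bound
        if lower_bound > 0:
            continue                             # subproblem verified, prune
        if f(mid) <= 0:
            return False                         # concrete counterexample
        queue.append((l, mid))                   # branch(): bisect the domain
        queue.append((mid, u))
    return True

# f(x) = x^2 - 2x + 2 has minimum value 1 > 0 on [-1, 3],
# and |f'(x)| = |2x - 2| <= 4 there.
ok = verify(-1.0, 3.0, lambda x: x * x - 2 * x + 2, lipschitz=4.0)
bad = verify(-1.0, 1.0, lambda x: x, lipschitz=1.0)  # f(0) = 0 violates f > 0
```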

To define one particular verification method that follows the Branch-and-Bound approach, such as MN-BaB, all we have to do is instantiate the **branch()** and **bound()** functions.

Before we do that, we need to understand *multi-neuron constraints* (**MNCs**), the second key building block of MN-BaB.

To bound the optimization problem in Eq. 1 efficiently, we want to replace the non-linear constraint $z = \max({0}, \hat{z})$ with its so-called linear relaxation, i.e., a set of linear constraints that is satisfied for all points satisfying the original non-linear constraint. If we consider just a single neuron, the tightest such linear relaxation is the convex hull of the function in its input-output space:
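For an unstable ReLU with input bounds $l_x < 0 < u_x$, this convex hull is given by the two lower lines $z \geq 0$, $z \geq \hat{z}$ and the upper chord $z \leq u_x(\hat{z} - l_x)/(u_x - l_x)$. A quick numeric sanity check (our own sketch):

```python
import numpy as np

def relu_hull_upper(z_hat, l, u):
    """Chord of the ReLU between (l, 0) and (u, u): the tightest single-neuron
    upper relaxation for an unstable neuron with input bounds l < 0 < u."""
    return u * (z_hat - l) / (u - l)

l, u = -1.0, 2.0
z_hat = np.linspace(l, u, 101)
relu = np.maximum(0.0, z_hat)
upper = relu_hull_upper(z_hat, l, u)
# The chord upper-bounds the ReLU everywhere on [l, u], and the two
# lower lines (0 and z_hat itself) lower-bound it.
```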

However, considering one neuron at a time comes with a fundamental precision limit, called the (single-neuron) convex relaxation barrier. It has since been shown that this limit can be overcome by considering multiple neurons jointly, thereby capturing interactions between these neurons and obtaining tighter bounds. We illustrate this improvement below, showing a projection of the 4d input-output space of two neurons.

We use the efficiently computable *multi-neuron constraints* from PRIMA, which can be expressed as a conjunction of linear constraints over the joint input-output space.

The goal of the **bound()** method is to derive a lower bound to Eq. 1 that’s as tight as possible. The tighter it is, the earlier the Branch-and-Bound process can be terminated.

Following previous works, we derive a lower bound of the network’s output as a linear function of the inputs:

\[\min_{x \in \mathcal{D}} f(x) \geq \min_{x \in \mathcal{D}} a_{inp}x + c_{inp}\]

There, the minimization over $x \in \mathcal{D}$ has a closed-form solution given by Hölder’s inequality:

\[\min_{x \in \mathcal{D}} a_{inp}x + c_{inp} \geq a_{inp}x_0 - \lVert a_{inp} \rVert_1 \epsilon + c_{inp}\]

To arrive at such a linear lower bound of the output in terms of the input, we start with the identity $f(x) = W^{(L)}z^{(L-1)} + b^{(L)}$ and replace $z^{(L-1)}$ with symbolic, linear bounds depending only on the previous layer’s values $z^{(L-2)}$. We proceed in this manner recursively until we obtain an expression only in terms of the inputs of the network.
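The closed form from Hölder's inequality is easy to check numerically (a sketch with arbitrary toy values for $a_{inp}$, $c_{inp}$, and $x_0$):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=5)        # final linear coefficients a_inp
c = 0.3                       # accumulated bias c_inp
x0 = rng.normal(size=5)
eps = 0.1

# Closed form: the minimizer over the l_inf ball moves every input
# coordinate by -eps * sign(a_i).
closed_form = a @ x0 - np.abs(a).sum() * eps + c

# The explicit coordinate-wise minimizer attains exactly this value.
x_star = x0 - eps * np.sign(a)
assert np.isclose(a @ x_star + c, closed_form)
```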

These so-called linear relaxations of the different layer types determine the precision of the obtained bounding method. While affine layers (e.g., fully connected, convolutional, avg. pooling, normalization) can be captured exactly, non-linear activation layers remain challenging and their encoding is what differentiates MN-BaB. Most importantly, MN-BaB enforces MNCs in an efficiently optimizable fashion. The full details are given in the paper but are rather technical and notation-heavy, so we will skip them here.

To derive the linear relaxations for activation layers, we need bounds on the inputs of those layers ($l_x$ and $u_x$ in the illustrations). In order to compute these lower and upper bounds on every neuron, we apply the procedure described above to every neuron in the network, starting from the first activation layer.

Note that if those input bounds for a ReLU node are either both negative or both positive, the corresponding activation function becomes linear and we do not have to split this node during the Branch-and-Bound process. We call such nodes “stable” and correspondingly nodes where the input bounds contain zero “unstable”.

The **branch()** method takes a problem instance and splits it into two subproblems. This means deciding which unstable ReLU node to split and adding additional constraints to the resulting subproblems, enforcing $\hat{z}<0$ or $\hat{z}\geq0$ on the input of the split neuron.

The choice of which node to split has a significant impact on how many subproblems we have to consider during the Branch-and-Bound process until we can prove a property. Therefore, we aim to choose a neuron that minimizes the total number of problems we have to consider. To do this, we define a proxy score trying to capture the bound improvement resulting from any particular split. Note that the optimal branching decision depends on the bounding method that is used, as different bounding methods might profit differently from additional constraints resulting from the split.

As our bounding method relies on MNCs, we design a proxy score that is specifically tailored to them, called the *Active Constraint Score* (**ACS**). ACS determines the sensitivity of the final optimization objective with respect to the MNCs and then, for each node, computes the cumulative sensitivity of all constraints containing that node. We then split the node with the highest cumulative sensitivity.

We further propose *Cost Adjusted Branching* (**CAB**) to scale this branching score by the expected cost of performing a particular split. This cost can differ significantly, as only the intermediate bounds after the split layer have to be recomputed, making splits in later layers computationally cheaper.

Using MNCs for bounding, while making the bounds more precise, is computationally costly. The intuitive argument why it still helps verification performance is that the number of subproblems solved during Branch-and-Bound grows exponentially with the depth of the subproblem tree. A more precise bounding method that can verify subproblems earlier (at a smaller depth), can therefore save us exponentially many subproblems that we do not need to solve, which more than compensates for the increased computational cost.

This benefit is more pronounced the larger the considered network and the more dependencies there are between neurons in the same layer. Most established benchmarks (e.g., from VNNComp) are based on very small networks or use training methods designed for ease of verification at the cost of natural accuracy. While this makes their certification tractable, they are less representative of networks used in practice. Therefore, we suggest focusing on larger, more challenging networks with higher natural accuracy (and more intra-layer dependencies) for the evaluation of the next generation of verifiers. There, the benefits of MNCs are particularly pronounced, leading us to believe that they represent a promising direction.

We study the effect of MN-BaB’s components in an ablation study on the first 100 test images of the CIFAR-10 dataset. We aim to prove that there is no adversarial example within an $\ell_\infty$ ball of radius $\epsilon=1/255$ and report the number of verified samples (within a timeout of 600 seconds) and the corresponding average runtime. We consider two networks of identical architecture that only differ in the strength of their adversarial training method. ResNet6-A is weakly regularized while ResNet6-B is more strongly regularized, i.e. employs stronger adversarial training.

As expected, we see that both MNCs and Active Constraint Score branching are much more effective on the weakly regularized ResNet6-A. There, we verify 31% more samples while being around 31% faster, while on ResNet6-B we only verify 10% more samples.

As a more fine-grained measure of performance, we analyze the ratio of runtimes and number of subproblems required for verification on a per-property level on ResNet6-A.

**Effectiveness of Multi-Neuron Constraints**: We plot the ratio of the number of subproblems required to prove a property during Branch-and-Bound without vs. with MNCs. Using MNCs reduces the number of subproblems by two orders of magnitude on average.

**Effectiveness of Active Constraint Score Branching**: We plot the ratio of the number of subproblems solved during Branch-and-Bound with BaBSR vs. ACS. Using ACS reduces the number of subproblems by an additional order of magnitude.

**Effectiveness of Cost Adjusted Branching**: Finally, we investigate the effect of Cost Adjusted Branching on mean verification time with ACS. Using Cost Adjusted Branching further reduces the verification time by 50%. It is particularly effective in combination with the ACS scores and multi-neuron constraints, where bounding costs vary more significantly.

MN-BaB combines precise multi-neuron constraints with the Branch-and-Bound paradigm and an efficient GPU-based implementation to become a new state-of-the-art verifier, especially for less regularized networks. For a full breakdown of all technical details and detailed experimental evaluations, we recommend reading our paper. If you want to play around with MN-BaB yourself, please check out our code.

Consider the case of a company with several teams that would like to build ML models for different products using the same data.
One option would be for each team to train their own model and enforce fairness of the model by themselves.
However, the teams might not share the same definition of fairness, or they might even lack the expertise to train fair models.
*Fair representation learning* is a data pre-processing technique that transforms data into a new representation such that any classifier trained on top of this representation is fair.
Using representation learning enables us to pre-process data only once, and then give processed data to each team so that they can train their own model on this new data, while knowing that the model is fair, according to a single, pre-defined fairness definition.
The key question here is how to ensure that sensitive attributes cannot be recovered from the learned representations.
Typically, prior work has checked that this is the case by jointly learning representations and an auxiliary adversarial model which is trying to predict the sensitive attribute from the representations.
However, while these representations protect against adversaries considered during training, several recent papers have shown that stronger adversaries can often in fact still recover the sensitive attributes.
Our work tackles this issue by proposing a non-adversarial fair representation learning approach based on normalizing flows which can, in certain cases, *guarantee* that no adversary can reconstruct the sensitive attributes.

To motivate our fair representation learning approach, we introduce a small example of a population consisting of a mixture of 4 Gaussians. Consider a distribution of samples $x = (x_1, x_2)$ divided into two groups, shown as blue and orange in the figure below, with color and shape denoting sensitive attribute and label, respectively.

The first group with a sensitive attribute $a = 0$ has a distribution $(x_1, x_2) \sim p_0$, where $p_0$ is a mixture of two Gaussians at the top half.
The second group with a sensitive attribute $a = 1$ has a distribution $(x_1, x_2) \sim p_1$, where $p_1$ is a mixture of two Gaussians at the bottom half.
The label of a point $(x_1, x_2)$ is defined by $y = 1$ if $x_1$ and $x_2$ have the same sign, and $y = 0$ otherwise.
Our goal is to learn a data representation $z = f(x, a)$ such that it is *impossible* to recover $a$ from $z$, but still possible to predict target $y$ from $z$.
Note that such a representation exists for our task: simply setting $z = f(x, a) = (-1)^ax$ makes it impossible to predict whether a particular $z$ corresponds to $a = 0$ or $a = 1$, while still allowing us to train a classifier $h$ with essentially perfect accuracy (e.g. $h(z) = 1$ if and only if $z_1 > 0$).
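This toy construction can be verified numerically (a sketch; the mixture means and noise scale are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # samples per group

# Group a=0: mixture of two Gaussians in the top half;
# group a=1: mixture of two Gaussians in the bottom half.
means_a0 = np.array([[-2.0, 2.0], [2.0, 2.0]])
means_a1 = np.array([[-2.0, -2.0], [2.0, -2.0]])

def sample(means):
    comp = rng.integers(0, 2, size=n)
    x = means[comp] + 0.5 * rng.normal(size=(n, 2))
    y = (np.sign(x[:, 0]) == np.sign(x[:, 1])).astype(int)  # y = 1 iff same sign
    return x, y

x0, y0 = sample(means_a0)   # sensitive attribute a = 0
x1, y1 = sample(means_a1)   # sensitive attribute a = 1

# Encoder z = f(x, a) = (-1)^a x maps both groups onto the same mixture,
# hiding a, while h(z) = 1[z_1 > 0] still predicts y almost perfectly.
z0, z1 = x0, -x1
z_all = np.concatenate([z0, z1])
y_all = np.concatenate([y0, y1])
acc = np.mean((z_all[:, 0] > 0).astype(int) == y_all)
```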
This example also motivates our general approach: can we somehow map distributions corresponding to the two groups in the population to new distributions which are guaranteed to be difficult to distinguish?

As shown in the figure above, original features are useful for solving some downstream prediction task: we can train a classifier $h$ which predicts the task label from the original features $x$ with reasonable accuracy. However, at the same time, an adversary $g$ can recover the sensitive attribute from $x$ and use it to potentially discriminate in a downstream task. Motivated by the previous example, our goal is now to learn a function $f$ which transforms a pair of features and sensitive attribute $(x, a)$ into a new representation $z$ from which it is difficult to recover the sensitive attribute $a$. As in the previous example, we are going to encode both distributions corresponding to $a = 0$ and $a = 1$ using a bijective transformation. Our approach learns two bijective functions $f_0(x)$ and $f_1(x)$, and we denote the transformation as $f(x, a) = f_a(x)$.

One important quantity we are interested in computing is the *statistical distance*, which measures how well an adversary can distinguish between the distributions corresponding to the two groups in the population.
Importantly, Madras et al. have shown that bounding statistical distance also bounds other fairness measures such as demographic parity or equalized odds.
The statistical distance between the two encoded distributions, denoted \(\mathcal{Z}_0\) and \(\mathcal{Z}_1\), is defined as:

\[\Delta(\mathcal{Z}_0, \mathcal{Z}_1) := \sup_{\mu \in \mathcal{B}} \left| \mathbb{E}_{z \sim \mathcal{Z}_0} \left[ \mu(z) \right] - \mathbb{E}_{z \sim \mathcal{Z}_1} \left[ \mu(z) \right] \right|,\]

where \(\mu\colon \mathbb{R}^d \rightarrow \{0, 1\}\) is a function from the set of all binary classifiers $\mathcal{B}$, trying to discriminate between $\mathcal{Z}_0$ and $\mathcal{Z}_1$. We can show that the supremum is attained for $\mu^*$ which, for some $z$, evaluates to \(1\) if and only if \(p_{Z_0}(z) \leq p_{Z_1}(z)\). This shows that to compute the statistical distance we need to know how to evaluate probability densities in the latent space, which is difficult for standard neural architectures because any $z$ can correspond to several different inputs $x$. However, our approach uses a bijective transformation as an encoder, which makes it easy to compute the latent probability density using the change of variables formula:

\[\begin{equation} \log p_{Z_a}(z) = \log p_a(f^{-1}_a(z)) + \log \left | \det \frac{\partial f^{-1}_a(z)}{\partial z} \right | \end{equation}\]

We train the two encoders $f_0$ and $f_1$ to minimize the statistical distance $\Delta(p_{Z_0}, p_{Z_1})$, while at the same time training an auxiliary classifier that helps the learned representations stay informative for downstream prediction tasks. One issue with training this way is that the statistical distance is non-differentiable, as the optimal adversary $\mu^*$ makes a discrete thresholding decision, so we instead minimize a loss which is a smooth approximation of the statistical distance. After training is finished, we can evaluate the statistical distance exactly, without any approximation. Our guarantees assume that we know the input distributions $p_0$ and $p_1$, which in practice we most often have to approximate. You can find full details of how our guarantees change when the input distributions are estimated in our paper.
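For distributions with tractable densities, the statistical distance can be evaluated directly with the optimal adversary $\mu^*$. A sketch for two 1-d Gaussians (our own toy example), checked against the known closed form:

```python
import math

def gauss_pdf(z, mu, sigma):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def statistical_distance(mu0, mu1, sigma, lo=-10.0, hi=10.0, n=20001):
    """Delta(Z0, Z1) = E_{Z1}[mu*] - E_{Z0}[mu*] with the optimal adversary
    mu*(z) = 1[p0(z) <= p1(z)], integrated on a uniform grid."""
    dz = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * dz
        p0, p1 = gauss_pdf(z, mu0, sigma), gauss_pdf(z, mu1, sigma)
        if p0 <= p1:               # region where mu* fires
            total += (p1 - p0) * dz
    return total

delta = statistical_distance(0.0, 1.0, 1.0)
# Closed form for equal-variance Gaussians: 2 * Phi(|mu1 - mu0| / (2 sigma)) - 1.
phi = 0.5 * (1 + math.erf(0.5 / math.sqrt(2)))
```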

We evaluated FNF on several standard datasets from the fairness literature. For each dataset, we take one of the features to be a sensitive attribute (e.g. race), and we train a model that balances accuracy, measuring how well it can predict the task label, against fairness, measuring how well it can debias the learned representations from the sensitive attribute.

The above figure shows results on the Law School, Crime, and Health Heritage Prize datasets, with each point representing a single model with a different fairness-accuracy tradeoff. As a fairness metric, we measure the statistical distance introduced earlier. We can observe that for all datasets FNF can effectively balance fairness and accuracy. In general, the drop in accuracy is steeper for datasets where the task label is more correlated with the sensitive attribute. We provide more experimental results in our paper, including experiments with discrete datasets and a comparison with prior work.

As mentioned earlier, FNF provides a provable upper bound on the maximum accuracy of an adversary trying to recover the sensitive attribute, for the estimated probability densities of the input distribution. We show our upper bound on the adversarial accuracy computed from the statistical distance using the estimated densities (diagonal dashed line), together with adversarial accuracies obtained by training an adversary, a multilayer perceptron (MLP) with two hidden layers of 50 neurons, for each model from the figure. We can observe that FNF successfully bounds the accuracy of strong adversaries, even though the guarantees were computed on the estimated distributions. In our paper, we also experiment with using FNF for other tasks such as algorithmic recourse and transfer learning.

In this blog post, we introduced Fair Normalizing Flows (FNF), a new method for learning representations ensuring that no adversary can predict sensitive attributes at the cost of a small decrease in accuracy. Our key idea was to use an encoder based on normalizing flows which allows computing the exact likelihood in the latent space, given an estimate of the input density. Our experimental evaluation on several datasets showed that FNF effectively enforces fairness without significantly sacrificing utility, while simultaneously allowing interpretation of the representations and transferring to unseen tasks. For more details please see our ICLR 2022 paper.

The goal of federated learning is to train a model $h_\theta$ through a collaborative procedure involving different clients, without data leaving individual client devices. Typically, $h_\theta$ is a neural network with parameters $\theta$, classifying an input $x$ to a label $y$. We assume that pairs $(x, y)$ come from a distribution $\mathcal{D}$. In the standard federated learning setting, there are $n$ clients with loss functions $l_1, \dots, l_n$, who are trying to jointly solve the optimization problem and find parameters $\theta$ which minimize their average loss:

\[\begin{equation*} \min_{\theta} \frac{1}{n} \sum_{i=1}^n \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ l_i(h_\theta(x), y) \right]. \end{equation*}\]

In a single training step, each client $i$ first computes $\nabla_{\theta} l_i(h_\theta(x_i), y_i)$ on a batch of data $(x_i, y_i)$, then sends it to the central server, which performs a gradient descent step to obtain the new parameters $\theta' = \theta - \frac{\alpha}{n} \sum_{i=1}^n \nabla_{\theta} l_i(h_\theta(x_i), y_i)$, where $\alpha$ is the learning rate. We will consider a scenario where each client reports, instead of the true gradient $\nabla_{\theta} l_i(h_\theta(x_i), y_i)$, a noisy gradient $g$ sampled from a distribution $p(g|x)$, which we call a defense mechanism. This setup is fairly general and captures common defenses such as DP-SGD. In this post, we are interested in the privacy issue of federated learning: can the input $x$ be recovered from the gradient update $g$? More specifically, we are interested in analyzing the Bayes optimal attack and connecting it to attacks from prior work.
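One round of this aggregation can be sketched as follows (a toy example with a linear model, squared loss, and three clients; all values are our own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.normal(size=4)          # shared model parameters
alpha = 0.05                        # learning rate

def client_gradient(theta, x, y):
    """Gradient of the squared loss 0.5 * (theta @ x - y)^2 w.r.t. theta."""
    return (theta @ x - y) * x

# Each client computes a gradient on its local batch ...
batches = [(rng.normal(size=4), 1.0) for _ in range(3)]
grads = [client_gradient(theta, x, y) for x, y in batches]

# ... and the server averages them and takes one descent step.
theta_new = theta - alpha / len(grads) * np.sum(grads, axis=0)
```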

To measure how well an adversary can reconstruct user data, we introduce the notion of *adversarial risk* for gradient leakage, and then derive the Bayes optimal adversary that minimizes this risk.
The adversary can only observe the gradient $g$ and tries to reconstruct the input $x$ that produced $g$.
Formally, the adversary is a function $f: \mathbb{R}^k \rightarrow \mathcal{X}$ mapping gradients to inputs.
Given some $(x, g)$ sampled from the joint distribution $p(x, g)$, the adversary outputs the reconstruction $f(g)$ and incurs loss $\mathcal{L}(x, f(g))$, which is a function $\mathcal{L}: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$.

The loss measures whether the adversary was able to reconstruct the original data. Typically, we will consider a binary loss that evaluates to 0 if the adversary’s output is close to the original input, and 1 otherwise. If the adversary only wants to get to some $\delta$-neighbourhood of the input $x$ in the input space, an appropriate definition of the loss is $\mathcal{L}(x, x') := 1_{||x - x'||_2 > \delta}$. This definition is well suited for image data, where $\ell_2$ distance captures our perception of visual similarity. We can now define the risk $R(f)$ of the adversary $f$ as

\[\begin{equation} R(f) := \mathbb{E}_{x, g} \left[ \mathcal{L}(x, f(g)) \right] = \mathbb{E}_{x \sim p(x)} \mathbb{E}_{g \sim p(g|x)} \left[ \mathcal{L}(x, f(g)) \right]. \end{equation}\]

We can then manipulate this expression and show that \(R(f) = 1 - \mathbb{E}_g \int_{B(f(g), \delta)} p(x|g) \,dx\). This allows us to work out the optimal adversary $f$ which minimizes the adversarial risk $R(f)$:

\[\begin{align} f(g) &= \underset{x_0 \in \mathcal{X}}{\operatorname{argmax}} \int_{B(x_0, \delta)} p(x|g) \,dx \nonumber \\ &= \underset{x_0 \in \mathcal{X}}{\operatorname{argmax}} \int_{B(x_0, \delta)} \frac{p(g|x)p(x)}{p(g)} \,dx \nonumber \\ &= \underset{x_0 \in \mathcal{X}}{\operatorname{argmax}} \int_{B(x_0, \delta)} p(g|x)p(x) \,dx \nonumber \\ &= \underset{x_0 \in \mathcal{X}}{\operatorname{argmax}} \left[ \log \int_{B(x_0, \delta)} p(g|x)p(x) \,dx \right] \label{eq:optadv} \end{align}\]

While the above provides a formula for the optimal adversary in the form of an optimization problem, using this adversary for practical reconstruction is computationally difficult, so we approximate it by applying Jensen’s inequality: \(\log C \int_{B(x_0, \delta)} p(g|x)p(x) \,dx \geq C \int_{B(x_0, \delta)} (\log p(g|x) + \log p(x)) \,dx.\)

We can then approximate the integral by Monte Carlo sampling and optimize the objective using gradient descent. As shown in the figure above, the adversary can randomly initialize the input and then optimize for the input with the highest likelihood of being close to the original input that produced the update gradient $g$. Interestingly, we can now recover attacks from prior work by using different priors $p(x)$ and conditionals $p(g|x)$, meaning that attacks from prior work are different approximations of the Bayes optimal adversary. For example, DLG is recovered by using a uniform prior and a Gaussian conditional, another attack is recovered by using a total variation prior and a cosine conditional, while GradInversion uses a combination of total variation and DeepInversion priors together with a Gaussian conditional.
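To make the reduction concrete: with a Gaussian conditional $p(g|x) = \mathcal{N}(\nabla l(x), \sigma^2 I)$ and a uniform prior, the objective becomes gradient matching, i.e., a DLG-style objective. A minimal sketch on a linear model (our own toy setup; a real attack would minimize this objective with gradient descent rather than scoring candidates):

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.normal(size=5)              # model parameters (known to the attacker)
x_true = rng.normal(size=5)         # private client input
y = 1.0

def param_gradient(x):
    """Gradient of 0.5 * (w @ x - y)^2 w.r.t. the parameters w."""
    return (w @ x - y) * x

g = param_gradient(x_true)          # observed (noise-free) gradient update

def neg_log_likelihood(x, sigma=0.1):
    """Up to constants: -log p(g|x) for p(g|x) = N(grad(x), sigma^2 I).
    With a uniform prior, this is exactly the gradient-matching objective."""
    return np.sum((param_gradient(x) - g) ** 2) / (2 * sigma ** 2)

# The true input scores (much) better than random candidate reconstructions.
candidates = [x_true] + [rng.normal(size=5) for _ in range(100)]
best = min(candidates, key=neg_log_likelihood)
```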

In this experiment, we evaluate three recently proposed defenses, Soteria, ATS, and PRECODE, against strong gradient leakage attacks. While these defenses can protect privacy against weaker attackers (as shown in the respective papers), we show that they are not actually successful against strong attacks. Below we show our reconstructions for each defense, evaluated on the CIFAR-10 dataset.

Each defense introduces a different $p(g|x)$ which we use to derive an approximation of the Bayes optimal adversary. Our results indicate that these defenses do not reliably protect privacy under gradient leakage, especially in the early stages of training. This suggests that creating effective defenses and strong evaluation methods remains a key challenge. We provide a full description of these defenses and our attacks in our paper.

In the next set of experiments we compare Bayes optimal attack with several other, non-optimal attacks. We consider several different defenses all of which produce different distributions $p(g|x)$. One defense adds Gaussian noise, another randomly prunes out some elements of the gradient and then adds Gaussian noise, and the third one adds Laplacian noise after pruning. We measure PSNR of the reconstructions, where higher value means that reconstruction is closer to the original image.
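The three defense mechanisms $p(g|x)$ can be sketched as follows (the pruning fraction and noise scales are our own illustrative choices, not the evaluated settings):

```python
import numpy as np

rng = np.random.default_rng(4)

def gaussian_defense(g, sigma=0.1):
    """Defense 1: add Gaussian noise to the gradient."""
    return g + sigma * rng.normal(size=g.shape)

def prune(g, frac=0.5):
    """Zero out the smallest-magnitude entries of the gradient."""
    k = int(frac * g.size)
    out = g.copy()
    idx = np.argsort(np.abs(g))[:k]
    out[idx] = 0.0
    return out

def prune_gaussian_defense(g, frac=0.5, sigma=0.1):
    """Defense 2: random pruning followed by Gaussian noise."""
    return prune(g, frac) + sigma * rng.normal(size=g.shape)

def prune_laplacian_defense(g, frac=0.5, scale=0.1):
    """Defense 3: pruning followed by Laplacian noise."""
    return prune(g, frac) + rng.laplace(scale=scale, size=g.shape)

g = rng.normal(size=10)
noisy = prune_gaussian_defense(g)
```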

We observe that the Bayes optimal adversary generally performs best, showing that the optimal attack needs to leverage the structure of the probability distribution of the gradients induced by the defense. Note that, in the case of the Gaussian defense, the $\ell_2$ and Bayes attacks are equivalent up to a constant factor, so it is expected that they achieve similar results. In all other cases, the Bayes optimal adversary outperforms the other attacks. Overall, this experiment provides empirical support for our theory, confirming the practical utility of the Bayes optimal adversary.

In this blog post, we considered the problem of privacy in federated learning and investigated the Bayes optimal adversary, which tries to reconstruct the original data from gradient updates. We derived the form of this adversary and showed that attacks proposed in prior work are different approximations of this optimal adversary. Experimentally, we showed that existing defenses do not protect against strong attackers and that designing a good defense remains an open challenge. Furthermore, we showed that the Bayes optimal adversary is stronger than other attacks when it can exploit the structure of the probability distributions, confirming our theoretical results. For more details, please check out our ICLR 2022 paper.

A promising method providing such guarantees for large networks is Randomized Smoothing (RS). The core idea is to obtain probabilistic robustness guarantees with arbitrarily high confidence by adding noise to the input of a base classifier and computing the majority vote of the classification over a large number of perturbed inputs using Monte Carlo sampling.

In this blog post, we consider *applying RS to ensembles* as base classifiers and explain why they are a particularly suitable choice. For this, we will first give a short recap of Randomized Smoothing, before explaining our approach and discussing our theoretical results. Finally, we show that our approach yields a new state-of-the-art in most settings, often even while using less compute than current methods.

We consider a (soft) base classifier $f \colon \mathbb{R}^d \to \mathbb{R}^{n}$ predicting a numerical score for each class and let \(F(x) := \text{arg max}_{i} \, f_{i}(x)\) denote the corresponding hard classifier $F \colon \mathbb{R}^d \to \{1, \dots, n\}$. Randomized Smoothing (RS) takes such a base classifier, evaluates it on a large number of slightly perturbed versions of an input, and then predicts the majority classification over the resulting predictions. The bigger the difference between the probability of the most likely and second most likely class, the more robust the resulting smoothed classifier.
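The Monte Carlo majority vote can be sketched as follows, using a toy nearest-centroid hard classifier; the classifier, noise level, and sample count are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_predict(hard_classifier, x, sigma, n_samples=1000):
    # Monte Carlo estimate of the smoothed classifier: evaluate the base
    # classifier on noisy copies of x and return the majority vote.
    votes = {}
    for _ in range(n_samples):
        c = hard_classifier(x + rng.normal(0.0, sigma, size=x.shape))
        votes[c] = votes.get(c, 0) + 1
    return max(votes, key=votes.get)

# Toy hard classifier F: nearest of two fixed class centroids.
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
def F(x):
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

print(smoothed_predict(F, np.array([0.2, 0.1]), sigma=0.5))
```

In practice, the same sampling loop (with far more samples) also provides the class counts from which the success probability of the majority class is lower-bounded.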

Formally, we write \(G(x) := \text{arg max}_c \, \mathcal{P}_{\epsilon \sim \mathcal{N}(0, \sigma_{\epsilon}^2 I)}(F(x + \epsilon) = c)\) for the smoothed classifier. This smoothed classifier is guaranteed to be robust, i.e., predict $G(x + \delta) = c_A$, under all perturbations $\delta$ satisfying $\lVert \delta \rVert_2 < R$ with $R := \sigma_{\epsilon}\Phi^{-1}(\underline{p_A})$, where $c_A$ is the majority class, $\underline{p_A}$ the lower bound to its success probability $\mathcal{P}_{\epsilon}(F(x + \epsilon) = c_A) \geq \underline{p_A}$ and $\Phi^{-1}$ the inverse Gaussian CDF. As $\underline{p_A}$ increases, so does $R$.
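The certified radius $R = \sigma_{\epsilon}\Phi^{-1}(\underline{p_A})$ is straightforward to evaluate; a minimal sketch (assuming SciPy for the inverse Gaussian CDF, with illustrative numbers):

```python
from scipy.stats import norm

def certified_radius(p_a_lower, sigma):
    # R = sigma * Phi^{-1}(p_A_lower). A radius is only certified when the
    # majority class wins more than half the time; otherwise we abstain.
    if p_a_lower <= 0.5:
        return 0.0
    return sigma * norm.ppf(p_a_lower)

print(certified_radius(0.99, sigma=0.5))
```

Because $\Phi^{-1}$ grows steeply near 1, even small improvements in $\underline{p_A}$ at high success probabilities translate into noticeably larger certified radii.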

Instead of a single model $f$, we now consider a soft ensemble of $k$ models $\{ f^l \}_{l=1}^k$:

\[\bar{f}(x) = \frac{1}{k} \sum_{l=1}^{k} f^l(x)\]

We obtain different models $f^l$ by varying only the random seed for training.

Now, we will show theoretically why ensembles are particularly suitable base models. As shown in the illustration below, ensembling reduces the prediction’s variance over the noise introduced in RS, leading to a larger certification radius $R$.

Formally, we introduce a random variable $z_i$ for the logit difference between the majority class $c_A$ and each other class $c_i$, which we call the "classification margin". We can compute its variance as a function of the number $k$ of ensembled classifiers ($\sigma^2(k)$) and normalize it by that of a single classifier ($\sigma^2(1)$):

\[\frac{\sigma^2(k)}{\sigma^2(1)} = \frac{1 + \zeta (k-1)}{k} \xrightarrow{k \to \infty} \zeta.\]

Here, $\zeta$ is a small constant describing the degree of correlation between the ensembled classifiers. We observe that for weakly correlated classifiers, the variance is significantly reduced. Using Chebyshev's inequality, we can translate this reduction in variance into an increase in the lower bound on the success probability of the majority class $c_A$:

\[\underline{p_{A}} \geq 1 - \sum_{i \neq A} \frac{\sigma_i(k)^2}{\bar{z}_i^{\,2}}\]

The lower bound approaches 1 as the variance is reduced; its gap to 1 shrinks quadratically in the standard deviation of the margin. Assuming Gaussian distributions and estimating all parameters from a real ResNet20, we obtain the following distribution over the classification margin to the runner-up class $c_i$:

Here, the success probability $p_A$ of the model corresponds directly to the portion of the area under the curve (the probability mass) to the right of the black line. While we see that the mean classification margin remains unchanged, this portion and thus the success probability increase significantly as we ensemble more models.
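The Chebyshev bound above, combined with the variance formula, is easy to evaluate numerically. A small sketch with illustrative margins, single-model variance, and correlation constant (none of these values are fitted to a real network):

```python
def chebyshev_p_a_lower(margins, var_single, k, zeta=0.1):
    # Lower bound p_A >= 1 - sum_i sigma_i(k)^2 / zbar_i^2, with one term
    # per non-majority class i, using the k-ensemble variance
    # sigma^2(k) = sigma^2(1) * (1 + zeta*(k-1)) / k.
    var_k = var_single * (1 + zeta * (k - 1)) / k
    return 1.0 - sum(var_k / m ** 2 for m in margins)

# Illustrative mean margins to three runner-up classes, unit variance.
for k in (1, 5, 20):
    print(k, round(chebyshev_p_a_lower([2.0, 3.0, 4.0], 1.0, k), 3))
```

As the ensemble grows, the variance shrinks toward $\zeta \sigma^2(1)$ and the lower bound on the success probability climbs accordingly.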

**Certified Radius** Having computed the success probability as a function of the number $k$ of ensembled models, we can now derive the probability distribution over the $\ell_2$-radius we can certify using RS.

As we increase the number of models we ensemble, the whole distribution shifts to larger radii. In contrast, simply increasing the number of samples used for the Monte Carlo estimation mostly concentrates the distribution and yields a much smaller increase in certified radius.

For a deeper dive and a full explanation and validation of all our assumptions, please check out our ICLR’22 Spotlight paper.

We conduct experiments on ImageNet and CIFAR-10 using a wide range of training methods, network architectures, and noise magnitudes and consistently observe that ensembles outperform their best constituting models.

**CIFAR-10**

Using an ensemble of up to ten ResNet110's clearly outperforms the best constituting model (currently SOTA). Even an ensemble of just three ResNet20's outperforms a single ResNet110, despite requiring significantly less compute for training and inference.

Using more samples with just a single network barely improves the certified radius at all, unless mathematically necessary to achieve a sufficiently high confidence level. Note that this is only the case for very large radii (here 2.0), and, in contrast to our method, does not actually make the model more robust.

**ImageNet**

On ImageNet, an ensemble of just three ResNet50’s improves over the current state-of-the-art by more than 10%.

We propose a theoretically motivated and statistically sound approach to constructing low-variance base classifiers for Randomized Smoothing by ensembling. We show theoretically and empirically why this approach significantly increases certified accuracy, yielding state-of-the-art results. If you are interested in more details, please check out our ICLR 2022 paper.
