Differentially Private Algorithms that Never Fail
Most differentially private algorithms fail with some nonzero probability. For example, when adding Gaussian or Laplace noise, there is some chance that the noise deviates significantly from its mean. But, fortunately, large deviations are unlikely. In this post we’re going to take a closer look at failure modes of DP algorithms and we’ll present some generic methods for reducing – or even eliminating – the failure probability.
Let’s be precise about what we mean by failure probability:
Let’s assume we have a loss function $\ell : \mathcal{Y} \times \mathcal{X}^n \to \mathbb{R}$ that measures how bad an output $y \in \mathcal{Y}$ is on an input dataset $x \in \mathcal{X}^n$, along with a threshold $\gamma \in \mathbb{R}$.[^1] We say that $M : \mathcal{X}^n \to \mathcal{Y}$ has failure probability $\beta$ if, for all inputs $x \in \mathcal{X}^n$, we have $\Pr[\ell(M(x), x) > \gamma] \le \beta$.[^3]
For example, if $M(x)$ estimates the empirical mean of $x \in [0,1]^n$, we could take the loss to be the scaled absolute error $\ell(y, x) = n \cdot \left| y - \tfrac{1}{n}\sum_{i=1}^n x_i \right|$, which has sensitivity 1 in its second argument; failing then means the estimate is off by more than $\gamma/n$.
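With this loss, the failure probability of the basic Laplace mechanism can be computed exactly: the mechanism adds $\mathrm{Lap}(1/(n\varepsilon))$ noise to the empirical mean, so the loss is distributed as $|\mathrm{Lap}(1/\varepsilon)|$ and $\beta = e^{-\varepsilon\gamma}$. Here is a minimal sketch of this calculation in Python (the helper names are ours, purely for illustration):

```python
import numpy as np

def laplace_mean(x, eps, rng):
    """(eps, 0)-DP release of the mean of x in [0,1]^n.
    Changing one record moves the mean by at most 1/n, so Lap(1/(n*eps))
    noise suffices for eps-DP."""
    n = len(x)
    return np.mean(x) + rng.laplace(scale=1.0 / (n * eps))

def failure_probability(eps, gamma):
    """Exact beta for the scaled loss l(y, x) = n * |y - mean(x)|:
    the loss is |Lap(1/eps)|, whose tail is P[loss > gamma] = exp(-eps*gamma)."""
    return np.exp(-eps * gamma)

rng = np.random.default_rng(0)
x = rng.uniform(size=1000)
print(laplace_mean(x, eps=1.0, rng=rng))        # private mean estimate
print(failure_probability(eps=1.0, gamma=5.0))  # beta ~ 0.0067
```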
To be clear, in this post we’re talking about failures of utility, which are different from failures of privacy.
In a previous post, we talked about privacy failures; roughly, the $\delta$ in $(\varepsilon, \delta)$-differential privacy can be interpreted as a privacy failure probability.
Here’s our problem: We’re given an $(\varepsilon, \delta)$-DP algorithm $M$ with failure probability $\beta > 0$, and we want to transform it into a DP algorithm with failure probability zero.
Warmup: Absorbing the failure probability into $\delta$
Let’s start with a simple trick to get zero failure probability. This trick should hopefully give you some intuition for why it’s even possible to have zero failure probability under DP.
Suppose that, in addition to the $(\varepsilon, \delta)$-DP algorithm $M : \mathcal{X}^n \to \mathcal{Y}$ with failure probability $\beta$, we have a fallback algorithm $F : \mathcal{X}^n \to \mathcal{Y}$ that never fails – i.e., $\ell(F(x), x) \le \gamma$ for all $x$ – but that offers no privacy guarantee. For example, $F$ could simply compute the exact, noiseless answer.
Now let’s define $M'(x)$ as follows: run $y \leftarrow M(x)$; if $\ell(y, x) \le \gamma$, output $y$; otherwise, output $F(x)$.
Clearly $M'$ has zero failure probability. But what does this do to privacy?
Fix arbitrary neighbouring inputs $x, x'$ and a measurable set $S \subseteq \mathcal{Y}$. Since $M'(x) \ne M(x)$ only when $M(x)$ fails, which happens with probability at most $\beta$,
$$\begin{aligned} \Pr[M'(x) \in S] &\le \Pr[M(x) \in S] + \beta \\ &\le e^\varepsilon \Pr[M(x') \in S] + \delta + \beta \\ &\le e^\varepsilon \big(\Pr[M'(x') \in S] + \beta\big) + \delta + \beta \\ &= e^\varepsilon \Pr[M'(x') \in S] + \delta + (1 + e^\varepsilon)\beta. \end{aligned}$$
That is, $M'$ is $(\varepsilon, \delta + (1 + e^\varepsilon)\beta)$-DP: we have absorbed the utility failure probability $\beta$ into the privacy failure probability $\delta$.
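Here is a minimal sketch of the trick in the mean-estimation setting from above (again the names are ours; the fallback simply computes the exact, non-private mean):

```python
import numpy as np

def loss(y, x):
    """Scaled absolute error; sensitivity 1 in its second argument."""
    return len(x) * abs(y - np.mean(x))

def m_prime(x, eps, gamma, rng):
    """Warmup trick: M'(x) falls back to the exact, NON-private answer
    whenever M(x) fails. Failure probability is zero, but the fallback
    costs us in privacy: M' is (eps, delta + (1 + e^eps) * beta)-DP."""
    y = np.mean(x) + rng.laplace(scale=1.0 / (len(x) * eps))  # y <- M(x)
    if loss(y, x) <= gamma:   # non-private check of the failure condition
        return y
    return np.mean(x)         # F(x): always within threshold, no privacy
```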
This trick is neat since it lets us eliminate one of the parameters ($\beta$ is absorbed into $\delta$). But, in effect, all we have done is relabel the utility failure as a privacy failure.
This trick only works if you already have a small failure probability $\beta$, since $\beta$ now contributes directly to $\delta$, and $\delta$ must be very small for the privacy guarantee to be meaningful.[^6]
Avoiding silent failures
Above, we non-privately checked the failure condition $\ell(M(x), x) > \gamma$ (and then non-privately corrected any failure). To avoid paying for this in $\delta$, we want to perform the check privately.
We can’t do much with an arbitrary loss function, so we need to make some assumptions.
First, we will assume the loss has sensitivity 1 in its second argument (the dataset): $|\ell(y, x) - \ell(y, x')| \le 1$ for all $y \in \mathcal{Y}$ and all neighbouring $x, x'$. Second, we will assume some slack in the utility threshold: rather than insisting on loss at most $\gamma$, we are content with loss at most $\gamma + 2\tau$, where $\tau$ is the magnitude to which we truncate the Laplace noise below.[^4]
Are these assumptions reasonable?
First, if the loss is high-sensitivity, then we can apply tricks like inverse sensitivity to get a low-sensitivity loss.
Second, we can contrast with the exponential mechanism, which, for a sensitivity-1 loss over a finite output set $\mathcal{Y}$, guarantees loss at most $\min_y \ell(y, x) + O(\log|\mathcal{Y}|/\varepsilon)$ with high probability; additive slack on the order of $\tau = O(\log(1/\delta)/\varepsilon)$ is comparable to what standard DP tools already lose.
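For concreteness, here is a minimal sketch of the exponential mechanism over a finite candidate set with a sensitivity-1 loss (our own illustration, not code from this post):

```python
import numpy as np

def exponential_mechanism(candidate_losses, eps, rng):
    """eps-DP selection among finitely many candidates, where each
    candidate's loss has sensitivity 1 in the dataset. With high
    probability the selected loss is within O(log(k)/eps) of the minimum."""
    scores = -0.5 * eps * np.asarray(candidate_losses)  # exp-mech weights
    probs = np.exp(scores - scores.max())               # stable softmax
    probs /= probs.sum()
    return rng.choice(len(candidate_losses), p=probs)
```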
Now we can modify the algorithm $M$ to check its own success: define $\widehat{M}(x)$ as follows. Run $y \leftarrow M(x)$ and compute the noisy loss $\hat\ell = \ell(y, x) + \xi$, where $\xi$ is $(\varepsilon, \delta)$-DP truncated Laplace noise with $|\xi| \le \tau$. If $\hat\ell \le \gamma + \tau$, output $y$; otherwise, output the failure symbol $\bot$.[^5]
The benefit of this modified DP algorithm $\widehat{M}$ is that it never fails silently: if it outputs some $y \ne \bot$, then $\ell(y, x) = \hat\ell - \xi \le \gamma + 2\tau$; and whenever the original run succeeds (i.e., $\ell(y, x) \le \gamma$), the check passes with certainty, so $\Pr[\widehat{M}(x) = \bot] \le \beta$.
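A minimal sketch of $\widehat{M}$, using Python’s `None` as the failure symbol $\bot$ (the base algorithm `m` and the `loss` function are assumed to be supplied by the caller, and the noise calibration is schematic; see footnote 4 for the sharp version):

```python
import numpy as np

def truncated_laplace(scale, tau, rng):
    """Sample Lap(scale) conditioned on |xi| <= tau, via rejection sampling.
    Calibrating (scale, tau) so that releasing a sensitivity-1 value with
    this noise is (eps, delta)-DP is discussed in footnote 4."""
    while True:
        xi = rng.laplace(scale=scale)
        if abs(xi) <= tau:
            return xi

def m_hat(m, loss, x, gamma, eps, tau, rng):
    """Self-checking wrapper around a base DP algorithm m.
    Returns y only if a noisy version of its loss clears the threshold,
    and None (our stand-in for bot) otherwise. If it returns y, then
    loss(y, x) <= gamma + 2*tau; if m(x, rng) succeeded, it always passes."""
    y = m(x, rng)                                    # one (eps, delta)-DP run
    noisy_loss = loss(y, x) + truncated_laplace(1.0 / eps, tau, rng)
    return y if noisy_loss <= gamma + tau else None
```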
Conditioning on success
To recap, we have a $(2\varepsilon, 2\delta)$-DP algorithm $\widehat{M} : \mathcal{X}^n \to \mathcal{Y} \cup \{\bot\}$ – the factor of 2 comes from composing $M$ with the noisy loss check[^7] – such that, for every input $x$, the output is either some $y \in \mathcal{Y}$ with $\ell(y, x) \le \gamma + 2\tau$ or the failure symbol $\bot$, and $\Pr[\widehat{M}(x) = \bot] \le \beta$.[^2]
Now we define our final algorithm $M^*$ as follows:
- Function $M^*(x)$:
  - Repeat as long as necessary:
    - Compute $y \leftarrow \widehat{M}(x)$.
    - If $y \ne \bot$, return $y$ and halt. Otherwise, continue.
In other words, the output of $M^*(x)$ is distributed as the output of $\widehat{M}(x)$ conditioned on the event $\widehat{M}(x) \ne \bot$.
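In code, $M^*$ is just a retry loop around the `m_hat` sketch from above (with `None` again standing in for $\bot$):

```python
def m_star(m, loss, x, gamma, eps, tau, rng):
    """M*: rerun the self-checking algorithm until it reports success.
    The output is distributed as m_hat(x) conditioned on not being bot;
    the loop terminates with probability 1 whenever P[bot] < 1."""
    while True:
        y = m_hat(m, loss, x, gamma, eps, tau, rng)
        if y is not None:   # None encodes the failure symbol bot
            return y
```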
By construction, we have $\Pr[\ell(M^*(x), x) > \gamma + 2\tau] = 0$ for all inputs $x$ – i.e., zero failure probability. What remains is to pin down the privacy guarantee of $M^*$.
Theorem 1. Let $\widehat{M} : \mathcal{X}^n \to \mathcal{Y} \cup \{\bot\}$ and $M^*$ be defined as above. Assume $\widehat{M}$ is $(\hat\varepsilon, \hat\delta)$-differentially private and, for all inputs $x$, $\Pr[\widehat{M}(x) = \bot] \le \hat\beta$ for some $\hat\beta < 1$.[^2] Then $M^*$ satisfies $(\varepsilon^*, \delta^*)$-differential privacy for $\varepsilon^* = \hat\varepsilon + \log\left(\tfrac{1}{1 - \hat\beta}\right)$ and $\delta^* = \tfrac{\hat\delta}{1 - \hat\beta}$.
Proof.
Let $x, x'$ be arbitrary neighbouring inputs and let $S \subseteq \mathcal{Y}$ be measurable. The output of $M^*(x)$ is distributed as $\widehat{M}(x)$ conditioned on not being $\bot$, so
$$\Pr[M^*(x) \in S] = \frac{\Pr[\widehat{M}(x) \in S]}{\Pr[\widehat{M}(x) \ne \bot]} \le \frac{\Pr[\widehat{M}(x) \in S]}{1 - \hat\beta} \le \frac{e^{\hat\varepsilon}\Pr[\widehat{M}(x') \in S] + \hat\delta}{1 - \hat\beta}.$$
Since $\Pr[\widehat{M}(x') \in S] = \Pr[\widehat{M}(x') \ne \bot] \cdot \Pr[M^*(x') \in S] \le \Pr[M^*(x') \in S]$, this gives
$$\Pr[M^*(x) \in S] \le e^{\hat\varepsilon + \log\left(\frac{1}{1 - \hat\beta}\right)} \Pr[M^*(x') \in S] + \frac{\hat\delta}{1 - \hat\beta},$$
as required. ∎
As long as $\hat\beta$ is bounded away from 1 – say, $\hat\beta \le 1/2$ – conditioning on success costs at most $\log 2$ extra in $\varepsilon$ and a factor of 2 in $\delta$. Applying Theorem 1 to the self-checking algorithm $\widehat{M}$ from the previous section yields our final result.
Theorem 2. Let $\ell : \mathcal{Y} \times \mathcal{X}^n \to \mathbb{R}$ have sensitivity 1 in its second argument and let $\gamma \in \mathbb{R}$ and $\beta \in [0, 1)$. Let $M : \mathcal{X}^n \to \mathcal{Y}$ satisfy $(\varepsilon, \delta)$-DP and assume $\Pr[\ell(M(x), x) > \gamma] \le \beta$ for all $x \in \mathcal{X}^n$.[^2]
Then there exists $M^* : \mathcal{X}^n \to \mathcal{Y}$ that is $\left(2\varepsilon + \log\left(\tfrac{1}{1-\beta}\right), \tfrac{2\delta}{1-\beta}\right)$-DP and satisfies $\Pr[\ell(M^*(x), x) > \gamma + 2\tau] = 0$ for all $x \in \mathcal{X}^n$, where $\tau$ is the truncation threshold for $(\varepsilon, \delta)$-DP truncated Laplace noise.[^4]
Note that our utility failure probability $\beta$ no longer needs to be small: it enters the final privacy guarantee only through the factor $\tfrac{1}{1-\beta}$, so even $\beta = 1/2$ costs only a constant. This is in stark contrast to the warmup trick, where $\beta$ was added directly onto $\delta$.
Overall we pay a constant factor in the privacy parameters and suffer an additive increase in the loss in order to eliminate the failure probability. And (unlike the earlier trick) this is true even if the initial failure probability was quite large.
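To get a feel for the constants in Theorem 2, here is the parameter arithmetic for one hypothetical setting; the $O(\log(1/\delta)/\varepsilon)$ form of $\tau$ is schematic rather than the sharp constant (footnote 4):

```python
import math

# Hypothetical base guarantees: M is (1.0, 1e-6)-DP with failure prob 0.5.
eps, delta, beta = 1.0, 1e-6, 0.5

# Truncation threshold for (eps, delta)-DP truncated Laplace noise.
tau = math.log(1 / delta) / eps

eps_star = 2 * eps + math.log(1 / (1 - beta))   # ~ 2.69
delta_star = 2 * delta / (1 - beta)             # ~ 4e-6
print(f"M* is ({eps_star:.2f}, {delta_star:.1e})-DP, loss <= gamma + {2*tau:.1f}")
```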
Conclusion
We’ve presented two methods for eliminating the failure probability from DP algorithms. The first method simply moves the failure from utility to privacy; this has obvious downsides. The second method avoids these downsides and is applicable even when the initial failure probability is large, but it blows up the privacy parameters by a multiplicative factor and requires some wiggle room in the loss. The second method is based on a result by Gupta, Ligett, McSherry, Roth, & Talwar [GLMRT10, Theorem 10.2].
In both cases, we crucially exploit the nonzero $\delta$ of approximate differential privacy. This is unavoidable: a pure $(\varepsilon, 0)$-DP algorithm must place positive probability on the same set of outputs for every input, so it can only have zero failure probability if some single output is simultaneously good for all inputs – which is rarely the case.
For simplicity, this post has focused on fully eliminating the failure probability.
What if, instead, we just want to reduce it? This is also possible – for example, by capping the number of repetitions of $\widehat{M}$ rather than looping until success.
If you want to learn more about extensions of the second method, read the papers of Liu & Talwar [LT19], Papernot & Steinke [PS22], and Cohen, Lyu, Nelson, Sarlós, & Stemmer [CLNSS23].
These methods are particularly useful in settings where the initial success probability is low – e.g., private selection and hyperparameter tuning, where a single run of the base algorithm only has a small chance of producing a good output.[^8]
The other key take-home message of this post is that the failure probability shouldn’t be a first-order concern, at least from a theoretical perspective. In particular, if we obtain bounds on the expected error, then we can obtain high-probability bounds via this method.[^9]
In many cases the reductions we presented are not practical; it’s usually easier to directly modify the algorithm to reduce the failure probability. However, the fact that these generic methods exist offers an explanation for why, in practice, failure probabilities are relatively easy to manage.
[^1]: Note that we’re implicitly assuming that the loss $\ell$ is known – i.e., it is something we can compute when designing algorithms. In particular, the loss must be an empirical loss, rather than a population loss.

[^2]: It’s important that we have a provable worst-case failure probability bound for the original algorithm $M$, since we want a provable privacy guarantee. In particular, if we only have a heuristic that works for most inputs $x$, but fails badly on other inputs, then we cannot get a provable DP guarantee using these methods. It is possible that heuristics can be modified to fail gracefully and thus these methods can be salvaged, but that’s beyond the scope of this post.

[^3]: For simplicity, in this post we will (mostly) talk about failure as a boolean event; i.e., there is a hard utility threshold at $\gamma$. Of course, in most cases, there is not a hard threshold and it makes sense to talk about the tail probability as a function of the threshold $\gamma$, rather than a single value. Note that we look at the worst case over inputs $x$; that is, we aren’t in a statistical setting where inputs are random and we aren’t considering (non-private) statistical errors.

[^4]: To achieve $(\varepsilon, \delta)$-DP with $\delta > 0$ for a sensitivity-1 value, we can use Laplace noise truncated to magnitude $\tau = O(\log(1/\delta)/\varepsilon)$. Truncated Laplace noise is folklore [L16]; Holohan et al. [HABA18] give a sharp analysis.

[^5]: Note that we can always just define the output on failure to be some fixed default value, rather than the special symbol $\bot$, or we can re-run until we achieve the desired loss.

[^6]: Recall that the privacy failure probability $\delta$ should be tiny – e.g., $\delta \ll 1/n$ – for the privacy guarantee to be compelling.

[^7]: For simplicity, we’re setting the privacy parameters of the truncated Laplace noise to be the same as for $M$. In practice, this might be excessive and a different balance would work better. Also, some algorithms naturally output an estimate of their error and so this modification may not be necessary.

[^8]: If the utility threshold $\gamma$ isn’t known (e.g., if it depends on the input $x$), then there are other methods that can be used [LT19, PS22, CLNSS23], but this is beyond the scope of this blog post.

[^9]: To be precise, if we have $\mathbb{E}[\ell(M(x), x)] \le \gamma$ for all $x$ and $\ell \ge 0$, then Markov’s inequality gives $\Pr[\ell(M(x), x) > \gamma/\beta] \le \beta$ for all $\beta \in (0, 1)$. We can plug this bound into Theorem 2 to get a high-probability bound.