A Better Privacy Analysis of the Exponential Mechanism
A basic and frequent task in data analysis is selection – given a set of options \(\mathcal{Y}\), output the (approximately) best one, where “best” is defined by some loss function \(\ell : \mathcal{Y} \times \mathcal{X}^n \to \mathbb{R}\) and a dataset \(x \in \mathcal{X}^n\). That is, we want to output some \(y \in \mathcal{Y}\) that approximately minimizes \(\ell(y,x)\). Naturally, we are interested in private selection – i.e., the output should be differentially private in terms of the dataset \(x\). This post discusses algorithms for private selection – in particular, we give an improved privacy analysis of the popular exponential mechanism.
The Exponential Mechanism
The most well-known algorithm for private selection is the exponential mechanism [MT07]. The exponential mechanism \(M : \mathcal{X}^n \to \mathcal{Y} \) is a randomized algorithm given by \[\forall x \in \mathcal{X}^n ~ \forall y \in \mathcal{Y} ~~~~~ \mathbb{P}[M(x) = y] = \frac{\exp(-\frac{\varepsilon}{2\Delta} \ell(y,x))}{\sum_{y’ \in \mathcal{Y}} \exp(-\frac{\varepsilon}{2\Delta} \ell(y’,x)) }, \tag{1}\] where \(\Delta\) is the sensitivity of the loss function \(\ell\) given by \[\Delta = \sup_{x,x’ \in \mathcal{X}^n : d(x,x’) \le 1} \max_{y\in\mathcal{Y}} |\ell(y,x) - \ell(y,x’)|,\tag{2}\] where the supremum is taken over all datasets \(x\) and \(x’\) differing on the data of a single individual (which we denote by \(d(x,x’)\le 1\)).
In terms of utility, we can easily show that [BNSSSU16] \[\mathbb{E}[\ell(M(x),x)] \le \min_{y \in \mathcal{Y}} \ell(y,x) + \frac{2\Delta}{\varepsilon} \log |\mathcal{Y}|\] for all \(x \in \mathcal{X}^n\) (and we can also give high probability bounds).
It is easy to show that the exponential mechanism satisfies \(\varepsilon\)-differential privacy. But there is more to this story! We’re going to look at a more refined privacy analysis.
Bounded Range
The privacy guarantee of the exponential mechanism is more precisely characterized by bounded range. This was observed and defined by David Durfee and Ryan Rogers [DR19] and further analyzed later [DDR20].
Definition 1 (Bounded Range).1 A randomized algorithm \(M : \mathcal{X}^n \to \mathcal{Y}\) satisfies \(\eta\)-bounded range if, for all pairs of inputs \(x, x’ \in \mathcal{X}^n\) differing only on the data of a single individual, there exists some \(t \in \mathbb{R}\) such that \[\forall y \in \mathcal{Y} ~~~~~ \log\left(\frac{\mathbb{P}[M(x)=y]}{\mathbb{P}[M(x’)=y]}\right) \in [t, t+\eta].\] Here \(t\) may depend on the pair of input datasets \(x,x’\), but not on the output \(y\).
To interpret this definition, we recall the definition of the privacy loss random variable: Define \(f : \mathcal{Y} \to \mathbb{R}\) by \[f(y) = \log\left(\frac{\mathbb{P}[M(x)=y]}{\mathbb{P}[M(x’)=y]}\right).\] Then the privacy loss random variable \(Z \gets \mathsf{PrivLoss}(M(x)\|M(x’))\) is given by \(Z = f(M(x))\).
Pure \(\varepsilon\)-differential privacy is equivalent to demanding that the privacy loss is bounded by \(\varepsilon\) – i.e., \(\mathbb{P}[|Z|\le\varepsilon]=1\). Approximate \((\varepsilon,\delta)\)-differential privacy is, roughly, equivalent to demanding that \(\mathbb{P}[Z\le\varepsilon]\ge1-\delta\).2
Now \(\eta\)-bounded range is simply demanding that the privacy loss \(Z\) is supported on some interval of length \(\eta\). This interval \([t,t+\eta]\) may depend on the pair \(x,x’\).
Bounded range and pure differential privacy are equivalent up to a factor of 2 in the parameters:
Lemma 2 (Bounded Range versus Pure Differential Privacy).
- \(\varepsilon\)-differential privacy implies \(\eta\)-bounded range with \(\eta \le 2\varepsilon\).
- \(\eta\)-bounded range implies \(\varepsilon\)-differential privacy with \(\varepsilon \le \eta\).
Proof. The first part of the equivalence follows from the fact that pure \(\varepsilon\)-differential privacy implies the privacy loss is supported on the interval \([-\varepsilon,\varepsilon]\). Thus, if we set \(t=-\varepsilon\) and \(\eta=2\varepsilon\), then \([t,t+\eta] = [-\varepsilon,\varepsilon]\). The second part follows from the fact that the support of the privacy loss \([t,t+\eta]\) must straddle \(0\). That is, the privacy loss cannot be always positive nor always negative, so \(0 \in [t,t+\eta]\) and, hence, \([t,t+\eta] \subseteq [-\eta,\eta]\). Otherwise \(\forall y ~ f(y)>0\) or \(\forall y ~ f(y)<0\) would imply \(\forall y ~ \mathbb{P}[M(x)=y]>\mathbb{P}[M(x’)=y]\) or \(\forall y ~ \mathbb{P}[M(x)=y]<\mathbb{P}[M(x’)=y]\), contradicting the fact that \(\sum_{y \in \mathcal{Y}} \mathbb{P}[M(x)=y] = 1\) and \(\sum_{y \in \mathcal{Y}} \mathbb{P}[M(x’)=y] = 1\). ∎
OK, back to the exponential mechanism:
Lemma 3 (The Exponential Mechanism is Bounded Range). The exponential mechanism (given in Equation 1 above) satisfies \(\varepsilon\)-bounded range .3
Proof. We have \[e^{f(y)} = \frac{\mathbb{P}[M(x)=y]}{\mathbb{P}[M(x’)=y]} = \frac{\exp(-\frac{\varepsilon}{2\Delta}\ell(y,x))}{\exp(-\frac{\varepsilon}{2\Delta}\ell(y,x’))} \cdot \frac{\sum_{y’} \exp(-\frac{\varepsilon}{2\Delta} \ell(y’,x’))}{\sum_{y’} \exp(-\frac{\varepsilon}{2\Delta} \ell(y’,x))}.\] Setting \(t = \log\left(\frac{\sum_{y’} \exp(-\frac{\varepsilon}{2\Delta} \ell(y’,x’))}{\sum_{y’} \exp(-\frac{\varepsilon}{2\Delta} \ell(y’,x))}\right) - \frac{\varepsilon}{2}\), we have \[ f(y) = \frac{\varepsilon}{2\Delta} (\ell(y,x’)-\ell(y,x)+\Delta) + t.\] By the definition of sensitivity (given in Equation 2), we have \( 0 \le \ell(y,x’)-\ell(y,x)+\Delta \le 2\Delta\), whence \(t \le f(y) \le t + \varepsilon\). ∎
Bounded range is not really a useful privacy definition on its own. Thus we’re going to relate it to a relaxed version of differential privacy next.
Concentrated Differential Privacy
Concentrated differential privacy [BS16] and its variants [DR16] [M17] are relaxations of pure differential privacy with many nice properties. In particular, it composes very cleanly.
Definition 4 (Concentrated Differential Privacy). A randomized algorithm \(M : \mathcal{X}^n \to \mathcal{Y}\) satisfies \(\rho\)-concentrated differential privacy if, for all pairs of inputs \(x, x’ \in \mathcal{X}^n\) differing only on the data of a single individual, \[\forall \lambda > 0 ~~~~~ \mathbb{E}[\exp( \lambda Z)] \le \exp(\lambda(\lambda+1)\rho),\tag{3}\] where \(Z \gets \mathsf{PrivLoss}(M(x)\|M(x’))\) is the privacy loss random variable.4
Intuitively, concentrated differential privacy requires that the privacy loss is subgaussian. Specifically, the bound on the moment generating function of \(\rho\)-concentrated differential privacy is tight if the privacy loss \(Z\) follows the distribution \(\mathcal{N}(\rho,2\rho)\). Indeed, the privacy loss random variable of the Gaussian mechanism has such a distribution.5
OK, back to the exponential mechanism: We know that \(\varepsilon\)-differential privacy implies \(\frac12 \varepsilon^2\)-concentrated differential privacy [BS16]. This, of course, applies to the exponential mechaism. A cool fact – that we want to draw more attention to – is that we can do better! Specifically, \(\eta\)-bounded range implies \(\frac18 \eta^2\)-concentrated differential privacy [CR21]. What follows is a proof of this fact following that of Mark Cesar and Ryan Rogers, but with some simplification.
Theorem 5 (Bounded Range implies Concentrated Differential Privacy). If \(M\) is \(\eta\)-bounded range, then it is \(\frac18\eta^2\)-concentrated differentially private.
Proof. Fix datasets \(x,x’ \in \mathcal{X}^n\) differing on a single individual’s data. Let \(Z \gets \mathsf{PrivLoss}(M(x)\|M(x’))\) be the privacy loss random variable of the mechanism \(M\) on this pair of datasets. By the definition of bounded range (Definition 1), there exists some \(t \in \mathbb{R}\) such that \(Z \in [t, t+\eta]\) with probability 1. Now we employ Hoeffding’s Lemma [H63]:
Lemma 6 (Hoeffding’s Lemma). Let \(X\) be a random variable supported on the interval \([a,b]\). Then, for all \(\lambda \in \mathbb{R}\), we have \[\mathbb{E}[\exp(\lambda X)] \le \exp \left( \mathbb{E}[X] \cdot \lambda + \frac{(b-a)^2}{8} \cdot \lambda^2 \right).\]
Applying the lemma to the privacy loss gives \[\forall \lambda \in \mathbb{R} ~~~~~ \mathbb{E}[\exp(\lambda Z)] \le \exp \left( \mathbb{E}[Z] \cdot \lambda + \frac{\eta^2}{8} \cdot \lambda^2 \right).\] The only remaining thing we need is to show is that \(\mathbb{E}[Z] \le \frac18 \eta^2\).6
If we set \(\lambda = -1 \), then we get \( \mathbb{E}[\exp( - Z)] \le \exp \left( -\mathbb{E}[Z] + \frac{\eta^2}{8} \right)\), which rearranges to \(\mathbb{E}[Z] \le \frac18 \eta^2 - \log \mathbb{E}[\exp( - Z)]\). Now we have \[ \mathbb{E}[\exp( - Z)] \!=\! \sum_y \mathbb{P}[M(x)\!=\!y] \exp(-f(y)) \!=\! \sum_y \mathbb{P}[M(x)\!=\!y] \!\cdot\! \frac{\mathbb{P}[M(x’)\!=\!y]}{\mathbb{P}[M(x)\!=\!y]} \!=\! 1.\] ∎
This brings us to the TL;DR of this post:
Corollary 7. The exponential mechanism (given by Equation 1) is \(\frac18 \varepsilon^2\)-concentrated differentially private.
This is great news. The standard analysis only gives \(\frac12 \varepsilon^2\)-concentrated differential privacy. Constants matter when applying differential privacy, and we save a factor of 4 in the concentrated differential privacy analysis of the exponential mechanism for free with this improved analysis.
Combining Lemma 2 with Theorem 5 also gives a simpler proof of the conversion from pure differential privacy to concentrated differential privacy [BS16]:
Corollary 8. \(\varepsilon\)-differential privacy implies \(\frac12 \varepsilon^2\)-concentrated differential privacy.
Beyond the Exponential Mechanism
The exponential mechanism is not the only algorithm for private selection. A closely-related algorithm is report noisy max/min:7 Draw independent noise \(\xi_y\) from some distribution for each \(y \in \mathcal{Y}\) then output \[M(x) = \underset{y \in \mathcal{Y}}{\mathrm{argmin}} ~ \ell(y,x) - \xi_y.\]
If the noise distribution is an appropriate Gumbel distribution, then report noisy max is exactly the exponential mechanism. (This equivalence is known as the “Gumbel max trick.”)
We can also use the Laplace distribution or the exponential distribution. Report noisy max with the exponential distribution is equivalent to the permute and flip algorithm [MS20] [DKSSWXZ21]. However, these algorithms don’t enjoy the same improved bounded range and concentrated differential privacy guarantees as the exponential mechanism.
There are also other variants of the selection problem. For example, in some cases we can assume that only a few options have low loss and the rest of the options have high loss – i.e., there is a gap between the minimum loss and the second-lowest loss (or, more generally, the \(k\)-th lowest loss). In this case there are algorithms that attain better accuracy than the exponential mechanism under relaxed privacy definitions [CHS14] [BDRS18] [BKSW19].
There are a lot of interesting aspects of private selection, including questions for further research! We hope to have further posts about some of these topics.
-
For simplicity, we restrict our discussion here to finite sets of outputs, although the definitions, algorithms, and results can be extended to infinite sets. ↩
-
To be more precise, \((\varepsilon,\delta)\)-differential privacy is equivalent to demanding that \(\mathbb{E}[\max\{0,1-\exp(\varepsilon-Z)\}]\le\delta\) [CKS20]. (To be completely precise, we must appropriately deal with the \(Z=\infty\) case, which we ignore in this discussion for simplicity.) ↩
-
This proof actually gives a slightly stronger result: We can replace the sensitivity \(\Delta\) (defined in Equation 2) by half the range \[\hat\Delta = \frac12 \sup_{x,x’ \in \mathcal{X}^n : d(x,x’) \le 1} \left( \max_{\overline{y}\in\mathcal{Y}} \ell(\overline{y},x) - \ell(\overline{y},x’) - \min_{\underline{y}\in\mathcal{Y}} \ell(\underline{y},x) - \ell(\underline{y},x’) \right).\] We always have \(\hat\Delta \le \Delta\) but it is possible that \(\hat\Delta < \Delta\) and the privacy analysis of the exponential mechanism still works if we replace \(\Delta\) by \(\hat\Delta\). ↩
-
Equivalently, a randomized algorithm \(M : \mathcal{X}^n \to \mathcal{Y}\) satisfies \(\rho\)-concentrated differential privacy if, for all pairs of inputs \(x, x’ \in \mathcal{X}^n\) differing only on the data of a single individual, \[\forall \lambda > 0 ~~~~~ \mathrm{D}_{\lambda+1}(M(x)\|M(x’)) \le (\lambda+1)\rho,\] where \(\mathrm{D}_{\lambda+1}(M(x)\|M(x’)))\) is the order \(\lambda+1\) Rényi divergence of \(M(x)\) from \(M(x’)\). ↩
-
To be precise, if \(M(x) = q(x) + \mathcal{N}(0,\sigma^2I)\), then \(M : \mathcal{X}^n \to \mathbb{R}^d\) satisfies \(\frac{\Delta_2^2}{2\sigma^2}\)-concentrated differential privacy, where \(\Delta_2 = \sup_{x,x’\in\mathcal{X}^n : d(x,x’)\le1} \|q(x)-q(x’)\|_2\) is the 2-norm sensitivity of \(q:\mathcal{X}^n \to \mathbb{R}^d\). Furthermore, the privacy loss of the Gaussian mechanism is itself a Gaussian and it makes the inequality defining concentrated differential privacy (Equation 3) an equality for all \(\lambda\) ↩
-
Note that the expectation of the privacy loss is simply the KL divergence: \(\mathbb{E}[Z] = \mathrm{D}_1( M(x) \| M(x’) )\). ↩
-
We have presented selection here in terms of minimization, but most of the literature is in terms of maximization. ↩
[cite this]