Differential PrivacyWebsite for the differential privacy research community
https://differentialprivacy.org
One-shot DP Top-k mechanisms<p>In the last <a href="https://differentialprivacy.org/exponential-mechanism-bounded-range/"><em>blog post</em></a>, we showed that the exponential mechanism enjoys improved composition bounds over general pure DP mechanisms due to a property called <strong>bounded range</strong>. For this post, we will present another useful, and somewhat surprising, property of the exponential mechanism in its application of top-\(k\) selection.</p>
<h2 id="differentially-private-top-k-selection">Differentially Private Top-\(k\) Selection</h2>
<p>We will focus on datasets that are a vector of counts \(h = (h_1, \cdots, h_d) \in \mathbb{N}^d\), which consist of counts \(h_i\) for elements from a universe \(\mathcal{U}\) where \(|\mathcal{U}| = d\). Let’s assume that a user’s data can modify each count by at most 1, yet can change all \(d\) counts, i.e. the \(\ell_\infty\)-sensitivity is 1 and the \(\ell_0\)-sensitivity is \(d\). The task here is to return the top-\(k\) elements from the input counts in a differentially private way.</p>
<p>For top-\(1\), this is simply returning the element with the max count, and this is precisely the problem that the exponential mechanism is set up to solve. Let’s write out the exponential mechanism \(M^{(1)}: \mathbb{N}^d \to [d]\) for this instance:
\[
\mathbb{P}[M^{(1)}(h) = i] = \frac{e^{ \varepsilon h_i }}{\sum_{j \in [d] } e^{ \varepsilon h_j } }, \qquad \forall i \in [d].
\]
For those wondering why this formula omits the factor of \(1/2\) in the exponent, we are using the <a href="https://dongjs.github.io/2020/02/10/ExpMech.html">stronger result</a> of the exponential mechanism which replaces global sensitivity with the range of the loss function \(\ell(i,h) = - h_i\), which is \(1\) in this case. Recall from the last blog post that the exponential mechanism is \(\varepsilon^2/8\)-CDP.</p>
<p>Hence, to generalize this to top-\(k\) selection, we can simply iteratively apply this exponential mechanism by removing the <em>discovered</em> element from each previous round. That is, we write \(M^{(k)}: \mathbb{N}^d \to [d]^k\) as the following for any outcome \( (i_1, i_2, \cdots, i_k) \in [d]^k\),</p>
<p>\[
\mathbb{P}[M^{(k)}(h) = (i_1, i_2, \cdots, i_k)] \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad
\]
<a name="eq:peelingEM"></a>
\[\qquad = \frac{e^{ \varepsilon h_{i_1} }}{\sum_{j \in [d] } e^{ \varepsilon h_j } } \cdot \frac{ e^{\varepsilon h_{i_2} } }{\sum_{j\in [d]\setminus \{ i_1\}} e^{\varepsilon h_j } } \cdot \cdots \cdot \frac{ e^{ \varepsilon h_{i_2} } }{\sum_{j\in [d]\setminus \{ i_1, \cdots, i_{k-1}\}} e^{ \varepsilon h_j } }.
\tag{1}
\]</p>
<p>We can then apply composition to conclude that \(M^{(k)}\) is \(k \varepsilon^2/8\)-CDP.</p>
<h2 id="gumbel-noise-and-the-exponential-mechanism">Gumbel Noise and the Exponential Mechanism</h2>
<p>As we discussed in our last post, we can implement the exponential mechanism by adding <a href="https://en.wikipedia.org/wiki/Gumbel_distribution">Gumbel</a> noise to each count and reporting the noisy max element. A Gumbel random variable \(X \sim \text{Gumbel}(\beta) \), parameterized by scale parameter \(\beta>0\), has the following density function
<a name="eq:GumbelDensity"></a>
\[
p(x;\beta) = \frac{1}{\beta} \exp\left( - x/\beta - e^{-x/\beta} \right), \qquad \forall x \in \mathbb{R}.
\tag{2}
\]</p>
<p>Hence, we can write the exponential mechanism in the following way
\[
M^{(1)}(h) = \arg\max \{ h_i + X_i : i \in [d] \}, \qquad \{X_i \} \stackrel{i.i.d.}{\sim} \text{Gumbel}(1/\varepsilon).
\]</p>
<p>We can then extend this to top-\(k\) by repeatedly adding independent Gumbel noise to each count and removing the discovered element for the next round. However, something that would significantly improve run time would be to add Gumbel noise to each count <em>once</em> and then take the elements with the top-\(k\) noisy counts. We could then add only \(d\) many noise terms, rather than \(O(d^k)\) noise terms if we were to iteratively run \(k\) different exponential mechanisms. The question is, does this one-shot top-\(k\) Gumbel noise mechanism ensure the same level of privacy?</p>
<p>Let’s denote the one-shot Gumbel mechanism as \(\tilde{M}^{(k)}\). At first glance, it does not seem like the one-shot Gumbel mechanism \(\tilde{M}^{(k)}\) should be just as private as the iterative exponential mechanism \(M^{(k)}\), but it turns out they are exactly the same mechanism! The following result is due to <a href="https://arxiv.org/abs/1905.04273" title="David Durfee, Ryan Rogers. Practical Differentially Private Top-k Selection with Pay-what-you-get Composition. NeurIPS 2019"><strong>[DR19]</strong></a>.</p>
<blockquote>
<p><strong>Theorem 1</strong>
For any input vector of counts \(h \in \mathbb{N}^d\), the one-shot Gumbel mechanism \(\tilde{M}^{(k)}(h)\) and iteratively applying the exponential mechanism \(M^{(k)}(h)\) are equal in distribution.</p>
</blockquote>
<p><em>Proof.</em>
Recall the distribution of the iterative exponential mechanism \(M^{(k)}(h)\) from <a href="#eq:peelingEM">(1)</a>.
Now we consider the one-shot Gumbel mechanism \(\tilde{M}^{(k)}(h)\) where we use the density of \(X \sim \) Gumbel\( (1/\varepsilon)\) from <a href="#eq:GumbelDensity">(2)</a>.<br />
\[
\mathbb{P}[\tilde{M}^{(k)}(h) = (i_1, \cdots, i_k)] \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad
\]
\[
\qquad = \int_{-\infty}^\infty p(u_1 - h_{i_1}) \int_{-\infty}^{u_1} p(u_2 - h_{i_2}) \cdots \int_{-\infty}^{u_{k-1}} p(u_k - h_k)
\]
<a name="eq:integral"></a>
\[
\qquad \qquad \cdot \prod_{j \in [d] \setminus \{i_1, \cdots, i_k \} } \mathbb{P}[ X < u_k - h_j]du_k \cdots du_2 du_1.
\tag{3}
\]
Note that we have
\[
\mathbb{P}[X < y] = \exp\left( - \exp\left( -\varepsilon y \right) \right).
\]
Let’s focus on the inner integral over \(u_k\) in <a href="#eq:integral">(3)</a>.<br />
\[
\int_{-\infty}^{u_{k-1}} p(u_k - h_{i_k} )\prod_{j \in [d] \setminus \{i_1, \cdots, i_k \} } \mathbb{P}[X < u_k - h_j ]du_k \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad
\]
\[
\quad = \int_{-\infty}^{u_{k-1}} \varepsilon \cdot \exp\left( - \varepsilon (u_k - h_{i_k}) - e^{ -\varepsilon (u_k - h_{i_k}) } \right) \cdot \exp\left( -e^{-\varepsilon u_k} \sum_{j \in [d] \setminus \{i_1, \cdots, i_k \} } e^{\varepsilon h_j} \right) du_k
\]
\[
\quad = \varepsilon e^{\varepsilon h_{i_k}} \int_{-\infty}^{u_{k-1}} \exp\left( -\varepsilon u_k - e^{-\varepsilon u_k} \left( e^{\varepsilon h_{i_k}} + \sum_{j \in [d] \setminus \{i_1, \cdots i_k \} } e^{\varepsilon h_j} \right) \right) du_k \qquad
\]
<a name="eq:lastLine"></a>
\[
\qquad = \varepsilon e^{\varepsilon h_{i_k}} \int_{-\infty}^{u_{k-1}} \exp\left( -\varepsilon u_k - e^{-\varepsilon u_k} \left(\sum_{j \in [d] \setminus \{i_1, \cdots i_{k-1} \} } e^{\varepsilon h_j} \right) \right) du_k. \qquad \qquad
\tag{4}
\]
We now integrate with a \(v\)-substitution,
\[
v =e^{-\varepsilon u_{k}} \sum_{j \in [d] \setminus \{i_1, \cdots i_{k-1} \} } e^{\varepsilon h_j}<br />
\]
\[
dv = - \varepsilon \sum_{j \in [d] \setminus \{i_1, \cdots i_{k-1} \} } e^{\varepsilon h_j} \cdot e^{-\varepsilon u_{k}} du_{k}.
\]</p>
<p>Continuing with <a href="#eq:lastLine">(4)</a>, we get
\[
\int_{-\infty}^{u_{k-1}} p(u_k - h_{i_k} )\prod_{j \in [d] \setminus \{i_1, \cdots, i_k \} } \mathbb{P}[X < u_k - h_j ]du_k \qquad \qquad \qquad \qquad \qquad
\]
\[
\qquad = \frac{e^{\varepsilon h_{i_k} }}{\sum_{j \in [d] \setminus \{i_1, \cdots, i_{k-1} \}} e^{\varepsilon h_j}} \cdot \exp\left( - e^{-\varepsilon u_{k-1}} \cdot \sum_{j \in [d] \setminus \{i_1, \cdots, i_{k-1} \}} e^{\varepsilon h_j} \right)
\]
\[
\qquad = \frac{e^{\varepsilon h_{i_k} }}{\sum_{j \in [d] \setminus \{i_1, \cdots, i_{k-1} \}} e^{\varepsilon h_j}} \cdot \prod_{j \in [d] \setminus \{i_1, \cdots, i_{k-1} \} } \mathbb{P}[X < u_{k-1} - h_j ] .
\]
Note how this line has the last term in the expression for \(M^{(k)}(h)\) in <a href="#eq:peelingEM">(1)</a>, which is independent of \(u_{k-1}\) and can hence be pulled out of the larger integral in <a href="#eq:integral">(3)</a>. By induction, we have
\[
\mathbb{P}[\tilde{M}^{(k)}(h) = (i_1, \cdots, i_k)] \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad
\]
\[
\qquad = \frac{e^{\varepsilon h_{i_1} }}{\sum_{j \in [d]} e^{\varepsilon h_j}} \cdot \frac{e^{\varepsilon h_{i_2} }}{\sum_{j \in [d] \setminus \{ i_1\}} e^{\varepsilon h_j}} \cdot \cdots \cdot \frac{e^{\varepsilon h_k }}{\sum_{j \in [d] \setminus \{i_1, \cdots, i_{k-1} \}} e^{\varepsilon h_j}}
\]
\[
\qquad = \mathbb{P}[M^{(k)}(h) =(i_1, \cdots, i_k)].
\] ∎</p>
<p>So that’s great! We can now run the one-shot Gumbel mechanism for top-\(k\) and still get the improved composition bounds of the exponential mechanism. In addition to this achieving better runtime, this analysis can help with proving top-\(k\) DP algorithms over a large domain universe despite giving access to only the true top-\(\bar{k}\) items and their counts where \(\bar{k} > k \), see <a href="https://arxiv.org/abs/1905.04273" title="David Durfee, Ryan Rogers. Practical Differentially Private Top-k Selection with Pay-what-you-get Composition. NeurIPS 2019"><strong>[DR19]</strong></a> for more details.</p>
<h2 id="report-noisy-max-for-dp-top-k">Report Noisy Max for DP Top-\(k\)</h2>
<p>We now turn to comparing this algorithm to some natural alternatives. As we discussed in the last post, there is a family of mechanisms, report noisy max (RNM) mechanisms, that ensure differential privacy for the selection problem, and hence the top-\(1\) problem. We showed that the exponential mechanism is equivalent to RNM with Gumbel noise, there is also RNM with Laplace and with Exponential noise, the last being the recently discovered <em>permute-and-flip</em> mechanism <a href="https://arxiv.org/abs/2010.12603" title="Ryan McKenna, Daniel Sheldon. Permute-and-Flip: A new mechanism for differentially private selection
. NeurIPS 2020."><strong>[MS20]</strong></a> <a href="https://arxiv.org/abs/2105.07260" title="Zeyu Ding, Daniel Kifer, Sayed M. Saghaian N. E., Thomas Steinke, Yuxin Wang, Yingtai Xiao, Danfeng Zhang. The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential Noise. 2021."><strong>[DKSSWXZ21]</strong></a>.</p>
<p>To then use RNM mechanisms for top-\(k\), we can again iteratively apply them and use composition to get the overall privacy guarantee. However, it turns out that you can also use the Laplace noise version of RNM in one-shot <a href="https://arxiv.org/abs/2105.08233" title="Gang Qiao, Weijie J. Su, Li Zhang. Oneshot Differentially Private Top-k Selection. ICML 2021."><strong>[QSZ21]</strong></a>.</p>
<p>We can compare the relative noise that is added to each count in both the Laplace and Gumbel versions. Since <a href="https://arxiv.org/abs/2105.08233" title="Gang Qiao, Weijie J. Su, Li Zhang. Oneshot Differentially Private Top-k Selection. ICML 2021."><strong>[QSZ21]</strong></a> gives their privacy guarantee in terms of approximate \((\varepsilon,\delta )\)-DP, we will now make the comparison there. We first look at the standard deviation for Laplace \( \sigma_{\text{Lap}}\) (using Theorem 2.2 in <a href="https://arxiv.org/abs/2105.08233" title="Gang Qiao, Weijie J. Su, Li Zhang. Oneshot Differentially Private Top-k Selection. ICML 2021."><strong>[QSZ21]</strong></a>).
\[
\sigma_{\text{Lap}} = \frac{8 \sqrt{2k \ln(d/\delta)}}{\varepsilon}.
\]
Note that the one-shot Laplace mechanism returns counts as well as the indices of the top-\(k\), both of which use Laplace noise with standard deviation \(\sigma_{\text{Lap}}\), so we will also include Laplace noise to the discovered elements in the Gumbel version. That is, we add Gumbel noise with scale \(\sqrt{k}/\varepsilon’\) for the discovery portion and Laplace noise with scale \(2\sqrt{k}/\varepsilon’\) for obtaining their counts, resulting in standard deviation noise \(\sigma_{\text{Gumb}}’\) and \(\sigma_{\text{Lap}}’\), respectively.<br />
\[
\sigma_{\text{Gumb}}’ = \frac{\pi \sqrt{k} }{\sqrt{6}\varepsilon’}, \qquad
\sigma_{\text{Lap}}’ = \frac{2\sqrt{2k}}{\varepsilon’}.
\]
Recall that adding this scale of Gumbel noise and Laplace noise will ensure \(\tfrac{\varepsilon’^2}{8} \)-CDP each, so combining will ensure \(\tfrac{\varepsilon’^2}{4}\)-CDP. We could also use Gaussian noise to return the counts since we are using CDP, but we will analyze it with Laplace noise for comparison. To ensure \((\varepsilon,\delta)\)-DP, we use the CDP to DP conversion from Lemma 3.5 in <a href="https://arxiv.org/abs/1605.02065" title="Mark Bun, Thomas Steinke. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. TCC 2016."><strong>[BS16]</strong></a> and solve for \(\varepsilon’\). Hence, we get for any \(\delta>0\)
\[
\varepsilon’^2/4 = \left( \sqrt{\ln(1/\delta) + \varepsilon} - \sqrt{\ln(1/\delta)} \right)^2
\]
\[ \implies \varepsilon’ = 2 \sqrt{\ln(1/\delta)} \left( \sqrt{1 + \tfrac{\varepsilon}{\ln(1/\delta)}} - 1 \right).
\]</p>
<p>Let’s consider a typical privacy setting where \(\varepsilon < \ln(1/\delta)\), and use the inequality \(\sqrt{1+x} \geq 1 + x/4\) for \(0<x<1\). Here is a short proof of this inequality:
\[
(1 + x/4)^2 = 1 + x/2 + x^2/16 \leq 1 + x/2 + x/2 = 1 + x.<br />
\]
Note that the privacy guarantee for one-shot Laplace noise only holds when \(\varepsilon < 0.2\) and \(\delta < 0.05\) as stated in Theorem 2.2 in <a href="https://arxiv.org/abs/2105.08233" title="Gang Qiao, Weijie J. Su, Li Zhang. Oneshot Differentially Private Top-k Selection. ICML 2021."><strong>[QSZ21]</strong></a>. In this case, we have
\[
\varepsilon’ \geq 2 \sqrt{\ln(1/\delta)} \left( 1 + 1/4 \cdot \tfrac{\varepsilon}{\ln(1/\delta)} - 1 \right) = 1/2 \cdot \tfrac{\varepsilon}{\sqrt{\ln(1/\delta)}}.
\]
Plugging \(\varepsilon’\) into the standard deviation of Gumbel and Laplace, we get
\[
\sigma_{\text{Gumb}}’ \leq \frac{2\pi\sqrt{k \ln(1/\delta)}}{\sqrt{6}\cdot \varepsilon}, \qquad \sigma_{\text{Lap}}’ \leq \frac{4\sqrt{2k\ln(1/\delta)}}{\varepsilon}.
\]</p>
<p>Putting this together, we can show that we add significantly less noise for the discovery and releasing noisy count phases,
\[
\sigma_{\text{Gumb}}’\leq \sigma_{\text{Lap}}/4 ,\qquad \sigma_{\text{Lap}}’ \leq \sigma_{\text{Lap}}/2.
\]
Note that these bounds can be improved further with similar analysis.</p>
<p>Although it has not been studied yet whether the permute-and-flip mechanism \(M_{\text{PF}} \) can also ensure DP in one shot by using Exponential noise, we briefly discuss whether it can be bounded range for a similar parameter as the Exponential Mechanism, and hence achieve similar composition bounds. Consider running permute-and-flip on two items \(\{1,2 \}\) with a monotonic quality score \(q: \mathcal{X} \times \{1,2 \} \to \mathbb{R} \) whose sensitivity is 1. Let \(x, x’ \in \mathcal{X} \) be neighbors where
\[
q(x,1) = q(x,2) = 0
\]
\[
q(x’,1) = 0 , \quad q(x’,2) = 1.
\]
Hence, permute-and-flip will return outcome \(1\) or \(2\) with half probability each on dataset \(x\), while with dataset \(x’\) outcome \(1\) occurs with probability \(1/2 \cdot e^{-\varepsilon}\) and outcome \(2\) occurs with probability \( 1/2 + 1/2 \cdot (1 - e^{-\varepsilon})\). We can then compute the bounded range parameter \(\alpha\) as
\[
\frac{\mathbb{P}[M_{\text{PF}}(x’) = 2 ] }{\mathbb{P}[M_{\text{PF}}(x) = 2 ]}\leq e^{\alpha}\frac{\mathbb{P}[M_{\text{PF}}(x’) = 1 ]}{\mathbb{P}[M_{\text{PF}}(x) = 1 ]} \implies \alpha \geq \varepsilon + \ln(2 - e^{-\varepsilon} ).
\]
Note that with \(\varepsilon \gg 1\), we get \(\alpha \) close to \(\varepsilon\), which would be the same bounded range parameter as the exponential mechanism. However, with \(\varepsilon< 1\), we get \(\alpha\) close to \(2\varepsilon\). This example provides a lower bound on the BR parameter for permute-and-flip.</p>
<h2 id="conclusion">Conclusion</h2>
<p>We have looked at the top-\(k\) selection problem subject to differential privacy and although there are many different mechanisms to use, the exponential mechanism stands out for several reasons:</p>
<ol>
<li>The exponential mechanism is \(\varepsilon\)-DP and \( \varepsilon^2/8\)-CDP and hence gets improved composition.</li>
<li>Iteratively applying the exponential mechanism for top-\(k\) can be implemented by adding Gumbel noise to each count and returning the elements with the top-\(k\) noisy counts in one-shot.</li>
<li>The one-shot Gumbel mechanism returns a ranked list of \(k\) elements, rather than a set of \(k\) elements.</li>
</ol>
David DurfeeRyan RogersMon, 09 Aug 2021 10:00:00 -0700
https://differentialprivacy.org/one-shot-top-k/
https://differentialprivacy.org/one-shot-top-k/A Better Privacy Analysis of the Exponential Mechanism<p>A basic and frequent task in data analysis is <em>selection</em> – given a set of options \(\mathcal{Y}\), output the (approximately) best one, where “best” is defined by some loss function \(\ell : \mathcal{Y} \times \mathcal{X}^n \to \mathbb{R}\) and a dataset \(x \in \mathcal{X}^n\). That is, we want to output some \(y \in \mathcal{Y}\) that approximately minimizes \(\ell(y,x)\). Naturally, we are interested in <em>private selection</em> – i.e., the output should be differentially private in terms of the dataset \(x\).
This post discusses algorithms for private selection – in particular, we give an improved privacy analysis of the popular exponential mechanism.</p>
<h2 id="the-exponential-mechanism">The Exponential Mechanism</h2>
<p>The most well-known algorithm for private selection is the <a href="https://en.wikipedia.org/wiki/Exponential_mechanism_(differential_privacy)"><em>exponential mechanism</em></a> <a href="https://doi.org/10.1109/FOCS.2007.66" title="Frank McSherry, Kunal Talwar. Mechanism Design via Differential Privacy. FOCS 2007."><strong>[MT07]</strong></a>. The exponential mechanism \(M : \mathcal{X}^n \to \mathcal{Y} \) is a randomized algorithm given by \[\forall x \in \mathcal{X}^n ~ \forall y \in \mathcal{Y} ~~~~~ \mathbb{P}[M(x) = y] = \frac{\exp(-\frac{\varepsilon}{2\Delta} \ell(y,x))}{\sum_{y’ \in \mathcal{Y}} \exp(-\frac{\varepsilon}{2\Delta} \ell(y’,x)) }, \tag{1}\] where \(\Delta\) is the sensitivity of the loss function \(\ell\) given by \[\Delta = \sup_{x,x’ \in \mathcal{X}^n : d(x,x’) \le 1} \max_{y\in\mathcal{Y}} |\ell(y,x) - \ell(y,x’)|,\tag{2}\] where the supremum is taken over all datasets \(x\) and \(x’\) differing on the data of a single individual (which we denote by \(d(x,x’)\le 1\)).</p>
<p>In terms of utility, we can easily show that <a href="https://arxiv.org/abs/1511.02513" title="Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman. Algorithmic Stability for Adaptive Data Analysis. STOC 2016."><strong>[BNSSSU16]</strong></a> \[\mathbb{E}[\ell(M(x),x)] \le \min_{y \in \mathcal{Y}} \ell(y,x) + \frac{2\Delta}{\varepsilon} \log |\mathcal{Y}|\] for all \(x \in \mathcal{X}^n\) (and we can also give high probability bounds).</p>
<p>It is easy to show that the exponential mechanism satisfies \(\varepsilon\)-differential privacy.
But there is more to this story! We’re going to look at a more refined privacy analysis.</p>
<h2 id="bounded-range">Bounded Range</h2>
<p>The privacy guarantee of the exponential mechanism is more precisely characterized by <em>bounded range</em>. This was observed and defined by David Durfee and Ryan Rogers <a href="https://arxiv.org/abs/1905.04273" title="David Durfee, Ryan Rogers. Practical Differentially Private Top-k Selection with Pay-what-you-get Composition. NeurIPS 2019"><strong>[DR19]</strong></a> and further analyzed later <a href="https://arxiv.org/abs/1909.13830" title="Jinshuo Dong, David Durfee, Ryan Rogers. Optimal Differential Privacy Composition for Exponential Mechanisms. ICML 2020."><strong>[DDR20]</strong></a>.</p>
<blockquote>
<p><strong>Definition 1 (Bounded Range).</strong><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>
A randomized algorithm \(M : \mathcal{X}^n \to \mathcal{Y}\) satisfies \(\eta\)-bounded range if, for all pairs of inputs \(x, x’ \in \mathcal{X}^n\) differing only on the data of a single individual, there exists some \(t \in \mathbb{R}\) such that \[\forall y \in \mathcal{Y} ~~~~~ \log\left(\frac{\mathbb{P}[M(x)=y]}{\mathbb{P}[M(x’)=y]}\right) \in [t, t+\eta].\] Here \(t\) may depend on the pair of input datasets \(x,x’\), but not on the output \(y\).</p>
</blockquote>
<p>To interpret this definition, we <a href="/flavoursofdelta/">recall the definition of the privacy loss random variable</a>: Define \(f : \mathcal{Y} \to \mathbb{R}\) by \[f(y) = \log\left(\frac{\mathbb{P}[M(x)=y]}{\mathbb{P}[M(x’)=y]}\right).\] Then the privacy loss random variable \(Z \gets \mathsf{PrivLoss}(M(x)\|M(x’))\) is given by \(Z = f(M(x))\).</p>
<p>Pure \(\varepsilon\)-differential privacy is equivalent to demanding that the privacy loss is bounded by \(\varepsilon\) – i.e., \(\mathbb{P}[|Z|\le\varepsilon]=1\). Approximate \((\varepsilon,\delta)\)-differential privacy is, roughly, equivalent to demanding that \(\mathbb{P}[Z\le\varepsilon]\ge1-\delta\).<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
<p>Now \(\eta\)-bounded range is simply demanding that the privacy loss \(Z\) is supported on some interval of length \(\eta\). This interval \([t,t+\eta]\) may depend on the pair \(x,x’\).</p>
<p>Bounded range and pure differential privacy are equivalent up to a factor of 2 in the parameters:</p>
<blockquote>
<p><strong>Lemma 2 (Bounded Range versus Pure Differential Privacy).</strong></p>
<ul>
<li>\(\varepsilon\)-differential privacy implies \(\eta\)-bounded range with \(\eta \le 2\varepsilon\).</li>
<li>\(\eta\)-bounded range implies \(\varepsilon\)-differential privacy with \(\varepsilon \le \eta\).</li>
</ul>
</blockquote>
<p><em>Proof.</em> The first part of the equivalence follows from the fact that pure \(\varepsilon\)-differential privacy implies the privacy loss is supported on the interval \([-\varepsilon,\varepsilon]\). Thus, if we set \(t=-\varepsilon\) and \(\eta=2\varepsilon\), then \([t,t+\eta] = [-\varepsilon,\varepsilon]\).
The second part follows from the fact that the support of the privacy loss \([t,t+\eta]\) must straddle \(0\). That is, the privacy loss cannot be always positive nor always negative, so \(0 \in [t,t+\eta]\) and, hence, \([t,t+\eta] \subseteq [-\eta,\eta]\). Otherwise \(\forall y ~ f(y)>0\) or \(\forall y ~ f(y)<0\) would imply \(\forall y ~ \mathbb{P}[M(x)=y]>\mathbb{P}[M(x’)=y]\) or \(\forall y ~ \mathbb{P}[M(x)=y]<\mathbb{P}[M(x’)=y]\), contradicting the fact that \(\sum_{y \in \mathcal{Y}} \mathbb{P}[M(x)=y] = 1\) and \(\sum_{y \in \mathcal{Y}} \mathbb{P}[M(x’)=y] = 1\). ∎</p>
<p>OK, back to the exponential mechanism:</p>
<blockquote>
<p><strong>Lemma 3 (The Exponential Mechanism is Bounded Range).</strong>
The exponential mechanism (given in Equation 1 above) satisfies \(\varepsilon\)-bounded range .<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></p>
</blockquote>
<p><em>Proof.</em>
We have \[e^{f(y)} = \frac{\mathbb{P}[M(x)=y]}{\mathbb{P}[M(x’)=y]} = \frac{\exp(-\frac{\varepsilon}{2\Delta}\ell(y,x))}{\exp(-\frac{\varepsilon}{2\Delta}\ell(y,x’))} \cdot \frac{\sum_{y’} \exp(-\frac{\varepsilon}{2\Delta} \ell(y’,x’))}{\sum_{y’} \exp(-\frac{\varepsilon}{2\Delta} \ell(y’,x))}.\]
Setting \(t = \log\left(\frac{\sum_{y’} \exp(-\frac{\varepsilon}{2\Delta} \ell(y’,x’))}{\sum_{y’} \exp(-\frac{\varepsilon}{2\Delta} \ell(y’,x))}\right) - \frac{\varepsilon}{2}\), we have \[ f(y) = \frac{\varepsilon}{2\Delta} (\ell(y,x’)-\ell(y,x)+\Delta) + t.\]
By the definition of sensitivity (given in Equation 2), we have \( 0 \le \ell(y,x’)-\ell(y,x)+\Delta \le 2\Delta\), whence \(t \le f(y) \le t + \varepsilon\). ∎</p>
<p>Bounded range is not really a useful privacy definition on its own. Thus we’re going to relate it to a relaxed version of differential privacy next.</p>
<h2 id="concentrated-differential-privacy">Concentrated Differential Privacy</h2>
<p>Concentrated differential privacy <a href="https://arxiv.org/abs/1605.02065" title="Mark Bun, Thomas Steinke. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. TCC 2016."><strong>[BS16]</strong></a> and its variants <a href="https://arxiv.org/abs/1603.01887" title="Cynthia Dwork, Guy N. Rothblum. Concentrated Differential Privacy. 2016."><strong>[DR16]</strong></a> <a href="https://arxiv.org/abs/1702.07476" title="Ilya Mironov. Rényi Differential Privacy. CCS 2017."><strong>[M17]</strong></a> are relaxations of pure differential privacy with many nice properties. In particular, it composes very cleanly.</p>
<blockquote>
<p><strong>Definition 4 (Concentrated Differential Privacy).</strong>
A randomized algorithm \(M : \mathcal{X}^n \to \mathcal{Y}\) satisfies \(\rho\)-concentrated differential privacy if, for all pairs of inputs \(x, x’ \in \mathcal{X}^n\) differing only on the data of a single individual,
\[\forall \lambda > 0 ~~~~~ \mathbb{E}[\exp( \lambda Z)] \le \exp(\lambda(\lambda+1)\rho),\tag{3}\]
where \(Z \gets \mathsf{PrivLoss}(M(x)\|M(x’))\) is the privacy loss random variable.<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></p>
</blockquote>
<p>Intuitively, concentrated differential privacy requires that the privacy loss is subgaussian. Specifically, the bound on the moment generating function of \(\rho\)-concentrated differential privacy is tight if the privacy loss \(Z\) follows the distribution \(\mathcal{N}(\rho,2\rho)\). Indeed, the privacy loss random variable of the Gaussian mechanism has such a distribution.<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup></p>
<p>OK, back to the exponential mechanism:
We know that \(\varepsilon\)-differential privacy implies \(\frac12 \varepsilon^2\)-concentrated differential privacy <a href="https://arxiv.org/abs/1605.02065" title="Mark Bun, Thomas Steinke. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. TCC 2016."><strong>[BS16]</strong></a>.
This, of course, applies to the exponential mechaism. A cool fact – that we want to draw more attention to – is that we can do better!
Specifically, \(\eta\)-bounded range implies \(\frac18 \eta^2\)-concentrated differential privacy <a href="https://arxiv.org/abs/2004.07223" title="Mark Cesar, Ryan Rogers. Bounding, Concentrating, and Truncating: Unifying Privacy Loss Composition for Data Analytics. ALT 2021."><strong>[CR21]</strong></a>.
What follows is a proof of this fact following that of Mark Cesar and Ryan Rogers, but with some simplification.</p>
<blockquote>
<p><strong>Theorem 5 (Bounded Range implies Concentrated Differential Privacy).</strong>
If \(M\) is \(\eta\)-bounded range, then it is \(\frac18\eta^2\)-concentrated differentially private.</p>
</blockquote>
<p><em>Proof.</em>
Fix datasets \(x,x’ \in \mathcal{X}^n\) differing on a single individual’s data.
Let \(Z \gets \mathsf{PrivLoss}(M(x)\|M(x’))\) be the privacy loss random variable of the mechanism \(M\) on this pair of datasets.
By the definition of bounded range (Definition 1), there exists some \(t \in \mathbb{R}\) such that \(Z \in [t, t+\eta]\) with probability 1.
Now we employ <a href="https://en.wikipedia.org/wiki/Hoeffding%27s_lemma">Hoeffding’s Lemma</a> <a href="https://doi.org/10.1080%2F01621459.1963.10500830" title="Wassily Hoeffding. Probability inequalities for sums of bounded random variables. JASA 1963."><strong>[H63]</strong></a>:</p>
<blockquote>
<p><strong>Lemma 6 (Hoeffding’s Lemma).</strong>
Let \(X\) be a random variable supported on the interval \([a,b]\). Then, for all \(\lambda \in \mathbb{R}\), we have \[\mathbb{E}[\exp(\lambda X)] \le \exp \left( \mathbb{E}[X] \cdot \lambda + \frac{(b-a)^2}{8} \cdot \lambda^2 \right).\]</p>
</blockquote>
<p>Applying the lemma to the privacy loss gives \[\forall \lambda \in \mathbb{R} ~~~~~ \mathbb{E}[\exp(\lambda Z)] \le \exp \left( \mathbb{E}[Z] \cdot \lambda + \frac{\eta^2}{8} \cdot \lambda^2 \right).\]
The only remaining thing we need is to show is that \(\mathbb{E}[Z] \le \frac18 \eta^2\).<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup></p>
<p>If we set \(\lambda = -1 \), then we get \( \mathbb{E}[\exp( - Z)] \le \exp \left( -\mathbb{E}[Z] + \frac{\eta^2}{8} \right)\), which rearranges to \(\mathbb{E}[Z] \le \frac18 \eta^2 - \log \mathbb{E}[\exp( - Z)]\).
Now we have \[ \mathbb{E}[\exp( - Z)] \!=\! \sum_y \mathbb{P}[M(x)\!=\!y] \exp(-f(y)) \!=\! \sum_y \mathbb{P}[M(x)\!=\!y] \!\cdot\! \frac{\mathbb{P}[M(x’)\!=\!y]}{\mathbb{P}[M(x)\!=\!y]} \!=\! 1.\]
∎</p>
<p>This brings us to the TL;DR of this post:</p>
<blockquote>
<p><strong>Corollary 7.</strong> The exponential mechanism (given by Equation 1) is \(\frac18 \varepsilon^2\)-concentrated differentially private.</p>
</blockquote>
<p>This is great news. The standard analysis only gives \(\frac12 \varepsilon^2\)-concentrated differential privacy. Constants matter when applying differential privacy, and we save a factor of 4 in the concentrated differential privacy analysis of the exponential mechanism for free with this improved analysis.</p>
<p>Combining Lemma 2 with Theorem 5 also gives a simpler proof of the conversion from pure differential privacy to concentrated differential privacy <a href="https://arxiv.org/abs/1605.02065" title="Mark Bun, Thomas Steinke. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. TCC 2016."><strong>[BS16]</strong></a>:</p>
<blockquote>
<p><strong>Corollary 8.</strong> \(\varepsilon\)-differential privacy implies \(\frac12 \varepsilon^2\)-concentrated differential privacy.</p>
</blockquote>
<h2 id="beyond-the-exponential-mechanism">Beyond the Exponential Mechanism</h2>
<p>The exponential mechanism is not the only algorithm for private selection. A closely-related algorithm is <em>report noisy max/min</em>:<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup> Draw independent noise \(\xi_y\) from some distribution for each \(y \in \mathcal{Y}\) then output \[M(x) = \underset{y \in \mathcal{Y}}{\mathrm{argmin}} ~ \ell(y,x) - \xi_y.\]</p>
<p>If the noise distribution is an appropriate <a href="https://en.wikipedia.org/wiki/Gumbel_distribution">Gumbel distribution</a>, then report noisy max is exactly the exponential mechanism. (This equivalence is known as the “Gumbel max trick.”)</p>
<p>We can also use the Laplace distribution or the exponential distribution. Report noisy max with the exponential distribution is equivalent to the <em>permute and flip</em> algorithm <a href="https://arxiv.org/abs/2010.12603" title="Ryan McKenna, Daniel Sheldon. Permute-and-Flip: A new mechanism for differentially private selection
. NeurIPS 2020."><strong>[MS20]</strong></a> <a href="https://arxiv.org/abs/2105.07260" title="Zeyu Ding, Daniel Kifer, Sayed M. Saghaian N. E., Thomas Steinke, Yuxin Wang, Yingtai Xiao, Danfeng Zhang. The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential Noise. 2021."><strong>[DKSSWXZ21]</strong></a>. However, these algorithms don’t enjoy the same improved bounded range and concentrated differential privacy guarantees as the exponential mechanism.</p>
<p>There are also other variants of the selection problem. For example, in some cases we can assume that only a few options have low loss and the rest of the options have high loss – i.e., there is a gap between the minimum loss and the second-lowest loss (or, more generally, the \(k\)-th lowest loss). In this case there are algorithms that attain better accuracy than the exponential mechanism under relaxed privacy definitions <a href="https://arxiv.org/abs/1409.2177" title="Kamalika Chaudhuri, Daniel Hsu, Shuang Song. The Large Margin Mechanism for Differentially Private Maximization. NIPS 2014."><strong>[CHS14]</strong></a> <a href="https://dl.acm.org/doi/10.1145/3188745.3188946" title=" Mark Bun, Cynthia Dwork, Guy N. Rothblum, Thomas Steinke. Composable and versatile privacy via truncated CDP. STOC 2018."><strong>[BDRS18]</strong></a> <a href="https://arxiv.org/abs/1905.13229" title="Mark Bun, Gautam Kamath, Thomas Steinke, Zhiwei Steven Wu. Private Hypothesis Selection. NeurIPS 2019."><strong>[BKSW19]</strong></a>.</p>
<p>There are a lot of interesting aspects of private selection, including questions for further research! We hope to have further posts about some of these topics.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>For simplicity, we restrict our discussion here to finite sets of outputs, although the definitions, algorithms, and results can be extended to infinite sets. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>To be more precise, \((\varepsilon,\delta)\)-differential privacy is equivalent to demanding that \(\mathbb{E}[\max\{0,1-\exp(\varepsilon-Z)\}]\le\delta\) <a href="https://arxiv.org/abs/2004.00010" title="Clément L. Canonne, Gautam Kamath, Thomas Steinke. The Discrete Gaussian for Differential Privacy. NeurIPS 2020."><strong>[CKS20]</strong></a>. (To be completely precise, we must appropriately deal with the \(Z=\infty\) case, which we ignore in this discussion for simplicity.) <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>This proof actually gives <a href="https://dongjs.github.io/2020/02/10/ExpMech.html">a slightly stronger result</a>: We can replace the sensitivity \(\Delta\) (defined in Equation 2) by half the range \[\hat\Delta = \frac12 \sup_{x,x’ \in \mathcal{X}^n : d(x,x’) \le 1} \left( \max_{\overline{y}\in\mathcal{Y}} \ell(\overline{y},x) - \ell(\overline{y},x’) - \min_{\underline{y}\in\mathcal{Y}} \ell(\underline{y},x) - \ell(\underline{y},x’) \right).\] We always have \(\hat\Delta \le \Delta\) but it is possible that \(\hat\Delta < \Delta\) and the privacy analysis of the exponential mechanism still works if we replace \(\Delta\) by \(\hat\Delta\). <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Equivalently, a randomized algorithm \(M : \mathcal{X}^n \to \mathcal{Y}\) satisfies \(\rho\)-concentrated differential privacy if, for all pairs of inputs \(x, x’ \in \mathcal{X}^n\) differing only on the data of a single individual, \[\forall \lambda > 0 ~~~~~ \mathrm{D}_{\lambda+1}(M(x)\|M(x’)) \le \lambda(\lambda+1)\rho,\] where \(\mathrm{D}_{\lambda+1}(M(x)\|M(x’)))\) is the order \(\lambda+1\) Rényi divergence of \(M(x)\) from \(M(x’)\). <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>To be precise, if \(M(x) = q(x) + \mathcal{N}(0,\sigma^2I)\), then \(M : \mathcal{X}^n \to \mathbb{R}^d\) satisfies \(\frac{\Delta_2^2}{2\sigma^2}\)-concentrated differential privacy, where \(\Delta_2 = \sup_{x,x’\in\mathcal{X}^n : d(x,x’)\le1} \|q(x)-q(x’)\|_2\) is the 2-norm sensitivity of \(q:\mathcal{X}^n \to \mathbb{R}^d\). Furthermore, the privacy loss of the Gaussian mechanism is itself a Gaussian and it makes the inequality defining concentrated differential privacy (Equation 3) an equality for all \(\lambda\) <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>Note that the expectation of the privacy loss is simply the <a href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">KL divergence</a>: \(\mathbb{E}[Z] = \mathrm{D}_1( M(x) \| M(x’) )\). <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>We have presented selection here in terms of minimization, but most of the literature is in terms of maximization. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Ryan RogersThomas SteinkeMon, 12 Jul 2021 10:00:00 -0700
https://differentialprivacy.org/exponential-mechanism-bounded-range/
https://differentialprivacy.org/exponential-mechanism-bounded-range/Open Problem - Optimal Query Release for Pure Differential Privacy<p>Releasing large sets of statistical queries is a centerpiece of the theory of differential privacy. Here, we are given a <em>dataset</em> \(x = (x_1,\dots,x_n) \in [T]^n\), and a set of <em>statistical queries</em> \(f_1,\dots,f_k\), where each query is defined by some bounded function \(f_j : [T] \to [-1,1]\), and (abusing notation) is defined as
\[
f_j(x) = \frac{1}{n} \sum_{i=1}^{n} f_j(x_i).
\]
We use \(f(x) = (f_1(x),\dots,f_k(x))\) to denote the vector consisting of the true answers to all these queries.
Our goal is to design an \((\varepsilon, \delta)\)-differentially private algorithm \(M\) that takes a dataset \(x\in [T]^n\) and outputs a random vector \(M(x)\in \mathbb{R}^k\) such that \(\| M(x) - f(x) \|\) is small in expectation for some norm \(\|\cdot\|\). Usually algorithms for this problem also give high probability bounds on the error, but we focus on expected error for simplicity.</p>
<p>This problem has been studied for both <em>pure differential privacy</em> (\(\delta = 0\)) and <em>appproximate differential privacy</em> (\(\delta > 0\)), and for both \(\ell_\infty\)-error
\[
\mathbb{E}( \| M(x) - f(x)\|_{\infty} ) \leq \alpha,
\]
and \(\ell_2\)-error
\[
\mathbb{E}( \| M(x) - f(x)\|_{2} ) \leq \alpha k^{1/2},
\]
giving four variants of the problem. By now we know tight worst-case upper and lower bounds for two of these variants, and nearly tight bounds (up to logarithmic factors) for a third. The tightest known upper bounds are given in the following table.</p>
<table>
<tbody>
<tr>
<td> </td>
<td>Pure DP</td>
<td>Approx DP</td>
</tr>
<tr>
<td>\( \ell_2 \)<br />error</td>
<td>\( \alpha \lesssim \left(\frac{\log^2 k ~\cdot~ \log^{3/2}T}{\varepsilon n} \right)^{1/2} \) <br /> [<a href="https://arxiv.org/abs/1212.0297">NTZ13</a>]</td>
<td>\( \alpha \lesssim \left(\frac{\log^{1/2} T}{\varepsilon n} \right)^{1/2} \) <br /> [<a href="https://guyrothblum.files.wordpress.com/2014/11/drv10.pdf">DRV10</a>]</td>
</tr>
<tr>
<td>\( \ell_\infty \)<br />error</td>
<td>\( \alpha \lesssim \left(\frac{\log k ~\cdot~ \log T}{\varepsilon n} \right)^{1/3} \) <br /> [<a href="https://arxiv.org/abs/1109.2229">BLR13</a>]</td>
<td>\( \alpha \lesssim \left(\frac{\log k ~\cdot~ \log^{1/2} T}{\varepsilon n} \right)^{1/2} \) <br /> [<a href="https://guyrothblum.files.wordpress.com/2014/11/hr10.pdf">HR10</a>, <a href="https://arxiv.org/abs/1107.3731">GRU12</a>]</td>
</tr>
</tbody>
</table>
<p>The bounds for approximate DP are known to be tight [<a href="https://arxiv.org/abs/1311.3158">BUV14</a>]. Our two open problems both involve improving the best known upper bounds for pure differential privacy.</p>
<blockquote>
<p><b>Open Problem 1:</b> What is the best possible \(\ell_\infty\)-error for answering a worst-case set of \(k\) statistical queries over a domain of size \(T\) subject to \((\varepsilon,0)\)-differential privacy?</p>
</blockquote>
<p>We conjecture that the known upper bound in the table can be improved to
\[
\alpha = \left(\frac{\log k \cdot \log T}{\varepsilon n} \right)^{1/2},
\]
which is known to be the best possible [<a href="https://dataspace.princeton.edu/handle/88435/dsp01vq27zn422">Har11</a>, Theorem 4.5.1].</p>
<blockquote>
<p><b>Open Problem 2:</b> What is the best possible \(\ell_2\)-error for answering a worst-case set of \(k\) statistical queries over a domain of size \(T\) subject to \((\varepsilon,0)\)-differential privacy?</p>
</blockquote>
<p>We conjecture that the upper bound can be improved to
\[
\alpha = \left(\frac{\log T}{\varepsilon n} \right)^{1/2}.
\]
The construction used in [<a href="https://dataspace.princeton.edu/handle/88435/dsp01vq27zn422">Har11</a>, Theorem 4.5.1] can be analyzed to show this bound would be tight. Note, in particular, that this conjecture implies that the tight upper bound has no dependence on the number of queries, similarly to the case of \(\ell_2\) error and approximate DP.</p>
Sasho NikolovJonathan UllmanWed, 07 Jul 2021 13:45:00 -0400
https://differentialprivacy.org/open-problem-optimal-query-release/
https://differentialprivacy.org/open-problem-optimal-query-release/Conference Digest - ICML 2021<p><a href="https://icml.cc/Conferences/2021">ICML 2021</a>, one of the biggest conferences in machine learning, naturally has a ton of interesting sounding papers on the topic of differential privacy.
We went through this year’s <a href="https://icml.cc/Conferences/2021/AcceptedPapersInitial">accepted papers</a> and aggregated all the relevant papers we could find.
In addition, this year features three workshops on the topic of privacy, as well as a tutorial.
As always, please inform us if we overlooked any papers on differential privacy.</p>
<h2 id="workshops">Workshops</h2>
<ul>
<li>
<p><a href="http://federated-learning.org/fl-icml-2021/">Federated Learning for User Privacy and Data Confidentiality</a></p>
</li>
<li>
<p><a href="https://sites.google.com/view/ml4data">Machine Learning for Data: Automated Creation, Privacy, Bias</a></p>
</li>
<li>
<p><a href="https://tpdp.journalprivacyconfidentiality.org/2021/">Theory and Practice of Differential Privacy</a></p>
</li>
</ul>
<h2 id="tutorial">Tutorial</h2>
<ul>
<li><a href="https://icml.cc/Conferences/2021/Schedule?showEvent=10839">Privacy in Learning: Basics and the Interplay</a><br />
<a href="https://www.microsoft.com/en-us/research/people/huzhang/">Huishuai Zhang</a>, <a href="https://www.microsoft.com/en-us/research/people/weic/">Wei Chen</a></li>
</ul>
<h2 id="papers">Papers</h2>
<ul>
<li>
<p><a href="https://arxiv.org/abs/2009.02668">A Framework for Private Matrix Analysis in Sliding Window Model</a><br />
<a href="https://sites.google.com/view/jalajupadhyay/home">Jalaj Upadhyay</a>, <a href="https://www.fujitsu.com/us/about/businesspolicy/tech/rd/research-staff/sarvagya.html">Sarvagya Upadhyay</a></p>
</li>
<li>
<p>Accuracy, Interpretability, and Differential Privacy via Explainable Boosting<br />
<a href="https://scholar.google.com/citations?user=HmxjgMAAAAAJ">Harsha Nori</a>, <a href="https://www.microsoft.com/en-us/research/people/rcaruana/">Rich Caruana</a>, <a href="https://sites.google.com/view/zhiqi-bu">Zhiqi Bu</a>, <a href="https://heyyjudes.github.io/">Judy Hanwen Shen</a>, <a href="https://www.microsoft.com/en-us/research/people/jakul/">Janardhan Kulkarni</a></p>
</li>
<li>
<p>Differentially Private Aggregation in the Shuffle Model: Almost Central Accuracy in Almost a Single Message<br />
<a href="https://sites.google.com/view/badihghazi/home">Badih Ghazi</a>, <a href="https://sites.google.com/site/ravik53/">Ravi Kumar</a>, <a href="https://pasin30055.github.io/">Pasin Manurangsi</a>, <a href="https://rasmuspagh.net/">Rasmus Pagh</a>, <a href="https://www.linkedin.com/in/amersinha/">Amer Sinha</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2011.00467">Differentially Private Bayesian Inference for Generalized Linear Models</a><br />
<a href="https://warwick.ac.uk/fac/sci/dcs/people/u1554597">Tejas Kulkarni</a>, <a href="https://users.aalto.fi/~jalkoj1/">Joonas Jälkö</a>, <a href="https://scholar.google.com/citations?user=Y_EvCPAAAAAJ">Antti Koskela</a>, <a href="https://people.aalto.fi/samuel.kaski">Samuel Kaski</a>, <a href="https://www.cs.helsinki.fi/u/ahonkela/">Antti Honkela</a></p>
</li>
<li>
<p>Differentially-Private Clustering of Easy Instances<br />
<a href="http://www.cohenwang.com/edith/">Edith Cohen</a>, <a href="http://www.cs.tau.ac.il/~haimk/">Haim Kaplan</a>, <a href="https://www.tau.ac.il/~mansour/">Yishay Mansour</a>, <a href="https://www.uri.co.il/">Uri Stemmer</a>, <a href="https://www.linkedin.com/in/eliad-tsfadia-21482b96/">Eliad Tsfadia</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2102.08885">Differentially Private Correlation Clustering</a><br />
<a href="https://cs-people.bu.edu/mbun/">Mark Bun</a>, <a href="https://elias.ba30.eu/">Marek Elias</a>, <a href="https://www.microsoft.com/en-us/research/people/jakul/">Janardhan Kulkarni</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2105.13287">Differentially Private Densest Subgraph Detection</a><br />
<a href="https://biocomplexity.virginia.edu/person/dung-nguyen">Dung Nguyen</a>, <a href="https://engineering.virginia.edu/faculty/anil-vullikanti">Anil Vullikanti</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2102.08244">Differentially Private Quantiles</a><br />
<a href="http://jgillenw.com/">Jennifer Gillenwater</a>, <a href="https://www.majos.net/">Matthew Joseph</a>, <a href="https://www.alexkulesza.com/">Alex Kulesza</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2103.06641">Differentially Private Query Release Through Adaptive Projection</a><br />
<a href="https://sergulaydore.github.io/">Sergul Aydore</a>, <a href="https://wibrown.github.io/">William Brown</a>, <a href="https://www.cis.upenn.edu/~mkearns/">Michael Kearns</a>, <a href="http://www-cs-students.stanford.edu/~kngk/">Krishnaram Kenthapadi</a>, <a href="https://www.lucamel.is/">Luca Melis</a>, <a href="https://www.cis.upenn.edu/~aaroth/">Aaron Roth</a>, <a href="https://ankitsiva.xyz/">Ankit Siva</a></p>
</li>
<li>
<p>Differentially Private Sliced Wasserstein Distance<br />
<a href="http://asi.insa-rouen.fr/enseignants/~arakoto/">Alain Rakotomamonjy</a>, <a href="https://pageperso.lif.univ-mrs.fr/~liva.ralaivola/doku.php">Liva Ralaivola</a></p>
</li>
<li>
<p>Large Scale Private Learning via Low-rank Reparametrization<br />
<a href="https://scholar.google.com/citations?user=FcRGdiwAAAAJ">Da Yu</a>, <a href="https://www.microsoft.com/en-us/research/people/huzhang/">Huishuai Zhang</a>, <a href="https://www.microsoft.com/en-us/research/people/weic/">Wei Chen</a>, Jian Yin, <a href="https://www.microsoft.com/en-us/research/people/tyliu/">Tie-Yan Liu</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2102.08598">Leveraging Public Data for Practical Private Query Release</a><br />
<a href="https://www.linkedin.com/in/terrance-liu-26796974/">Terrance Liu</a>, <a href="https://sites.google.com/umn.edu/giuseppe-vietri/home">Giuseppe Vietri</a>, <a href="http://www.thomas-steinke.net/">Thomas Steinke</a>, <a href="https://www.ccs.neu.edu/home/jullman/">Jonathan Ullman</a>, <a href="https://zstevenwu.com/">Steven Wu</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2104.09734">Locally Private k-Means in One Round</a><br />
Alisa Chang, <a href="https://sites.google.com/view/badihghazi/home">Badih Ghazi</a>, <a href="https://sites.google.com/site/ravik53/">Ravi Kumar</a>, <a href="https://pasin30055.github.io/">Pasin Manurangsi</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2102.12099">Lossless Compression of Efficient Private Local Randomizers</a><br />
<a href="http://vtaly.net/">Vitaly Feldman</a>, <a href="http://kunaltalwar.org/">Kunal Talwar</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2105.08233">Oneshot Differentially Private Top-k Selection</a><br />
<a href="https://lsa.umich.edu/stats/people/phd-students/qiaogang.html">Gang Qiao</a>, <a href="http://www-stat.wharton.upenn.edu/~suw/">Weijie Su</a>, <a href="https://research.google/people/LiZhang/">Li Zhang</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.12321">PAPRIKA: Private Online False Discovery Rate Control</a><br />
<a href="https://wanrongz.github.io/">Wanrong Zhang</a>, <a href="http://www.gautamkamath.com/">Gautam Kamath</a>, <a href="https://sites.gatech.edu/rachel-cummings/">Rachel Cummings</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2103.00039">Practical and Private (Deep) Learning without Sampling or Shuffling</a><br />
<a href="https://kairouzp.github.io/">Peter Kairouz</a>, <a href="https://research.google/people/author35837/">Brendan McMahan</a>, <a href="https://shs037.github.io/">Shuang Song</a>, <a href="http://www.omthakkar.com/">Om Thakkar</a>, <a href="https://athakurta.squarespace.com/">Abhradeep Thakurta</a>, <a href="https://research.google/people/106689/">Zheng Xu</a></p>
</li>
<li>
<p>Private Adaptive Gradient Methods for Convex Optimization<br />
<a href="http://web.stanford.edu/~asi/">Hilal Asi</a>, <a href="https://web.stanford.edu/~jduchi/">John Duchi</a>, <a href="https://afallah.lids.mit.edu/">Alireza Fallah</a>, <a href="https://scholar.google.com/citations?user=_JXjrEp9FhYC">Omid Javidbakht</a>, <a href="http://kunaltalwar.org/">Kunal Talwar</a></p>
</li>
<li>
<p>Private Alternating Least Squares: (Nearly) Optimal Privacy/Utility Trade-off for Matrix Completion<br />
Steve Chien, <a href="https://www.prateekjain.org/">Prateek Jain</a>, <a href="http://walid.krichene.net/">Walid Krichene</a>, <a href="https://scholar.google.com/citations?user=yR-ugIoAAAAJ">Steffen Rendle</a>, <a href="https://shs037.github.io/">Shuang Song</a>, <a href="https://athakurta.squarespace.com/">Abhradeep Thakurta</a>, <a href="https://research.google/people/LiZhang/">Li Zhang</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2103.01516">Private Stochastic Convex Optimization: Optimal Rates in L1 Geometry</a><br />
<a href="http://web.stanford.edu/~asi/">Hilal Asi</a>, <a href="http://vtaly.net/">Vitaly Feldman</a>, <a href="https://tomerkoren.github.io/">Tomer Koren</a>, <a href="http://kunaltalwar.org/">Kunal Talwar</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2102.06387">The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation</a><br />
<a href="https://kairouzp.github.io/">Peter Kairouz</a>, <a href="https://kenziyuliu.github.io/">Ziyu Liu</a>, <a href="http://www.thomas-steinke.net/">Thomas Steinke</a></p>
</li>
</ul>
Gautam KamathMon, 07 Jun 2021 12:30:00 -0400
https://differentialprivacy.org/icml2021/
https://differentialprivacy.org/icml2021/Statistical Inference is Not a Privacy Violation<p>On April 28, 2021, the US Census Bureau <a href="https://www.census.gov/programs-surveys/decennial-census/decade/2020/planning-management/process/disclosure-avoidance/2020-das-updates.html">released</a> a new demonstration of its differentially private Disclosure Avoidance System (DAS) for the 2020 US Census. The public were given a month to submit feedback before the system is finalized.
This demonstration data and the feedback has generated a lot of discussion, including media coverage on <a href="https://www.npr.org/2021/05/19/993247101/for-the-u-s-census-keeping-your-data-anonymous-and-useful-is-a-tricky-balance">National Public Radio</a>, in <a href="https://www.washingtonpost.com/local/social-issues/2020-census-differential-privacy-ipums/2021/06/01/6c94b46e-c30d-11eb-93f5-ee9558eecf4b_story.html">the Washington Post</a>, and via <a href="https://apnews.com/article/business-census-2020-technology-e701e313e841674be6396321343b7e49">the Associated Press</a>. The DAS is also the subject of an <a href="https://www.courtlistener.com/docket/59728874/state-v-united-states-department-of-commerce/">ongoing lawsuit</a>.</p>
<p>The following is a response from experts on differential privacy and cryptography to the <a href="https://alarm-redist.github.io/posts/2021-05-28-census-das/Harvard-DAS-Evaluation.pdf">working paper of Kenny et al.</a> on the impact of the 2020 U.S. Census Disclosure Avoidance System (DAS) on redistricting.</p>
<p>This paper makes a <a href="https://github.com/frankmcsherry/blog/blob/master/posts/2016-06-14.md">common but serious mistake</a>, from which the authors wrongfully conclude the Census Bureau should not modernize its privacy-protection technology. Not only do the results not support this conclusion, but they instead show the power of the methodology, known as differential privacy, adopted by the Bureau, precisely the opposite of the authors’ erroneous conclusions.</p>
<p>Trust is essential; once destroyed it can be nearly impossible to rebuild, and getting privacy wrong in this Census will have an impact on all future government surveys. The Census Bureau has shown that their <a href="https://desfontain.es/privacy/index.html">2010 (DAS) does not survive modern privacy threats</a>, and in fact was roughly equivalent to publishing nearly three quarters of the responses. The Census Bureau’s decision to modernize its Disclosure Avoidance System (DAS) for the 2020 Decennial Census to be differentially private is the correct response to decades of theoretical and empirical work on the privacy risks inherent in releasing large numbers of statistics derived from a dataset.</p>
<p>The importance of the Census, and the reality that no technology competing with differential privacy exists for meeting their confidentiality obligations, makes it very important that the public and policy makers have accurate information. We imagine you will be reporting on this topic in the future. Others have <a href="https://gerrymander.princeton.edu/DAS-evaluation-Kenny-response">addressed flaws</a> in the paper regarding implications for redistricting; we want to provide you with an understanding of the privacy mistake in the study.</p>
<p>To understand the flaw in the paper’s argument, consider the role of smoking in determining cancer risk. Statistical study of medical data has taught us that smoking causes cancer. Armed with this knowledge, if we are told that 40 year old Mr. S is a smoker, we can conclude that he has an elevated cancer risk. The statistical inference of elevated cancer risk—made before Mr. S was born—did not violate Mr. S’s privacy. To conclude otherwise is to define science to be a privacy attack. This is the mistake made in the paper.</p>
<p>This is basically what Kenny et al. found.</p>
<p>The authors looked at three different predictors: one built directly from (swapped) 2010 Census data and the other two built using differential privacy applied to (swapped) 2010 Census data, and evaluated all three “on approximately 5.8 million registered voters included in the North Carolina February 2021 voter file.” What did they find?</p>
<blockquote>
<p>“Our analysis shows that across three main racial and ethnic groups, the predictions based on the [differential privacy based] DAS data appear to be as accurate as those based on the 2010 Census data.”</p>
</blockquote>
<p>This makes perfect sense. Bayesian Improved Surname Geocoding, or BISG, is a statistical method of building a predictor inferring ethnicity (or race) from name and geography. Here, name and geography play the role of the information as to whether or not one smokes, and the prediction of ethnicity corresponds to the cancer risk prediction. The predictor is constructed from census data on the ethnic makeup of individual census blocks and statistical information about the popularity of individual surnames within different ethnic groups. With such a predictor, moving across the country can change the outcome, as can changing one’s name. But a BISG prediction is not about the individual, it is about the statistical—population-level—relationship between name, geography, and ethnicity.</p>
<p>The differentially private DAS enabled learning to make statistical inferences about ethnicity from name and geography, without compromising the privacy of any Census respondent, exactly as it was intended to do. In other words, the paper establishes fitness-for-use of the DAS data for the BISG statistical method! Because differential privacy permits learning statistical patterns without compromising the privacy of individual members of the dataset, it should not interfere with learning the predictor, which is exactly what the authors found. Returning to our “smoking causes cancer” example, the researchers found that it was just as easy to detect this statistical pattern with a modern disclosure avoidance system in place as it was with the older, less protective system.</p>
<p>The authors’ conclusions –“ the DAS data may not provide universal privacy protection” – are simply not supported by their findings.</p>
<p>They have confused learning that smoking causes cancer—and applying this predictor to an individual smoker—with learning medical details of individual patients in the dataset. Change the input to the predictor—replace “smoker” with “non-smoker” or move across the country, for example—and the prediction changes.</p>
<p>The BISG prediction is not about the individual, it does not accompany her as she relocates from one neighborhood to another, it is a statistical relationship between name, geography, and ethnicity. It is not a privacy compromise, it is science.</p>
<p>Signed:</p>
<ul>
<li>Mark Bun, Assistant Professor of Computer Science, Boston University</li>
<li>Damien Desfontaines, Privacy Engineer, Google</li>
<li>Cynthia Dwork, Professor of Computer Science, Harvard University</li>
<li>Moni Naor, Professor of Computer Science, The Weizmann Institute of Science</li>
<li>Kobbi Nissim, Professor of Computer Science, Georgetown University</li>
<li>Aaron Roth, Professor of Computer and Information Science, University of Pennsylvania</li>
<li>Adam Smith, Professor of Computer Science, Boston University</li>
<li>Thomas Steinke, Research Scientist, Google</li>
<li>Jonathan Ullman, Assistant Professor of Computer Science, Northeastern University</li>
<li>Salil Vadhan, Professor of Computer Science and Applied Mathematics, Harvard University</li>
</ul>
<p>Please contact Cynthia Dwork for contact information for authors happy to speak about this on the record.</p>
Jonathan UllmanThu, 03 Jun 2021 18:30:00 -0500
https://differentialprivacy.org/inference-is-not-a-privacy-violation/
https://differentialprivacy.org/inference-is-not-a-privacy-violation/Call for Papers - Workshop on the Theory and Practice of Differential Privacy (TPDP 2021)<p>Work on differential privacy spans a number of different research communities, including theoretical computer science, machine learning, statistics, security, law, databases, cryptography, programming languages, social sciences, and more.
Each of these communities may choose to publish their work in their own community’s venues, which could result in small groups of differential privacy researchers becoming isolated.
To alleviate these issues, we have the Workshop on the <a href="https://tpdp.journalprivacyconfidentiality.org/">Theory and Practice of Differential Privacy</a> (TPDP), which is intended to bring these subcommunities together under one roof (well, a virtual one at least for 2020 and 2021).</p>
<p>We have just posted the <a href="https://tpdp.journalprivacyconfidentiality.org/2021/TPDP2021CfP.pdf">Call for Papers</a> for <a href="https://tpdp.journalprivacyconfidentiality.org/2021/">TPDP 2021</a>, which will be a workshop affiliated with <a href="https://icml.cc/Conferences/2021/">ICML 2021</a>.
The submission deadline is Friday, May 28, 2021, Anywhere on Earth (conveniently, two days after the deadline for NeurIPS 2021).
Submissions are extended abstracts of up to four pages in length, and will undergo a lightweight review process, based mostly on relevance and interest to the differential privacy community.
The workshop is non-archival, so feel free to submit recent work at any stage of publication.
Submissions will be on <a href="https://openreview.net/group?id=ICML.cc/2021/Workshop/TPDP">OpenReview</a>, but since submitted work may be preliminary, the process will be “closed” similar to traditional review processes.
One goal of the workshop is to be inclusive and welcoming to newcomers to the differential privacy community, so please consider participating even if you are new to the field.</p>
<p>Most papers will be presented as posters at a (virtual) poster session, while a few papers will be selected for spotlight talks.
There will also be plenary talks by <a href="https://www.cs.huji.ac.il/~katrina/">Katrina Ligett</a> (Hebrew University of Jerusalem) and <a href="https://www2.math.upenn.edu/~ryrogers/">Ryan Rogers</a> (LinkedIn).
The program co-chairs are <a href="https://sites.gatech.edu/rachel-cummings/">Rachel Cummings</a> and <a href="http://www.gautamkamath.com/">myself</a>.
Please submit your best work on differential privacy, and hope to see you there!</p>
<p align="center">
<img src="/images/Ligett.png" />
<img src="/images/Rogers.png" /> <br />
<i>Invited speakers Katrina Ligett and Ryan Rogers</i>
</p>
Gautam KamathWed, 05 May 2021 10:00:00 -0400
https://differentialprivacy.org/tpdp21-cfp/
https://differentialprivacy.org/tpdp21-cfp/ALT Highlights - An Equivalence between Private Learning and Online Learning (ALT '21 Tutorial)<p>Welcome to ALT Highlights, a series of blog posts spotlighting various happenings at the recent conference <a href="http://algorithmiclearningtheory.org/alt2021/">ALT 2021</a>, including plenary talks, tutorials, trends in learning theory, and more!
To reach a broad audience, the series will be disseminated as guest posts on different blogs in machine learning and theoretical computer science.
Given the topic of this post, we felt <a href="https://differentialprivacy.org/">DifferentialPrivacy.org</a> was a great fit.
This initiative is organized by the <a href="https://www.let-all.com/">Learning Theory Alliance</a>, and overseen by <a href="http://www.gautamkamath.com/">Gautam Kamath</a>.
All posts in ALT Highlights are indexed on the official <a href="https://www.let-all.com/blog/2021/04/20/alt-highlights-2021/">Learning Theory Alliance blog</a>.</p>
<p>The second post is coverage of <a href="http://www.cs.technion.ac.il/~shaymrn">Shay Moran</a>’s <a href="https://www.youtube.com/watch?v=wk910Aj559A">tutorial</a>, by <a href="https://people.eecs.berkeley.edu/~kush/">Kush Bhatia</a> and <a href="https://web.stanford.edu/~mglasgow/">Margalit Glasgow</a>.</p>
<hr />
<p>The tutorial at ALT was given by <a href="http://www.cs.technion.ac.il/~shaymrn/">Shay Moran</a>, an assistant professor at the Technion in Haifa.
His <a href="https://www.youtube.com/watch?v=wk910Aj559A">talk</a> focused on recent results showing a deep connection between two important areas in learning theory: online learning and differentially private learning.
While online learning is a well-established area that has been studied since the invention of the Perceptron algorithm in 1954, differential privacy (DP) - introduced in the seminal work of Dwork, McSherry, Nissim, and Smith in 2006 <a href="https://journalprivacyconfidentiality.org/index.php/jpc/article/view/405"><strong>[DMNS06]</strong></a> - has received increasing attention in recent years from both theoretical and applied research communities along with industry and the government.
This recent interest in differential privacy comes from a need to protect the privacy rights of the individuals, while still allowing one to derive useful conclusions from the dataset.
Shay, in a sequence of papers with co-authors Noga Alon, Mark Bun, Roi Livni, and Maryanthe Malliaris, revealed a surprising qualitative connection between these two models of learning: a concept class can be learned in an online fashion if and only if this concept class can be learned in an offline fashion by a differentially private algorithm.</p>
<p>The main objectives of this tutorial were to give an in-depth foray into this recent work, and present an opportunity to young researchers to identify interesting research problems at the intersection of these two fields. This line of work (<a href="https://arxiv.org/abs/1806.00949"><strong>[ALMM19]</strong></a>, <a href="https://arxiv.org/abs/2003.00563"><strong>[BLM20]</strong></a>) - which has been primarily featured in general CS Theory conferences, including recently winning a best paper award at FOCS - introduced several new techniques which could be of use to the machine learning theory community. The first part of this article focuses on the technical challenges of characterizing DP learnability and the solutions used in Shay’s work, which originate from combinatorics and model theory. In the rest of this article, we highlight some of the exciting open directions Shay sees more broadly in learning theory.</p>
<h2 id="background-pac-learning-online-learning-and-dp-pac-learning">Background: PAC Learning, Online Learning, and DP PAC Learning</h2>
<p>We begin by reviewing the classical setting of PAC learning.
The goal of PAC learning is to learn some function from a <em>concept class</em> \(\mathcal{H} \), a set of functions from some domain \( \mathcal{X} \) to \( \{0, 1\} \).
<img align="right" src="/images/PACLearning.png" style="width:300px;height:300px;" />
In the realizable setting of PAC learning, which we will focus on here, the learner is presented with \( n \) labeled training samples \( \{(x_i, y_i)\}_{i=1}^{n} \) where \( x_i \in \mathcal{X} \) and \( y_i = h(x_i) \) for some function \( h \in \mathcal{H} \). While the learner knows the concept class \( \mathcal{H} \), the function \( h \) is unknown, and after seeing the \( n \) samples, the learner algorithm \( A \) must output some function \( \hat{h}: \mathcal{X} \rightarrow \{0, 1\} \). A concept class is <em>PAC-learnable</em> if there exists an algorithm \( A \) such that for any distribution over samples, as the number of labeled training samples goes to infinity, the probability that \( \hat{h} \) incorrectly labels a new random sample goes to 0. We will say that \( A \) is a <em>proper</em> learner if \( A \) outputs a function in \(\mathcal{H}\), while \( A \) is <em>improper</em> if it may output a function outside of \( \mathcal{H} \). More quantitative measures of learnability concern the exact number of samples \( n \) needed for the learner to correctly predict future labels with nontrivial probability.</p>
<p>DP PAC learning imposes an additional restriction on PAC learning: the learner must output a function which does not reveal too much about any one sample in the input. Formally, we say an algorithm \( A \) is \( (\varepsilon, \delta) \)-DP if for any two neighboring inputs \( X = (X_1, \cdots, X_i, \cdots, X_n) \) and \( X’ = (X_1, \cdots, X_i’,\cdots, X_n) \) which differ at exactly one sample, for any set \( S \), \( \Pr[A(X) \in S] \leq e^\varepsilon Pr[A(X’) \in S] + \delta \) . A concept class is <em>DP PAC learnable</em> if it can be PAC-learned by a \( (0.1, o(1/n)) \)-DP algorithm \( A \). That is, changing any one training sample \( (x_i, y_i) \) should not affect the distribution over concepts output by \( A \) by too much.</p>
<p>Online learning considers a setting where the samples arrive one-by-one, and the learner must make predictions as this process unfolds. <img align="right" src="/images/OnlineLearning.png" style="width:300px;height:300px;" />Formally, in realizable online learning, at each round \( i = 1,\dots,T \), the learner is presented with a sample \( x_i\). The learner must predict the label, and then the true label \( y_i \) is revealed to the learner. The sequence of examples \( x_i \) and the labels \( y_i \) may be chosen adversarially, but they must be consistent with some function \( h \in \mathcal{H} \). The goal of the learner is to minimize the total number of mistakes made by round \( T \), also called the <em>mistake bound</em>. If there is a learner such that at \( T \rightarrow \infty \), the number of mistakes is \( o(T) \), we say that the class of functions \( \mathcal{H} \) is <em>online learnable</em>. Because of the adversarial nature of the examples, online learning is well known to be much harder than PAC learning, and is possible precisely when the <em>Littlestone Dimension</em> of the concept class is finite. Figure 1 below illustrates the definition of the Littlestone Dimension. One important PAC-learnable concept class which is not online learnable is the infinite class of thresholds on \( \mathbb{R} \): The set of functions \(\{h_t\}_{t \in \mathbb{R}} \) where \( h_t(x) = \textbf{1}(x > t) \).</p>
<p><img src="/images/LD.png" alt="Figure 1: Littlestone Dimension" title="Figure 1: Littlestone Dimension" /></p>
<p><strong>Figure 1</strong>: The Littlestone dimension is the maximum depth of a tree shattered by \( \mathcal{H} \), where each node may contain a unique element from \(\mathcal{X}\). The tree is shattered if each leaf can be labeled by a function in \(\mathcal{H} \) that labels each element on the path to the root according to the value of the edge entering it. In this figure, we show how the set of thresholds on \( [0, 1] \) shatters this tree of depth 3.</p>
<p>It’s worth taking a moment to understand intuitively why learning thresholds might be hard for both an online learner and a DP learner. In an online setting, suppose the adversary chooses the next example to be any value of \( x \) in between all previously 0-labeled examples and all 1-labeled examples. Then no matter what label \( A \) chooses, the adversary can say \( A \) was incorrect. In a DP setting, a simple proper learner that outputs a threshold that makes no errors on the training data will reveal too much information about the samples at the boundary of 0-labeled samples and 1-labeled samples.</p>
<h2 id="a-challenging-question">A Challenging Question</h2>
<p>The journey to characterizing DP PAC learnability started soon after the introduction of DP, in <a href="https://arxiv.org/abs/0803.0924"><strong>[KLNRS08]</strong></a>, where the concept of DP PAC learning was introduced. This work primarily considered the more stringent <em>pure</em> DP-learning, where \( \delta = 0\). This work established that any finite concept class could be learned privately by applying the <em>exponential mechanism</em> (a standard technique in DP) to the empirical error of each candidate concept. Using the exponential mechanism, the learner could output each function \( h \in \mathcal{H}\) with probability proportional to \( \exp(- \varepsilon(\# \text{ of errors h on the training data})) \). This was sufficiently private because the empirical error is not too sensitive to a change of one training sample. Later works <a href="https://arxiv.org/abs/1402.2224"><strong>[BNS14]</strong></a> combinatorially characterized the complexity of pure DP PAC learning in terms of a measure called <em>representation dimension</em>, showing that in some cases where PAC learning was possible, pure DP PAC learning was not. <a href="https://arxiv.org/abs/1504.07553"><strong>[BNSV15]</strong></a> further yielded lower bounds on the limits of proper DP PAC learning.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><br />
Both pure and proper DP learning though are significantly more stringent than improper DP learning, and proving lower bounds against an improper DP learner posed a serious challenge. Even the following simple-sounding question was unsolved:</p>
<blockquote>
<p>Can an improper DP algorithm learn the infinite class of thresholds over [0, 1]? (*)</p>
</blockquote>
<p>This question stands at the center of Shay’s work, which unfolded while Shay was residing at the Institute for Advanced Study (IAS) at Princeton from 2017 to 2019.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> At the time, Shay was working on understanding the expressivity of limited mutual-information algorithms, that is, algorithms which expose little information about the total input. “If we were to have directly worked on this problem, I believe we wouldn’t have solved it,” Shay says. Instead, they came from the angle of mutual-information, a concept qualitatively similar to DP, but armed with a rich toolkit from 70 years of information theory. One of Shay’s prior works with Raef Bassily, Ido Nachum, Jonathan Shafer, and Amir Yehudayof established lower bounds on the mutual information of any proper algorithm that learns thresholds <a href="https://arxiv.org/abs/1710.05233"><strong>[BMNSY18]</strong></a>, though this didn’t yet address the challenge presented by improper learners.</p>
<p>Unlike most lower bounds in theoretical computer science, proving hardness of learning the infinite class of thresholds on the line against an improper DP algorithm would require coming up with algorithm-specific distributions over samples. That is, instead of showing that one distribution over samples would be impossible to learn for all algorithms — in the way the a uniform distribution over the set in \(\mathcal{X}\) shattered by the concept class is hard to PAC-learn for any algorithm — they would have to come up with a distribution on \(\mathcal{X}\) specific to each candidate learning algorithm. Indeed, if the distribution was known to the learner, it was possible to devise a DP algorithm using the exponential mechanism (again!) which could learn any PAC-learnable concept class. Similar to the case of finite concept classes, here we can apply the exponential mechanism to some finite set of representative functions forming a cover of the concept class.</p>
<h2 id="uncovering-a-solution">Uncovering a Solution</h2>
<p>At the IAS, Shay met his initial team to tackle this obstacle: Noga Alon, a combinatorialist, Roi Livni, a learning theorist, and Maryanthe Malliaris, a model theorist. Shay and Maryanthe would walk together from IAS to their combinatorics class at Princeton taught by Noga and discuss mathematics. While Marayathe studied the abstract mathematical field of model theory, Shay and Maryanthe soon noticed a connection between model theory and machine learning: the Littlestone dimension. “There is applied math, then pure math, and then model theory is way over there,” Shay elaborates. “If theoretical machine learning is between applied math and pure math, model theory is on the other extreme.” This surprising interdisciplinary connection led them to use a result from model theory: a concept class had finite Littlestone Dimension precisely when the concept class had finite threshold dimension (formally, the maximum number of threshold functions that could be embedded in the class). This meant that answering (*) negatively was enough to show that DP PAC learning was as hard as online learning.</p>
<p>The idea for showing a lower bound for improper DP learning of thresholds came from Ramsey Theory, a famous area in combinatorics. Ramsey theory guarantees the existence of structured subsets among large, but arbitrary, combinatorial objects. A toy example of Ramsey Theory is that in any graph on 6 or more nodes, there must be a group of 3 nodes which form either a clique or an independent set. In the case of DP learning thresholds, the learning algorithm \( A \) is the arbitrary (and unstructured) combinatorial object. Ramsey Theory guarantees that for any PAC learner \( A \), there exists some large subset \( \mathcal{X}’ \) of the domain \(\mathcal{X}=[0, 1]\) on which \( A \) is close to behaving “normally”. The next step is to show that behaving normally contradicts differentially privacy. We’ll dig more technically into this argument in the next couple paragraphs. Note that we invent a couple definitions for the sake of exposition (“proper-normal” and “improper-normal”) that don’t appear in <a href="https://arxiv.org/abs/1504.07553"><strong>[BNSV15]</strong></a> or <a href="https://arxiv.org/abs/1806.00949"><strong>[ALMM19]</strong></a>.</p>
<p>Let’s start by seeing how a simpler version of this argument from <a href="https://arxiv.org/abs/1504.07553"><strong>[BNSV15]</strong></a> works to show that proper DP algorithms cannot learn thresholds. Recall that in a proper algorithm, after seeing \( n \) labeled samples, \( A \) must output a threshold. To be a PAC learner, if exactly half of the \( n \) samples are labeled \( 1 \) (which we will call a <em>balanced</em> sample), \( A \) must output a threshold between the smallest and largest sample with constant probability. Indeed, otherwise the empirical error of \( A \) will be too large to even hope to generalize. This implies that for a balanced list of samples \( S = [(x_1, 0), \ldots ,(x_{n/2}, 0), (x_{n/2+1}, 1), \ldots ,(x_n, 1)] \) (ordered by \(x\)-value), there must exist an integer \( k \in [n] \) for which with probability \( \Omega(1/n) \), \(A \) outputs a threshold in between
\( x_k \) and \( x_{k+1} \). We’ll say that \( A \) is <em>\(k\)-proper-normal</em> on a set \( \mathcal{X}’ \) if for any balanced sample \( S \) of \( n \) points in \( \mathcal{X}’ \), \( A \) outputs a threshold in between the \(k\)th and \(k+1\)th ordered samples with probability \( \Omega(1/n) \). For example, the naive algorithm that always outputs a threshold between the 0-labeled samples and 1-labeled samples is \(n/2\)-normal on the entire domain \([0, 1]\). Ramsey theory guarantees that there is an arbitrary large subset of the domain \([0, 1]\) on which \(A \) is \(k\)-proper-normal for some \(k\). (See Figure 2).</p>
<p><img src="/images/RamseyTheorem.png" alt="Figure 2: Ramsey's Theorem" title="Figure 2: Ramsey's Theorem" />
<strong>Figure 2</strong>: Ramsey’s Theorem guarantees a large subset \( \mathcal{X}’ \) on which \( A \) is \( k \)-proper-normal for some \( k \).</p>
<p>The second part of the argument shows that being \( k \)-proper-normal on a large set is in direct conflict with differential privacy. We show this argument in Figure 3 by constructing a set of \( n \) samples \( S^* \) on which \( A \) must output a threshold in many distinct regions with substantial probability.</p>
<p><img src="/images/DP_Construction.png" alt="Figure 3: A Hard Distribution" title="Figure 2: Ramsey's Theorem" /></p>
<p><strong>Figure 3:</strong> A distribution showing the conflict between \(A \) being DP and \(k\)-proper normal. By DP, the behaviour of \(A\) on \(S^* \) must be similar to its behaviour on \(S_i\) for \(i = 1 \ldots \Omega(n).\) Namely, since \(S^* \) and \( S_i \) differ by at most two points, \( A(S^*) \) must output a threshold in \( I_i \) with probability \( p \geq (q - 2\delta)e^{-(2\varepsilon)} \) for each \( i \), where \( q \) is a lower bound on the probability that \( A(S_i) \) outputs a threshold in \( I_i \). Because \( A \) is \(k\)-proper-normal, \( q > c/n \), so \( p = \Omega(1/n) \). This yields a contradiction for \( m >1/p \) because \( A \) cannot simultaneously output a threshold in two of the intervals \( I_i \).</p>
<p>For the case of improper algorithms, Shay and his coauthors considered an alternative notion of normality, which we will term <em>improper-normal</em>. Recall that in this case the output \( A(S) \) of the learner on a sample \( S \) is <em>any function</em> from \( [0, 1] \) to \( \{0, 1\} \) and not necessarly a threshold. We’ll say that \( A \) is \(k\)-improper-normal on a set \( \mathcal{X}’ \) if for any balanced sample \( S = [(x_1, 0), \ldots, (x_{n/2}, 0), (x_{n/2 +1}, 1), \ldots, (x_n, 1)] \) of \( n \) points in \( \mathcal{X}’, \Pr[A(S)(w) = 1] - \Pr[A(S)(v) = 1] > \Omega(1/n) \) for any \( w \in (x_{k - 1}, x_k) \) and \( v \in (x_k, x_{k + 1}) \) in \( \mathcal{X}’ \setminus \{x_1, … x_n\} \). To PAC learn, A must be \(k\)-improper-normal for some \(k\) on any set of \(n + 1\) points. Applying the same Ramsey Theorem to a graph with colored hyperedges of size \(n + 1\) shows that there must exist an arbitrarily large set \( \mathcal{X}’ \) on which \( A \) is \(k\)-improper normal for some \( k \). A similar (but more nuanced) argument as before shows that a learner cannot be simultaneously private and \(k\)-improper-normal on some distribution over \( \mathcal{X}’ \).</p>
<p>For the last piece of the puzzle, showing the upper bound converse, Mark Bun, an expert in differential privacy at Princeton at the time, joined in. Beginning in the fall of 2019, Mark, Shay, and Roi worked out an upper bound that showed that any class with finite Littlestone dimension could be learned privately. Their technique introduced a new notion of stability, called <em>global stability</em>, which is a property of algorithms that frequently output the same hypothesis. Given a globally stable PAC learner \( A \), to obtain a DP PAC learner, one can run \( A \) many times and produce a histogram of the output hypotheses, add noise to this histogram for the sake of privacy, and then select the most frequent output hypothesis. The construction for the globally stable learner uses the Standard Optimal Algorithm for online learning as a black box - though the reduction is very computationally intensive and results in a sample complexity depending exponentially on the Littlestone Dimension. This reduction was improved in <a href="https://arxiv.org/abs/2012.03893"><strong>[GGKM20]</strong></a>, where the authors gave a reduction requiring only polynomially many samples in the Littlestone dimension.</p>
<h2 id="outlook">Outlook</h2>
<p>Shay mentions that these results are only a first step towards understanding differentially private PAC learning. This work establishes a deep connection between online learning and DP learning, qualitatively showing that these two problems have similar underlying complexity. Recent works <a href="https://arxiv.org/abs/1905.11311"><strong>[GHM19]</strong></a> have gone a step further and established polynomial time reductions from DP learning to online learning under certain conditions. At the same time, Bun <a href="https://arxiv.org/abs/2007.05665"><strong>[B20]</strong></a> demonstrates a computational gap between the two problems: they exhibit a concept class which is DP PAC learnable in polynomial time, but no algorithm can learn it online in polynomial time and sample complexity.</p>
<p>An interesting question here is whether a polynomial time online learning algorithm implies a polynomial time DP learning algorithm? With regards to sample complexity, tighter quantitative bounds relating the sample complexity of DP learning to the Threshold dimension and the Littlestone Dimension, along with constructive reductions from DP learning to online learning are wide open. An interesting conjecture, also highlighted in the talk, is whether DP PAC learning and PAC learning are actually equivalent up to a \(log*\) factor of the Littlestone dimension? Solving this would mean that for most natural function classes, one need not pay a very high price for private learning as compared to PAC learning.</p>
<p>From a more practical perspective, studying such qualitative equivalences can lay the groundwork allowing one to use the vast existing knowledge in the field of online learning to design better algorithms for DP learning, and vice versa. Despite its abstractness, Shay believes this result still has significance for engineers: “If they want to do something differentially privately, if they already have a good online learning algorithm for this, maybe they can modify it,” Shay says. “It gives some kind of inspiration.”</p>
<p>From a broader perspective, Shay believes that discovering clean, beautiful mathematical models that are more realistic than the PAC learning model and understanding the price one pays for privacy in those models are important directions for future research. One model, highlighted in the tutorial by Shay is that of <em>Universal Learning</em>, introduced in a recent work <a href="https://arxiv.org/abs/2011.04483"><strong>[BHMVY21]</strong></a> with Olivier Bousquet, Steve Hanneke, Ramon van Handel and Amir Yehudayoff. This model considers a learning task with a fixed distribution of samples and studies the hardness of the problem as the number of samples \( n \to \infty \). This setup better captures the practical aspects of modern machine learning and overcomes the limitations of the PAC model, which studies worst-case distributions for each sample size.</p>
<p>And how should one go about identifying such better mathematical models? “Pick the simplest problem that you don’t know how to solve and see where that leads you”, says Shay. One should begin by trying to understand the deficiencies of existing learning models, by identifying simple examples which go beyond these existing models. For example, Livni and Moran <a href="https://arxiv.org/abs/2006.13508"><strong>[LM20]</strong></a> exhibit the limitations of PAC-Bayes framework through a simple 1D linear classification problem. Fixing such limitations can often lead one to discover better learning models.</p>
<hr />
<p><em>Thanks to Gautam Kamath, Shay Moran, and Keziah Naggita for helpful conversations and comments.</em></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>For a more complete background on the progress in DP PAC learning, we refer the reader to the excellent survey blog post <a href="https://differentialprivacy.org/private-pac/">here</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>At the time, Shay was additionally affiliated with Princeton University and Google Brain. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Kush BhatiaMargalit GlasgowMon, 26 Apr 2021 12:00:00 -0400
https://differentialprivacy.org/alt-highlights/
https://differentialprivacy.org/alt-highlights/What is δ, and what δifference does it make?<p>There are many variants or flavours of differential privacy (DP) some weaker than others: often, a given variant comes with own guarantees and “conversion theorems” to the others. As an example, “pure” DP has a single parameter \(\varepsilon\), and corresponds to a very stringent notion of DP:</p>
<blockquote>
<p>An algorithm \(M\) is \(\varepsilon\)-DP if, for all neighbouring inputs \(D,D'\) and all measurable \(S\), \( \Pr[ M(D) \in S ] \leq e^\varepsilon\Pr[ M(D’) \in S ] \).</p>
</blockquote>
<p>By relaxing this a little, one obtains the standard definition of approximate DP, a.k.a. \((\varepsilon,\delta)\)-DP:</p>
<blockquote>
<p>An algorithm \(M\) is \((\varepsilon,\delta)\)-DP if, for all neighbouring inputs \(D,D'\) and all measurable \(S\), \( \Pr[ M(D) \in S ] \leq e^\varepsilon\Pr[ M(D’) \in S ]+\delta \).</p>
</blockquote>
<p>This definition is very useful, as in many settings achieving the stronger \(\varepsilon\)-DP guarantee (i.e., \(\delta=0\)) is impossible, or comes at a very high utility cost. But how to interpret it? The above definition, on its face, doesn’t preclude what one may call “<em>catastrophic failures of privacy</em> 💥:” most of the time, things are great, but with some small probability \(\delta\) all hell breaks loose. For instance, the following algorithm is \((\varepsilon,\delta)\)-DP:</p>
<ul>
<li>Get a sensitive database \(D\) of \(n\) records</li>
<li>Select uniformly at random a fraction \(\delta\) of the database (\(\delta n\) records)</li>
<li>Output that subset of records in the clear 💥</li>
</ul>
<p>(actually, this is even \((0,\delta)\)-DP!). This sounds preposterous, and obviously something that one would want to avoid in practice (lest one wants to face very angry customers or constituents). This is one of the rules of thumb for picking \(\delta\) small enough (or even “cryptographically small”), typically \(\delta \ll 1/n\), so that the records are safe (hard to disclose \(\delta n \ll 1\) records).</p>
<p>So: good privacy most of the time, but with probably \(\delta\) then all bets are off.</p>
<p>However, those catastrophic failure of privacy, while technically allowed by the definition of \((\varepsilon,\delta)\)-DP, <strong>are not something that can really happen with the DP algorithms and techniques used both in practice and in theoretical work.</strong> Before explaining why, let’s see what is the kind of desirable behaviour one would expect: a <em>“smooth, manageable tradeoff of privacy parameters.”</em> For that discussion, let’s introduce the <em>privacy loss random variable</em>: given an algorithm M and two neighbouring inputs D,D’, let \(f(y)\) be defined as
\[
f(y) = \log\frac{\Pr[M(D)=y]}{\Pr[M(D’)=y]}
\]
for every possible output \(y\in\Omega\). Now, define the random variable \(Z := f(M(D))\) (implicitly, \(Z\) depends on \(D,D',M\)). This random variable quantifies how much observing the output of the algorithm \(M\) helps distinguishing between \(D\) and \(D'\).</p>
<p>Now, going a little bit fast, you can check that saying that \(M\) is \(\varepsilon\)-DP corresponds to the guarantee “<em>\(\Pr[Z > \varepsilon] = 0\) for all neighbouring inputs \(D,D'\).</em>”
Similarly, \(M\) being \((\varepsilon,\delta)\)-DP is the guarantee \(\Pr[Z > \varepsilon] \leq \delta\).\({}^{(\dagger)}\) For instance, the “catastrophic failure of privacy” corresponds to the scenario below, which depicts a possible distribution for \(Z\): \(Z\leq \varepsilon\) with probability \(1-\delta\), but then with probability \(\delta\) we have \(Z\gg 1\).</p>
<p><img src="/images/flavours-delta-fig1.png" width="600" alt="The type of (bad) distribution of Z corresponding to 'our catastrophic failure of privacy'" style="margin:auto;display: block;" /></p>
<p>What we would like is a smoother thing, where even when \(Z>\varepsilon\) is still remains reasonable and doesn’t immediately become large. A nice behaviour of the tails, ideally something like this:</p>
<p><img src="/images/flavours-delta-fig2.png" width="600" alt="A distribution for Z with nice tails, leading to smooth tradeoffs between ε and δ" style="margin:auto;display: block;" /></p>
<p>For instance, if we had a bound on \(\mathbb{E}[|Z|]\), we could use Markov’s inequality to get, well, <em>something</em>. For instance, imagine we had \(\mathbb{E}[|Z|]\leq \varepsilon\delta\): then
\[
\Pr[ |Z| > \varepsilon ] \leq \frac{\mathbb{E}[|Z|]}{\varepsilon }\leq \delta
\]
<em>(great! We have \((\varepsilon,\delta)\)-DP)</em>; but also \(\Pr[ |Z| > 10\varepsilon ] \leq \frac{\delta}{10}\). Privacy violations do not blow up out of proporxtion immediately, we can trade \(\varepsilon\) for \(\delta\). That seems like the type of behaviour we would like our algorithms to exhibit.</p>
<p><img src="/images/flavours-delta-fig3.png" width="600" alt="The type of privacy guarantees a Markov-type tail bound would give" style="margin:auto;display: block;" /></p>
<p>But why stop at Markov’s inequality then, which gives some nice but still weak tail bounds? Why not ask for <em>stronger</em>: Chebyshev’s inequality? Subexponential tail bounds? Hell, <em>subgaussian</em> tail bounds? This is, basically, what some stronger notions of differential privacy than approximate DP give.</p>
<ul>
<li>
<p><strong>Rényi DP</strong> <a href="https://arxiv.org/abs/1702.07476" title="Ilya Mironov. Renyi Differential Privacy. CSF 2017"><strong>[Mironov17]</strong></a>, for instance, is a guarantee on the moment-generating function (MGF) of the privacy random variable \(Z\): it has two parameters, \(\alpha>1\) and \(\tau\), and requires that \(\mathbb{E}[e^{(\alpha-1)Z}] \leq e^{(\alpha-1)\tau}\) for all neighbouring \(D,D'\). In turn, by applying for instance Markov’s inequality to the MGF of \(Z\), we can control the tail bounds, and get a nice, smooth tradeoff in terms of \((\varepsilon,\delta)\)-DP.</p>
</li>
<li>
<p><strong>Concentrated DP</strong> (CDP) <a href="https://arxiv.org/abs/1605.02065" title="Mark Bun and Thomas Steinke. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. TCC 2016"><strong>[BS16]</strong></a> is an even stronger requirement, which roughly speaking requires the algorithm to be Rényi DP <em>simultaneously</em> for all \(1< \alpha \leq \infty\). More simply, this is “morally” a requirement on the MGF of \(Z\) which asks it to be subgaussian.</p>
</li>
</ul>
<p>The above two examples are not just fun but weird variants of DP: they actually capture the behaviour of many well-known differentially private algorithms, and in particular that of the Gaussian mechanism. While the guarantees they provide are less easy to state and interpret than \(\varepsilon\)-DP or \((\varepsilon,\delta)\)-DP, they are incredibly useful to analyze those algorithms, and enjoy very nice composition properties… and, of course, lead to that smooth tradeoff between \(\varepsilon\) and \(\delta\) for \((\varepsilon,\delta)\)-DP.</p>
<p><strong>To summarize:</strong></p>
<ul>
<li>\(\varepsilon\)-DP gives great guarantees, but is a very stringent requirement. Corresponds to the privacy loss random variable supported on \([-\varepsilon,\varepsilon]\) (no tails!)</li>
<li>\((\varepsilon,\delta)\)-DP gives guarantees easy to parse, but on its face allows for very bad behaviours. Corresponds to the privacy loss random variable in \([-\varepsilon,\varepsilon]\) with probability \(1-\delta\) (but outside, all bets are off!)</li>
<li>Rényi DP and Concentrated DP correspond to something in between, controlling the tails of the privacy loss random variable by a guarantee on its MGF. A bit harder to interpret, but capture the behaviour of many DP building blocks can be converted to \((\varepsilon,\delta)\)-DP (with nice trade-offs between \(\varepsilon\) and \(\delta\).</li>
</ul>
<hr />
<p>\({}^{(\dagger)}\) The astute reader may notice that this is not <em>quite</em> true. Namely, the guarantee \(\Pr[Z > \varepsilon] \leq \delta\) on the privacy loss random variable (PLRV) does imply \((\varepsilon,\delta)\)-differential privacy, but the converse does not hold. See, for instance, Lemma 9 of <a href="https://arxiv.org/abs/2004.00010" title="Clément L. Canonne, Gautam Kamath, Thomas Steinke. The Discrete Gaussian for Differential Privacy. NeurIPS 2020"><strong>[CKS20]</strong></a> for an exact characterization of \((\varepsilon,\delta)\)-DP in terms of the PLRV.</p>
Clément CanonneThu, 11 Mar 2021 21:00:00 -0400
https://differentialprivacy.org/flavoursofdelta/
https://differentialprivacy.org/flavoursofdelta/Conference Digest - TPDP 2020<p><a href="https://tpdp.journalprivacyconfidentiality.org/2020/">TPDP 2020</a> is a workshop focused on differential privacy. As such, it’s a great place to learn about recent developments in the DP research community.
It will be held on 13 November and is co-located with <a href="https://www.sigsac.org/ccs/CCS2020/">CCS</a>, but, of course, it’s virtual this year. <a href="https://www.sigsac.org/ccs/CCS2020/registration.html">Registration is only US$35 if you register by Friday, 30 October.</a> Check out the 8 excellent talks and 71 posters below – wow, the workshop has grown!</p>
<p>Please let us know if there are any errors or omissions.</p>
<h2 id="invited-talks">Invited Talks</h2>
<ul>
<li>
<p>OpenDP: A Community Effort to Build Trustworthy Differential Privacy Software.<br />
<a href="https://salil.seas.harvard.edu/">Salil Vadhan</a></p>
</li>
<li>
<p>Implementation with Base-2 DP or: How I learned to stop worrying and love floating point.
<a href="https://cilvento.org/">Christina Ilvento</a></p>
</li>
</ul>
<h2 id="contributed-talks">Contributed Talks</h2>
<ul>
<li>
<p><a href="https://arxiv.org/abs/2009.09052">Private Reinforcement Learning with PAC and Regret Guarantees</a><br />
Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Z. Steven Wu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.07709">Auditing Differentially Private Machine Learning: How Private is Private SGD?</a><br />
Matthew Jagielski, Jonathan Ullman, Alina Oprea</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.06783">Characterizing Private Clipped Gradient Descent on Convex Generalized Linear Problems</a><br />
Shuang Song, Om Thakkar, Abhradeep Thakurta</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.10941">Private Query Release Assisted by Public Data</a><br />
Raef Bassily, Albert Cheu, Shay Moran, Aleksandar Nikolov, Jonathan Ullman, Z. Steven Wu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2003.00563">An Equivalence Between Private Classification and Online Prediction</a><br />
Mark Bun, Roi Livni, Shay Moran</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.09745">Differentially Private Set Union</a><br />
Sivakanth Gopi, Pankaj Gulhane, Janardhan Kulkarni, Judy Hanwen Shen, Milad Shokouhi, Sergey Yekhanin</p>
</li>
</ul>
<h2 id="posters">Posters</h2>
<ul>
<li>
<p><a href="https://arxiv.org/abs/2004.00010">The Discrete Gaussian for Differential Privacy</a><br />
Clément Canonne, Gautam Kamath, Thomas Steinke</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.09465">Locally Private Hypothesis Selection</a><br />
Sivakanth Gopi, Gautam Kamath, Janardhan Kulkarni, Aleksandar Nikolov, Z. Steven Wu, Huanyu Zhang</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.05839">LinkedIn’s Audience Engagements API: A Privacy Preserving Data Analytics System at Scale</a><br />
Ryan Rogers, Subbu Subramaniam, Sean Peng, David Durfee, Seunghyun Lee, Santosh Kumar Kancha, Shraddha Sahay, Parvez Ahammad</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.14717">Differentially Private Decomposable Submodular Maximization</a><br />
Anamay Chaturvedi, Huy Nguyen, Lydia Zakynthinou</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.11934">Private Post-GAN Boosting</a><br />
Marcel Neunhoeffer, Z. Steven Wu, Cynthia Dwork</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.01100">Efficient, Noise-Tolerant, and Private Learning via Boosting</a><br />
Marco Carmosino, Mark Bun, Jessica Sorrell</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2003.04509">Closure Properties for Private Classification and Online Prediction</a><br />
by Noga Alon, Amos Beimel, Shay Moran, Uri Stemmer</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.12018">Overlook: Differentially Private Exploratory Visualization for Big Data</a><br />
Pratiksha Thaker, Mihai Budiu, Parikshit Gopalan, Udi Wieder, Matei Zaharia</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.01980">On the Equivalence between Online and Private Learnability beyond Binary Classification</a><br />
Young Hun Jung, Baekjin Kim and Ambuj Tewari</p>
</li>
<li>
<p><a href="https://dettanym.github.io/files/tpdp20_workshop_paper.pdf">Cache Me If You Can: Accuracy-Aware Inference Engine for Differentially Private Data Exploration</a><br />
Miti Mazmudar, Thomas Humphries, Matthew Rafuse, Xi He</p>
</li>
<li>
<p><a href="https://drops.dagstuhl.de/opus/volltexte/2020/12026/">Bounded Leakage Differential Privacy</a><br />
Katrina Ligett, Charlotte Peale, Omer Reingold</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1909.06322">A Knowledge Transfer Framework for Differentially Private Sparse Learning</a><br />
Lingxiao Wang, Quanquan Gu</p>
</li>
<li>
<p>Consistent Integer, Non-Negative, Hierarchical Histograms without Integer Programming<br />
Cynthia Dwork, Christina Ilvento</p>
</li>
<li>
<p><a href="https://www.microsoft.com/en-us/research/uploads/prod/2020/03/intrinsic_privacy_tpdp.pdf">An Empirical Study on the Intrinsic Privacy of Stochastic Gradient Descent</a><br />
Stephanie Hyland, Shruti Tople</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2001.03618">Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation</a><br />
Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Shuang Song, Kunal Talwar, Abhradeep Thakurta</p>
</li>
<li>
<p>Improving Sparse Vector Technique with Renyi Differential Privacy<br />
Yuqing Zhu and Yu-Xiang Wang</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.11707">Breaking the Communication-Privacy-Accuracy Trilemma</a><br />
Wei-Ning Chen, Peter Kairouz, Ayfer Özgür</p>
</li>
<li>
<p>Budget Sharing for Multi-Analyst Differential Privacy<br />
David Pujol, Yikai Wu, Brandon Fain, Ashwin Machanavajjhala</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1908.07643">AdaCliP: Adaptive Clipping for Private SGD</a><br />
Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, Sanjiv Kumar</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.01181">Private Optimization Without Constraint Violation</a><br />
Andrés Muñoz Medina, Umar Syed, Sergei Vassilvitskii, Ellen Vitercik</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1912.06015">Efficient Per-Example Gradient Computations in Convolutional Neural Networks</a><br />
Gaspar Rochette, Andre Manoel, Eric Tramel</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.12674">Controlling Privacy Loss in Survey Sampling</a><br />
Audra McMillan, Mark Bun, Marco Gaboardi, Joerg Drechsler</p>
</li>
<li>
<p>Privacy-Preserving Community Detection under the Stochastic Block Model<br />
Jonathan Hehir, Aleksandra Slavkovic, Xiaoyue Niu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.10129">Smoothed Analysis of Differentially Private and Online Learning</a><br />
Nika Haghtalab, Tim Roughgarden, Abhishek Shetty</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.09464">Private Mean Estimation for Heavy-Tailed Distributions</a><br />
Gautam Kamath, Vikrant Singhal, Jonathan Ullman</p>
</li>
<li>
<p><a href="https://link.springer.com/chapter/10.1007%2F978-3-030-57521-2_23">Private Posterior Inference Consistent with Public Information: a Case Study in Small Area Estimation from Synthetic Census Data</a><br />
Jeremy Seeman, Aleksandra Slavkovic, Matthew Reimherr</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2001.09122">Reasoning About Generalization via Conditional Mutual Information</a><br />
Thomas Steinke, Lydia Zakynthinou</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.15429">Understanding Gradient Clipping in Private SGD: A Geometric Perspective</a><br />
Xiangyi Chen, Z. Steven Wu, Mingyi Hong</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.06605">Privacy Amplification via Random Check-Ins</a><br />
Borja Balle, Peter Kairouz, Brendan McMahan, Om Thakkar, Abhradeep Thakurta</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2010.12603">Permute-and-flip: a new mechanism for differentially-private selection</a><br />
Ryan McKenna, Daniel Sheldon</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.02923">Descent-to-Delete: Gradient-Based Methods for Machine Unlearning</a><br />
Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2008.06529">A Better Bound Gives a Hundred Rounds: Enhanced Privacy Guarantees via f-Divergences</a><br />
Shahab Asoodeh, Jiachun Liao, Flavio Calmon, Oliver Kosut, Lalitha Sankar</p>
</li>
<li>
<p><a href="https://sites.tufts.edu/vrdi/files/2020/07/Slides-DP-Bhushan-Suwal-JN-Matthews-et-al.pdf">Census TopDown and the Redistricting Use Case</a><br />
Aloni Cohen, Moon Duchin, JN Matthews, Bhushan Suwal, Peter Wayner</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1911.04014">Interaction is Necessary for Distributed Learning with Privacy or Communication Constraints</a><br />
Yuval Dagan, Vitaly Feldman</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.10783">Fisher information under local differential privacy</a><br />
Leighton Barnes, Wei-Ning Chen, Ayfer Ozgur</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.06830">Differentially Private Assouad, Fano, and Le Cam</a><br />
Jayadev Acharya, Ziteng Sun, Huanyu Zhang</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.13660">Learning discrete distributions: user vs item-level privacy</a><br />
Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1910.13659">Efficient Privacy-Preserving Stochastic Nonconvex Optimization</a><br />
Lingxiao Wang, Bargav Jayaraman, David Evans, Quanquan Gu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.03813">Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification</a><br />
Yingxue Zhou, Zhiwei Steven Wu, Arindam Banerjee</p>
</li>
<li>
<p>Differentially private partition selection<br />
Damien Desfontaines, Bryant Gipson, Chinmoy Mandayam, James Voss</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2008.08007">Differentially Private Clustering: Tight Approximation Ratios</a><br />
Badih Ghazi, Ravi Kumar, Pasin Manurangsi</p>
</li>
<li>
<p>Let’s not make a fuzz about it<br />
Elisabet Lobo Vesga, Alejandro Russo, Marco Gaboardi</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.12321">PAPRIKA: Private Online False Discovery Rate Control</a><br />
Wanrong Zhang, Gautam Kamath, Rachel Cummings</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2009.13689">Oblivious Sampling Algorithms for Private Data Analysis</a><br />
Sajin Sasy, Olga Ohrimenko</p>
</li>
<li>
<p>SOGDB-epsilon: Secure Outsourced Growing Database with Differentially Private Record Update<br />
Chenghong Wang, Kartik Nayak, Ashwin Machanavajjhala</p>
</li>
<li>
<p><a href="https://invertibleworkshop.github.io/accepted_papers/pdfs/41.pdf">Differentially Private Normalizing Flows for Privacy-Preserving Density Estimation</a><br />
Chris Waites, Rachel Cummings</p>
</li>
<li>
<p><a href="https://cs.uwaterloo.ca/~hsivasub/pub/TPDP2020.pdf">Differentially Private Sublinear Average Degree Approximation</a><br />
Harry Sivasubramaniam, Haonan Li, Xi He</p>
</li>
<li>
<p><a href="https://chong-l.github.io/MAPL_TNC_FL_ICML_2020.pdf">Revisiting Model-Agnostic Private Learning: Faster Rates and Active Learning</a><br />
Chong Liu, Yuqing Zhu, Kamalika Chaudhuri, Yu-Xiang Wang</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.06618">CoinPress: Practical Private Mean and Covariance Estimation</a><br />
Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.09481">Connecting Robust Shuffle Privacy and Pan-Privacy</a><br />
Victor Balcer, Albert Cheu, Matthew Joseph, Jieming Mao</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.11204">Differentially Private Variational Autoencoders with Term-wise Gradient Aggregation</a><br />
Tsubasa Takahashi, Shun Takagi, Hajime Ono, Tatsuya Komatsu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.07490">Understanding Unintended Memorization in Federated Learning</a><br />
Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Francoise Beaufays</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.10630">Near Instance-Optimality in Differential Privacy</a><br />
Hilal Asi, John Duchi</p>
</li>
<li>
<p><a href="https://drive.google.com/file/d/1okHAkjNENiS2WfSKdkUo8B29yE8-Qfof/view">Implementing differentially private integer partitions via the exponential mechanism</a> and <a href="https://drive.google.com/file/d/1OytgB24d1n-xPIWrrKCsVQQdS7rV3tjn/view">Implementing Sparse Vector</a><br />
Christina Ilvento</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.05157">Differentially Private Simple Linear Regression</a><br />
Audra McMillan, Daniel Alabi, Jayshree Sarathy, Adam Smith, Salil Vadhan</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.05453">New Oracle-Efficient Algorithms for Private Synthetic Data Release</a><br />
Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, Z. Steven Wu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.07749">General-Purpose Differentially-Private Confidence Intervals</a><br />
Cecilia Ferrando, Shufan Wang, Daniel Sheldon</p>
</li>
<li>
<p>Central Limit Theorem and Uncertainty Principles for Differentially Private Query Answering<br />
Jinshuo Dong, Linjun Zhang, Weijie Su</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.13501">Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds</a><br />
Yingxue Zhou, Xiangyi Chen, Mingyi Hong, Z. Steven Wu, Arindam Banerjee</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1905.10335">Minimax Rates of Estimating Approximate Differential Privacy</a><br />
Xiyang Liu, Sewoong Oh</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.07740">Really Useful Synthetic Data – A Framework to Evaluate the Quality of Differentially Private Synthetic Data</a><br />
Christian Arnold, Marcel Neunhoeffer</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1911.10541">PAC learning with stable and private predictions</a><br />
Yuval Dagan, Vitaly Feldman</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.04656">Computing Local Sensitivities of Counting Queries with Joins</a><br />
Yuchao Tao, Xi He, Ashwin Machanavajjhala, Sudeepa Roy</p>
</li>
<li>
<p>Efficient Reductions for Differentially Private Multi-objective Regression<br />
Julius Adebayo, Daniel Alabi</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2010.06667">The Pitfalls of Differentially Private Prediction in Healthcare</a><br />
Vinith Suriyakumar, Nicolas Papernot, Anna Goldenberg, Marzyeh Ghassemi</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.14191">Tempered Sigmoid Activations for Deep Learning with Differential Privacy</a><br />
Nicolas Papernot, Abhradeep Thakurta, Shuang Song, Steve Chien, Úlfar Erlingsson</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.10881">Revisiting Membership Inference Under Realistic Assumptions</a><br />
Bargav Jayaraman, Lingxiao Wang, David Evans, Quanquan Gu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2009.04013">Attribute Privacy: Framework and Mechanisms</a><br />
Wanrong Zhang, Olga Ohrimenko, Rachel Cummings</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2010.10664">DuetSGX: Differential Privacy with Secure Hardware</a><br />
Phillip Nguyen, Alex Silence, David Darais, Joseph Near</p>
</li>
<li>
<p><a href="https://pdfs.semanticscholar.org/4319/65b3c5a47cf8bfd30f1c30cd044382e98d68.pdf">A Programming Framework for OpenDP</a><br />
Marco Gaboardi, Michael Hay, Salil Vadhan</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.09352">A One-Pass Private Sketch for Most Machine Learning Tasks</a><br />
Benjamin Coleman, Anshumali Shrivastava</p>
</li>
<li>
<p>Model-Agnostic Private Learning with Domain Adaptation<br />
Yuqing Zhu, Chong Liu, Yu-Xiang Wang</p>
</li>
</ul>
Thomas SteinkeWed, 28 Oct 2020 00:01:00 +0000
https://differentialprivacy.org/tpdp2020/
https://differentialprivacy.org/tpdp2020/Reconstruction Attacks in Practice<p>This is the second of two posts describing the theory and practice of reconstruction attacks. To read the first post, which covers the theoretical basis of such attacks, <a href="https://differentialprivacy.org/reconstruction-theory/">[click here]</a>.</p>
<hr />
<p>In the <a href="https://differentialprivacy.org/reconstruction-theory/">last post</a>, we discussed how an attacker can use noisy answers to questions about a database to reconstruct private information in the database. The reconstruction attack framework was:</p>
<ol>
<li>The attacker submits sufficiently random queries that link prior information (which the attacker already knows) to private data (which the attacker wants to learn).</li>
<li>The attacker receives noisy answers to these queries and writes them down as constraints for a linear program to solve for the private bits.</li>
<li>The attacker solves the linear program and rounds the result to recover most of the bits.</li>
</ol>
<p>Our last post discussed some of this attack’s nice theoretical guarantees, and this post matches that with real-world performance. More specifically, we’ll cover two successful applications of this attack against a piece of anonymizing SQL software called Diffix which, despite the name, is not differentially private.</p>
<h3 id="what-is-diffix">What is Diffix?</h3>
<p>Diffix is a system designed by the startup Aircloak for answering statistical queries over a private database. It is described by its creators as an “anonymizing SQL interface [that] sits in front of your data and enables you to conduct ad hoc analytics — fully privacy preserving and GDPR-compliant.”<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> Aircloak’s approach is to develop targeted defenses for known vulnerabilities, but to otherwise privilege utility over protecting against unknown vulnerabilities. They combine this approach with a serious effort to actually find vulnerabilities in Diffix through periodic bug bounties that offer monetary prizes for participants who mount successful attacks. While this post is critical of the design of Diffix itself, we commend Aircloak for their genuine openness to scrutiny. Indeed, the attacks described in this post were carried out as a part of these bug bounty programs and led to the discovery of several vulnerabilities in the software that have since been addressed. The first attack we describe was carried out by Aloni Cohen and Kobbi Nissim in the first bug bounty program in late 2017 and early 2018. The second was run by Travis Dick, Matthew Joseph, and Zachary Schutzman in the second bug bounty program during the summer of 2020.</p>
<p>Before diving into the details of the attacks, we’ll first introduce the basic functionality of Diffix and how it purports to defend against vulnerabilities, including linear reconstruction attacks. The goal of Diffix is to answer SQL queries, such as:</p>
<pre><code class="language-SQL">SELECT COUNT(*) FROM loans
WHERE loanStatus = 'C'
AND clientId BETWEEN 2000 and 3000
</code></pre>
<p>on a database while preventing the disclosure of record-level data.<br />
A challenge for a system like Diffix is to answer such counting queries while preventing an adversarial user—the attacker—from learning record-level information. As you might remember from the last post, such a system must not provide exact answers to arbitrary queries. Otherwise the attacker could mount a <em>differencing attack</em>. For example, an attacker who knows that Billy Joel’s <code class="language-plaintext highlighter-rouge">clientID</code> is 2744 could learn the status of the singer’s loan by comparing the answer to the previous query with the answer to:</p>
<pre><code class="language-SQL">SELECT COUNT(*) FROM loans
WHERE loanStatus = 'C'
AND clientId BETWEEN 2000 and 3000
AND clientId != 2744
</code></pre>
<p>An intuitive defense is to add noise to the answer—say, Gaussian noise sampled from \(N(0,10)\).
Now the difference \(\Delta\) in the responses to the two queries is a random variable sampled from \(N(1,20)\) or \(N(0,20)\) depending on whether Joel’s <code class="language-plaintext highlighter-rouge">loanStatus</code> is or isn’t <code class="language-plaintext highlighter-rouge">C</code>.
With just one sample, the distributions are hard to distinguish.</p>
<p>Still, this scheme is easily thwarted by <em>averaging attacks</em>.
If the noise is sampled anew each time a query is made, then repeatedly making the same pair of queries generates many independent samples from \(N(1,20)\) or \(N(0,20)\), and enough queries would make it possible to distinguish these distributions easily.</p>
<p>As before, there is an intuitive defense: use the same noise for repeated queries. This defense introduces its own new attacks by making many syntactically-distinct but semantically-equivalent queries. Those attacks in turn suggest new defenses which suggest new attacks, and so on. Diffix is, in a sense, the result of this hypothetical arms race.</p>
<p>From a technical perspective, Diffix consists of three components, which together are intended to thwart these attacks. First, Diffix only accepts a limited subset of SQL and will categorically reject any query that does not fit this subset. These restrictions—including tight restrictions on <code class="language-plaintext highlighter-rouge">JOIN</code>s and on the number of mathematical functions in a single expression—limit the ability of an adversary to use the full power of SQL to access the database. The second component is a collection of data-dependent ad-hoc methods to prevent leaking information about individuals or very small subsets of users, including suppressing answers to queries about small numbers of users and flattening outliers.</p>
<p>The final component is Diffix’s layered noise. This noise is comprised of two individual noise terms added together: a <em>data-dependent</em> term whose variance is constant<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> and a <em>query-dependent</em> term whose variance depends on the complexity of the query. The data-dependent noise prevents naïve averaging attacks. It is a pseudorandom error where the seed of the pseudorandom function depends on individual data records that contribute to the query result. Semantically equivalent queries using different syntax will nonetheless share this error, so simply averaging the responses will not remove this noise.</p>
<p>The query-dependent noise prevents a naïve Dinur–Nissim style reconstruction attacks. A noise term of magnitude \(\Omega(1)\) is generated deterministically for each condition in the <code class="language-plaintext highlighter-rouge">WHERE</code> or <code class="language-plaintext highlighter-rouge">HAVING</code> clause of the SQL query, and the terms are added together. A Dinur–Nissim query is a random subset of the dataset that contains \(\Omega(n)\) records. The straightforward way of specifying such a query is to enumerate the subset record by record using \(\Omega(n)\) conditions:<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></p>
<pre><code class="language-SQL">SELECT COUNT(*) FROM loans
WHERE loanStatus = 'C'
AND (clientId = 2007
OR clientId = 2018
...
OR clientId = 2991)
</code></pre>
<p>A query with \(\Omega(n)\) conditions is answered with noise with standard deviation \(\Omega(\sqrt{n})\), enough to thwart efficient reconstruction algorithms.</p>
<h3 id="carrying-out-reconstruction">Carrying out reconstruction</h3>
<p>The additional noise per SQL condition is the main obstacle to running a successful reconstruction attack on a database behind Diffix. As described above, the noise prevents the naive implementation of the reconstruction algorithm from receiving accurate enough answers to reconstruct the database using a reasonable number of queries.<br />
A natural approach is to use very few SQL conditions—ideally, just one—to make random-enough queries, each identifying a subset of the records in the dataset.
So the challenge is to formulate a large family of such queries that are accepted by Diffix’s restricted subset of SQL, using as few conditions as possible.</p>
<h4 id="the-cohennissim-attack">The Cohen–Nissim Attack</h4>
<p>Instead of specifying each row with a separate condition, the Cohen–Nissim attack<a href="https://arxiv.org/abs/1810.05692">[CN18]</a> uses an ad hoc <em>hash function</em> to extract entropy from the data itself in order to systematically choose the needed subsets.<br />
Suppose we have a list of the values in the database’s <code class="language-plaintext highlighter-rouge">clientId</code> column, and we want to recover the <code class="language-plaintext highlighter-rouge">loanStatus</code> secret bit. Rather than explicitly enumerating the <code class="language-plaintext highlighter-rouge">clientId</code>s for a random subset of the rows to include in each query, we can write a boolean-valued function which evaluates to true on about half of the <code class="language-plaintext highlighter-rouge">clientId</code>s and ask Diffix to include only the rows for which the condition is true. In this way, instead of first choosing a subset of rows and then asking Diffix about those rows, we choose this function and use its evaluation to specify our random subset.</p>
<p>After some experimentation with the language restrictions, Cohen and Nissim settled on the following:</p>
<pre><code class="language-SQL">...
WHERE FLOOR(100 * ((clientId * 2)^0.7))
= FLOOR(100 * ((clientId * 2)^0.7) + 0.5)
</code></pre>
<p>Let’s see what this does. Let \(d=d_0.d_1 d_2 d_3 d_4 \dots \) be the decimal representation of the value \(d = (\mathtt{clientID}\cdot 2)^{0.7}\), which appears on both sides of the equality.
The expression is true if and only if \(d_3 < 5\).<br />
To see this, the left hand side evaluates to \(d_{0}d_{1}d_{2} = \lfloor 100d \rfloor\); the right hand side evaluates to \(d_{0}d_{1}d_{2}\) if \(d_3 < 5\) or \(d_{0}d_{1}d_{2}+1\) if \(d_3 \geq 5\). In the former case, the equality condition evaluates to ‘true’, and in the latter case it evaluates to ‘false’. Replacing 100 with other powers of 10 changes which digit in the decimal expansion is checked.</p>
<p>By varying the constants in the SQL query, this single expression yields a whole family of conditions, albeit a very ad-hoc one. The hope was that, for different primes \(q\) and fractional exponents \(p\), the individual digits of the decimal representations of \((\mathtt{clientID}*q)^p\) would be random enough for reconstruction to work.
The complete attack queries looked like this:</p>
<pre><code class="language-SQL">
SELECT COUNT(clientId) FROM loans
WHERE FLOOR(100 * ((clientId * 2)^.7))
= FLOOR(100 * ((clientId * 2)^.7) + 0.5)
AND clientId BETWEEN 2000 and 3000
AND loanStatus = 'C'
</code></pre>
<p>The range condition at the end simply selects a subset of the data which is small enough for the attack to run quickly on a personal computer but large enough to satisfy the requirements of the Diffix bounty program. This family of queries allows for a linear program to reconstruct the secret <code class="language-plaintext highlighter-rouge">loanStatus</code> bits with high accuracy.</p>
<p>In the course of verifying the attack for the Diffix bounty program, reconstruction was carried out on 4 different ranges of <code class="language-plaintext highlighter-rouge">clientId</code>s containing 455 records. For each record, the attack correctly determined whether or not the corresponding <code class="language-plaintext highlighter-rouge">loanStatus</code> was <code class="language-plaintext highlighter-rouge">C</code>.</p>
<p>Aircloak’s response to this attack was to further restrict the queries allowed by Diffix. Columns like <code class="language-plaintext highlighter-rouge">clientId</code>, where most of the values correspond to a single user, are tagged as ‘isolating’, and mathematical functions can no longer be used on such columns. The hope was that this modification would prevent the extraction of entropy from an identifying column via hashing.</p>
<h4 id="the-dickjosephschutzman-attack">The Dick–Joseph–Schutzman Attack</h4>
<p>Without the ability to directly use a uniquely identifying column from the database itself, we need another way to single out rows of the database. We can use an idea that’s been around since the 1990s, when Latanya Sweeney showed<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> that almost 90 percent of Americans could be identified with only a date of birth, ZIP code, and gender, but of course each of these alone is nowhere near sufficient to isolate a single individual. We can use this idea and try to evade the modification to Diffix by choosing multiple non-isolating columns which, when taken together, can isolate rows in the database.</p>
<p>This modified attack uses the <code class="language-plaintext highlighter-rouge">pickup_latitude</code> column in the <code class="language-plaintext highlighter-rouge">taxi</code> data set as the source of entropy, which is non-isolating, in part because there are a large number of rows where the value is recorded as zero. We can combine this column with the <code class="language-plaintext highlighter-rouge">trip_distance</code> column and run queries of the following form:</p>
<pre><code class="language-SQL">
SELECT COUNT(*) FROM rides
WHERE FLOOR(pickup_latitude ^ 8.789 + 0.5)
= FLOOR(pickup_latitude ^ 8.789)
AND trip_distance IN (0.87, 1.97, 2.75)
AND payment_type = 'CSH'
</code></pre>
<p>This example query is part of an attack to recover the <code class="language-plaintext highlighter-rouge">payment_type</code> column, which (for the purposes of this attack) is a binary column containing two values: <code class="language-plaintext highlighter-rouge">CRD</code> (for credit card payments) and <code class="language-plaintext highlighter-rouge">CSH</code> (for cash payments). The <code class="language-plaintext highlighter-rouge">IN (0.87, 1.97, 2.75)</code> restricts to a subset of the data with about 450 rows, each with a distinct value for <code class="language-plaintext highlighter-rouge">pickup_latitude</code>. However, because across the whole database, very few rows have a distinct value in this column, Diffix does not consider it ‘isolating’ and it can be used as Cohen–Nissim used <code class="language-plaintext highlighter-rouge">clientId</code>. The values in <code class="language-plaintext highlighter-rouge">pickup_latitude</code> are recorded to six decimal places of precision and the least significant four of them are essentially random digits. By choosing an appropriate range for the exponent and using the same trick as in the Cohen–Nissim attack, allows the construction of a Diffix-accepted query which includes around half of the rows in the targeted subset. Using different values for the exponent leads to a large family of queries which allow the attack to be carried out as before with similarly high accuracy of over 95 percent.</p>
<p>Dick–Joseph–Schutzman additionally extends this attack to recover <em>numerical</em> rather than just binary secret data. By using queries of the form</p>
<pre><code class="language-SQL">SELECT SUM(passenger_count) FROM rides ...
</code></pre>
<p>Diffix will return return noisy sums over the specified subset for a numeric column like <code class="language-plaintext highlighter-rouge">passenger_count</code>. Then, a similar linear program can reconstruct estimates for these values with high accuracy. For numeric columns like <code class="language-plaintext highlighter-rouge">passenger_count</code> which take on relatively few distinct values, the attack recovers the exact values with accuracy above 75 percent. Due to limitations in the Diffix bounty program rules which require perfect reconstruction of a value to be considered ‘accurate’, we didn’t evaluate the performance of the attack on numeric columns with richer values, such as <code class="language-plaintext highlighter-rouge">dropoff_latitude</code>.</p>
<p>Finally, this attack extends to one used to reconstruct string data character-by-character. A U.S. social security number consists of a string formatted like <code class="language-plaintext highlighter-rouge">xxx-xx-xxxx</code> with none unknown digits in three blocks separated by dashes. There are potentially one billion different strings that could appear in this column. However, by exploiting the structure of the data, a separate attack can be run to recover each digit individually using the summation attack, since there are only ten different values each digit could take. Queries of the form</p>
<pre><code class="language-SQL">
SELECT SUM(CAST(SUBSTRING(ssn, 3, 1) AS integer)) FROM rides ....
</code></pre>
<p>can be used to recover the 3rd digit from each row’s social security number. Running this attack for each digit then aggregating the individual guesses to construct a guess for each user’s entire social security number allows the attack to achieve perfect reconstruction on about 90 percent of the values. A similar attack worked on the <code class="language-plaintext highlighter-rouge">pickup_datetime</code> and <code class="language-plaintext highlighter-rouge">dropoff_datetime</code> columns, with separate attacks on the value in the seconds position, the minutes position, and so on, and finally piecing these together to correctly reconstruct about 85 percent of the values.</p>
<p>Again, Aircloak’s response was to restrict the query language. Both of the successful attacks relied on the use of some arithmetic inside of a <code class="language-plaintext highlighter-rouge">FLOOR</code> function to check whether or not a row is included in a particular query. Diffix now forbids the use of arithmetic with <em>bucketing functions</em> such as <code class="language-plaintext highlighter-rouge">FLOOR</code>, <code class="language-plaintext highlighter-rouge">CEIL</code>, <code class="language-plaintext highlighter-rouge">ROUND</code>, etc. This defeats strategies which choose random-ish subsets via this kind of hashing, but does not necessarily preclude the extraction of entropy from the data in other ways.</p>
<h4 id="whats-next">What’s Next?</h4>
<p>We’d again like to thank Aircloak for opening their system to attacks and critiques through the Diffix bounty program. By being so willing to expose their product in this way, they have provided a test bed for us to bridge the gap between theory and application and demonstrate how a linear reconstruction attack might work in practice. Vulnerability to these and other attacks are a potential threat to any data privacy system which does not account for the cumulative threat to privacy that may result from many seemingly-innocuous queries, not just Diffix. The attacks we describe here only require the attacker have access to some subset of the data with a sufficient amount of entropy, and while more entropy allows for more complete reconstruction, it may be possible to use something potentially very accessible like a list of users’ email addresses in this kind of attack to reconstruct a non-trivial amount of the secret data using queries against a system that adds independent noise to each query. Systems like this fall into the trap of the classic arms race, where a designer builds a system to protect against certain attacks, then a clever and determined adversary defeats the system, and the designer is forced to make revisions. This cycle may never terminate, leaving us perpetually unsure of when we can be confident that a system is secure enough to trust with sensitive data.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Descriptions of Diffix and Aircloak are based on <a href="https://www.aircloak.com">https://www.aircloak.com</a>, <a href="https://arxiv.org/pdf/1806.02075.pdf">https://arxiv.org/pdf/1806.02075.pdf</a>, <a href="https://demo.aircloak.com/docs/">https://demo.aircloak.com/docs/</a>, and the authors’ participation in the Aircloak bounty program. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>The variance is proportional to the largest effect any single user has on the output. For <code class="language-plaintext highlighter-rouge">COUNT</code> queries, this largest contribution is 1, and for <code class="language-plaintext highlighter-rouge">SUM</code> queries, it’s roughly the magnitude of the largest value in the column. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Note that Diffix’s syntax restrictions don’t allow disjunctions (using <code class="language-plaintext highlighter-rouge">OR</code>s). An equivalent way of writing this that is allowed by Diffix would use <code class="language-plaintext highlighter-rouge">...WHERE ... AND clientId IN (2007, 2018,...)</code>. For such conditions, Diffix adds a noise layer for each element of the <code class="language-plaintext highlighter-rouge">IN</code> condition. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Sweeney, Latanya. “Simple demographics often identify people uniquely.” Health (San Francisco) 671.2000 (2000): 1-34. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Aloni CohenSasho NikolovZachary SchutzmanJonathan UllmanTue, 27 Oct 2020 00:11:38 -0400
https://differentialprivacy.org/diffix-attack/
https://differentialprivacy.org/diffix-attack/