Differential PrivacyWebsite for the differential privacy research community
https://differentialprivacy.org
What is δ, and what δifference does it make?<p>There are many variants or flavours of differential privacy (DP), some weaker than others; often, a given variant comes with its own guarantees and “conversion theorems” to the others. As an example, “pure” DP has a single parameter \(\varepsilon\), and corresponds to a very stringent notion of DP:</p>
<blockquote>
<p>An algorithm \(M\) is \(\varepsilon\)-DP if, for all neighbouring inputs \(D,D'\) and all measurable \(S\), \( \Pr[ M(D) \in S ] \leq e^\varepsilon\Pr[ M(D') \in S ] \).</p>
</blockquote>
<p>By relaxing this a little, one obtains the standard definition of approximate DP, a.k.a. \((\varepsilon,\delta)\)-DP:</p>
<blockquote>
<p>An algorithm \(M\) is \((\varepsilon,\delta)\)-DP if, for all neighbouring inputs \(D,D'\) and all measurable \(S\), \( \Pr[ M(D) \in S ] \leq e^\varepsilon\Pr[ M(D') \in S ]+\delta \).</p>
</blockquote>
<p>This definition is very useful, as in many settings achieving the stronger \(\varepsilon\)-DP guarantee (i.e., \(\delta=0\)) is impossible, or comes at a very high utility cost. But how to interpret it? The above definition, on its face, doesn’t preclude what one may call “<em>catastrophic failures of privacy</em> 💥:” most of the time, things are great, but with some small probability \(\delta\) all hell breaks loose. For instance, the following algorithm is \((\varepsilon,\delta)\)-DP:</p>
<ul>
<li>Get a sensitive database \(D\) of \(n\) records</li>
<li>Select uniformly at random a fraction \(\delta\) of the database (\(\delta n\) records)</li>
<li>Output that subset of records in the clear 💥</li>
</ul>
<p>(actually, this is even \((0,\delta)\)-DP!). This sounds preposterous, and obviously something that one would want to avoid in practice (unless one wants to face very angry customers or constituents). This is one reason behind the rule of thumb of picking \(\delta\) small enough (or even “cryptographically small”), typically \(\delta \ll 1/n\), so that the records are safe (it is hard to disclose \(\delta n \ll 1\) records).</p>
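<p>As a minimal sketch of the leaky algorithm above, the following Python snippet samples a uniformly random \(\delta\)-fraction of a toy database and returns it unprotected. The function name <code>leaky_release</code>, the toy records, and the fixed seed are illustrative assumptions, not part of any library.</p>

```python
import random

def leaky_release(database, delta, rng):
    """Output a uniformly random delta-fraction of the records, unprotected."""
    k = round(delta * len(database))  # number of records to expose
    return rng.sample(database, k)    # uniform subset, without replacement

rng = random.Random(0)  # fixed seed, for reproducibility only
database = [f"record-{i}" for i in range(1000)]  # hypothetical toy records
leaked = leaky_release(database, delta=0.01, rng=rng)
print(len(leaked), "records published in the clear")
```

<p>Any fixed record ends up in the output with probability \(k/n = \delta\), which is where the additive \(\delta\) in the guarantee comes from.</p>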
<p>So: good privacy most of the time, but with probability \(\delta\), all bets are off.</p>
<p>However, those catastrophic failures of privacy, while technically allowed by the definition of \((\varepsilon,\delta)\)-DP, <strong>are not something that can really happen with the DP algorithms and techniques used both in practice and in theoretical work.</strong> Before explaining why, let’s see what kind of desirable behaviour one would expect: a <em>“smooth, manageable tradeoff of privacy parameters.”</em> For that discussion, let’s introduce the <em>privacy loss random variable</em>: given an algorithm \(M\) and two neighbouring inputs \(D,D'\), let \(f(y)\) be defined as
\[
f(y) = \log\frac{\Pr[M(D)=y]}{\Pr[M(D')=y]}
\]
for every possible output \(y\in\Omega\). Now, define the random variable \(Z := f(M(D))\) (implicitly, \(Z\) depends on \(D,D',M\)). This random variable quantifies how much observing the output of the algorithm \(M\) helps distinguish between \(D\) and \(D'\).</p>
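<p>To make the definition concrete, here is a small sketch that computes the distribution of \(Z\) for a mechanism with finitely many outputs. The example mechanism (randomized response on a single bit, reporting the truth with probability 3/4) and the helper name <code>privacy_loss_distribution</code> are assumptions chosen for illustration; they do not appear in the text above.</p>

```python
import math

def privacy_loss_distribution(p_d, p_d_prime):
    """Return {z: Pr[Z = z]} where Z = f(M(D)) and f(y) = log(p_d[y] / p_d_prime[y]).

    p_d and p_d_prime map each output y to Pr[M(D) = y] and Pr[M(D') = y].
    Assumes both distributions have the same support, so every ratio is finite.
    """
    dist = {}
    for y, prob in p_d.items():
        z = math.log(prob / p_d_prime[y])
        dist[z] = dist.get(z, 0.0) + prob  # Z is distributed according to M(D)
    return dist

# Randomized response on one bit: report the true bit with probability 3/4.
p_d = {0: 0.75, 1: 0.25}        # output distribution when the true bit is 0
p_d_prime = {0: 0.25, 1: 0.75}  # output distribution when the true bit is 1
for z, prob in sorted(privacy_loss_distribution(p_d, p_d_prime).items()):
    print(f"Z = {z:+.4f} with probability {prob}")
```

<p>In this toy case \(Z\) only takes the values \(\pm\log 3\), so it never exceeds \(\log 3\).</p>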
<p>Now, going a little bit fast, you can check that saying that \(M\) is \(\varepsilon\)-DP corresponds to the guarantee “<em>\(\Pr[Z > \varepsilon] = 0\) for all neighbouring inputs \(D,D'\).</em>”
Similarly, \(M\) being \((\varepsilon,\delta)\)-DP is the guarantee \(\Pr[Z > \varepsilon] \leq \delta\).\({}^{(\dagger)}\) For instance, the “catastrophic failure of privacy” corresponds to the scenario below, which depicts a possible distribution for \(Z\): \(Z\leq \varepsilon\) with probability \(1-\delta\), but then with probability \(\delta\) we have \(Z\gg 1\).</p>
<p><img src="/images/flavours-delta-fig1.png" width="600" alt="The type of (bad) distribution of Z corresponding to 'our catastrophic failure of privacy'" style="margin:auto;display: block;" /></p>
<p>What we would like is something smoother, where even when \(Z>\varepsilon\) it still remains reasonable and doesn’t immediately become large. A nice behaviour of the tails, ideally something like this:</p>
<p><img src="/images/flavours-delta-fig2.png" width="600" alt="A distribution for Z with nice tails, leading to smooth tradeoffs between ε and δ" style="margin:auto;display: block;" /></p>
<p>For instance, if we had a bound on \(\mathbb{E}[|Z|]\), we could use Markov’s inequality to get, well, <em>something</em>. For instance, imagine we had \(\mathbb{E}[|Z|]\leq \varepsilon\delta\): then
\[
\Pr[ |Z| > \varepsilon ] \leq \frac{\mathbb{E}[|Z|]}{\varepsilon }\leq \delta
\]
<em>(great! We have \((\varepsilon,\delta)\)-DP)</em>; but also \(\Pr[ |Z| > 10\varepsilon ] \leq \frac{\delta}{10}\). Privacy violations do not blow up out of proportion immediately: we can trade \(\varepsilon\) for \(\delta\). That seems like the type of behaviour we would like our algorithms to exhibit.</p>
<p><img src="/images/flavours-delta-fig3.png" width="600" alt="The type of privacy guarantees a Markov-type tail bound would give" style="margin:auto;display: block;" /></p>
<p>But why stop at Markov’s inequality then, which gives some nice but still weak tail bounds? Why not ask for <em>stronger</em>: Chebyshev’s inequality? Subexponential tail bounds? Hell, <em>subgaussian</em> tail bounds? This is, basically, what some stronger notions of differential privacy than approximate DP give.</p>
<ul>
<li>
<p><strong>Rényi DP</strong> <a href="https://arxiv.org/abs/1702.07476" title="Ilya Mironov. Renyi Differential Privacy. CSF 2017"><strong>[Mironov17]</strong></a>, for instance, is a guarantee on the moment-generating function (MGF) of the privacy loss random variable \(Z\): it has two parameters, \(\alpha>1\) and \(\tau\), and requires that \(\mathbb{E}[e^{(\alpha-1)Z}] \leq e^{(\alpha-1)\tau}\) for all neighbouring \(D,D'\). In turn, by applying for instance Markov’s inequality to the MGF of \(Z\), we can control the tail bounds, and get a nice, smooth tradeoff in terms of \((\varepsilon,\delta)\)-DP.</p>
</li>
<li>
<p><strong>Concentrated DP</strong> (CDP) <a href="https://arxiv.org/abs/1605.02065" title="Mark Bun and Thomas Steinke. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. TCC 2016"><strong>[BS16]</strong></a> is an even stronger requirement, which roughly speaking requires the algorithm to be Rényi DP <em>simultaneously</em> for all \(1< \alpha \leq \infty\). More simply, this is “morally” a requirement on the MGF of \(Z\) which asks it to be subgaussian.</p>
</li>
</ul>
<p>The above two examples are not just fun but weird variants of DP: they actually capture the behaviour of many well-known differentially private algorithms, and in particular that of the Gaussian mechanism. While the guarantees they provide are less easy to state and interpret than \(\varepsilon\)-DP or \((\varepsilon,\delta)\)-DP, they are incredibly useful to analyze those algorithms, and enjoy very nice composition properties… and, of course, lead to that smooth tradeoff between \(\varepsilon\) and \(\delta\) for \((\varepsilon,\delta)\)-DP.</p>
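<p>For the Gaussian mechanism the tail can be computed directly. Adding \(N(0,\sigma^2)\) noise to a query whose value differs by \(\Delta\) between \(D\) and \(D'\) gives a privacy loss \(Z \sim N(\mu, 2\mu)\) with \(\mu = \Delta^2/(2\sigma^2)\). The following sketch (the function name and the choice \(\sigma = 2\), \(\Delta = 1\) are illustrative assumptions) prints \(\Pr[Z > \varepsilon]\) for a few values of \(\varepsilon\).</p>

```python
import math

def gaussian_plrv_tail(eps, sigma, sensitivity=1.0):
    """Pr[Z > eps] when Z ~ N(mu, 2 * mu) with mu = sensitivity**2 / (2 * sigma**2)."""
    mu = sensitivity ** 2 / (2.0 * sigma ** 2)
    # Upper tail of N(mu, 2 * mu): erfc((eps - mu) / (2 * sqrt(mu))) / 2
    return 0.5 * math.erfc((eps - mu) / (2.0 * math.sqrt(mu)))

for eps in (0.5, 1.0, 2.0, 4.0):
    print(f"eps = {eps}: Pr[Z > eps] = {gaussian_plrv_tail(eps, sigma=2.0):.3e}")
```

<p>The probabilities shrink rapidly as \(\varepsilon\) grows, rather than leaving a lump of mass at some enormous value of \(Z\).</p>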
<p><strong>To summarize:</strong></p>
<ul>
<li>\(\varepsilon\)-DP gives great guarantees, but is a very stringent requirement. Corresponds to the privacy loss random variable supported on \([-\varepsilon,\varepsilon]\) (no tails!)</li>
<li>\((\varepsilon,\delta)\)-DP gives guarantees easy to parse, but on its face allows for very bad behaviours. Corresponds to the privacy loss random variable in \([-\varepsilon,\varepsilon]\) with probability \(1-\delta\) (but outside, all bets are off!)</li>
<li>Rényi DP and Concentrated DP correspond to something in between, controlling the tails of the privacy loss random variable by a guarantee on its MGF. A bit harder to interpret, but they capture the behaviour of many DP building blocks and can be converted to \((\varepsilon,\delta)\)-DP (with nice trade-offs between \(\varepsilon\) and \(\delta\)).</li>
</ul>
<hr />
<p>\({}^{(\dagger)}\) The astute reader may notice that this is not <em>quite</em> true. Namely, the guarantee \(\Pr[Z > \varepsilon] \leq \delta\) on the privacy loss random variable (PLRV) does imply \((\varepsilon,\delta)\)-differential privacy, but the converse does not hold. See, for instance, Lemma 9 of <a href="https://arxiv.org/abs/2004.00010" title="Clément L. Canonne, Gautam Kamath, Thomas Steinke. The Discrete Gaussian for Differential Privacy. NeurIPS 2020"><strong>[CKS20]</strong></a> for an exact characterization of \((\varepsilon,\delta)\)-DP in terms of the PLRV.</p>
Clément CanonneThu, 11 Mar 2021 21:00:00 -0400
https://differentialprivacy.org/flavoursofdelta/
https://differentialprivacy.org/flavoursofdelta/Conference Digest - TPDP 2020<p><a href="https://tpdp.journalprivacyconfidentiality.org/2020/">TPDP 2020</a> is a workshop focused on differential privacy. As such, it’s a great place to learn about recent developments in the DP research community.
It will be held on 13 November and is co-located with <a href="https://www.sigsac.org/ccs/CCS2020/">CCS</a>, but, of course, it’s virtual this year. <a href="https://www.sigsac.org/ccs/CCS2020/registration.html">Registration is only US$35 if you register by Friday, 30 October.</a> Check out the 8 excellent talks and 71 posters below – wow, the workshop has grown!</p>
<p>Please let us know if there are any errors or omissions.</p>
<h2 id="invited-talks">Invited Talks</h2>
<ul>
<li>
<p>OpenDP: A Community Effort to Build Trustworthy Differential Privacy Software.<br />
<a href="https://salil.seas.harvard.edu/">Salil Vadhan</a></p>
</li>
<li>
<p>Implementation with Base-2 DP or: How I learned to stop worrying and love floating point.<br />
<a href="https://cilvento.org/">Christina Ilvento</a></p>
</li>
</ul>
<h2 id="contributed-talks">Contributed Talks</h2>
<ul>
<li>
<p><a href="https://arxiv.org/abs/2009.09052">Private Reinforcement Learning with PAC and Regret Guarantees</a><br />
Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Z. Steven Wu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.07709">Auditing Differentially Private Machine Learning: How Private is Private SGD?</a><br />
Matthew Jagielski, Jonathan Ullman, Alina Oprea</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.06783">Characterizing Private Clipped Gradient Descent on Convex Generalized Linear Problems</a><br />
Shuang Song, Om Thakkar, Abhradeep Thakurta</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.10941">Private Query Release Assisted by Public Data</a><br />
Raef Bassily, Albert Cheu, Shay Moran, Aleksandar Nikolov, Jonathan Ullman, Z. Steven Wu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2003.00563">An Equivalence Between Private Classification and Online Prediction</a><br />
Mark Bun, Roi Livni, Shay Moran</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.09745">Differentially Private Set Union</a><br />
Sivakanth Gopi, Pankaj Gulhane, Janardhan Kulkarni, Judy Hanwen Shen, Milad Shokouhi, Sergey Yekhanin</p>
</li>
</ul>
<h2 id="posters">Posters</h2>
<ul>
<li>
<p><a href="https://arxiv.org/abs/2004.00010">The Discrete Gaussian for Differential Privacy</a><br />
Clément Canonne, Gautam Kamath, Thomas Steinke</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.09465">Locally Private Hypothesis Selection</a><br />
Sivakanth Gopi, Gautam Kamath, Janardhan Kulkarni, Aleksandar Nikolov, Z. Steven Wu, Huanyu Zhang</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.05839">LinkedIn’s Audience Engagements API: A Privacy Preserving Data Analytics System at Scale</a><br />
Ryan Rogers, Subbu Subramaniam, Sean Peng, David Durfee, Seunghyun Lee, Santosh Kumar Kancha, Shraddha Sahay, Parvez Ahammad</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.14717">Differentially Private Decomposable Submodular Maximization</a><br />
Anamay Chaturvedi, Huy Nguyen, Lydia Zakynthinou</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.11934">Private Post-GAN Boosting</a><br />
Marcel Neunhoeffer, Z. Steven Wu, Cynthia Dwork</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.01100">Efficient, Noise-Tolerant, and Private Learning via Boosting</a><br />
Marco Carmosino, Mark Bun, Jessica Sorrell</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2003.04509">Closure Properties for Private Classification and Online Prediction</a><br />
Noga Alon, Amos Beimel, Shay Moran, Uri Stemmer</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.12018">Overlook: Differentially Private Exploratory Visualization for Big Data</a><br />
Pratiksha Thaker, Mihai Budiu, Parikshit Gopalan, Udi Wieder, Matei Zaharia</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.01980">On the Equivalence between Online and Private Learnability beyond Binary Classification</a><br />
Young Hun Jung, Baekjin Kim, Ambuj Tewari</p>
</li>
<li>
<p><a href="https://dettanym.github.io/files/tpdp20_workshop_paper.pdf">Cache Me If You Can: Accuracy-Aware Inference Engine for Differentially Private Data Exploration</a><br />
Miti Mazmudar, Thomas Humphries, Matthew Rafuse, Xi He</p>
</li>
<li>
<p><a href="https://drops.dagstuhl.de/opus/volltexte/2020/12026/">Bounded Leakage Differential Privacy</a><br />
Katrina Ligett, Charlotte Peale, Omer Reingold</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1909.06322">A Knowledge Transfer Framework for Differentially Private Sparse Learning</a><br />
Lingxiao Wang, Quanquan Gu</p>
</li>
<li>
<p>Consistent Integer, Non-Negative, Hierarchical Histograms without Integer Programming<br />
Cynthia Dwork, Christina Ilvento</p>
</li>
<li>
<p><a href="https://www.microsoft.com/en-us/research/uploads/prod/2020/03/intrinsic_privacy_tpdp.pdf">An Empirical Study on the Intrinsic Privacy of Stochastic Gradient Descent</a><br />
Stephanie Hyland, Shruti Tople</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2001.03618">Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation</a><br />
Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Shuang Song, Kunal Talwar, Abhradeep Thakurta</p>
</li>
<li>
<p>Improving Sparse Vector Technique with Renyi Differential Privacy<br />
Yuqing Zhu, Yu-Xiang Wang</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.11707">Breaking the Communication-Privacy-Accuracy Trilemma</a><br />
Wei-Ning Chen, Peter Kairouz, Ayfer Özgür</p>
</li>
<li>
<p>Budget Sharing for Multi-Analyst Differential Privacy<br />
David Pujol, Yikai Wu, Brandon Fain, Ashwin Machanavajjhala</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1908.07643">AdaCliP: Adaptive Clipping for Private SGD</a><br />
Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, Sanjiv Kumar</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.01181">Private Optimization Without Constraint Violation</a><br />
Andrés Muñoz Medina, Umar Syed, Sergei Vassilvitskii, Ellen Vitercik</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1912.06015">Efficient Per-Example Gradient Computations in Convolutional Neural Networks</a><br />
Gaspar Rochette, Andre Manoel, Eric Tramel</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.12674">Controlling Privacy Loss in Survey Sampling</a><br />
Audra McMillan, Mark Bun, Marco Gaboardi, Joerg Drechsler</p>
</li>
<li>
<p>Privacy-Preserving Community Detection under the Stochastic Block Model<br />
Jonathan Hehir, Aleksandra Slavkovic, Xiaoyue Niu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.10129">Smoothed Analysis of Differentially Private and Online Learning</a><br />
Nika Haghtalab, Tim Roughgarden, Abhishek Shetty</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.09464">Private Mean Estimation for Heavy-Tailed Distributions</a><br />
Gautam Kamath, Vikrant Singhal, Jonathan Ullman</p>
</li>
<li>
<p><a href="https://link.springer.com/chapter/10.1007%2F978-3-030-57521-2_23">Private Posterior Inference Consistent with Public Information: a Case Study in Small Area Estimation from Synthetic Census Data</a><br />
Jeremy Seeman, Aleksandra Slavkovic, Matthew Reimherr</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2001.09122">Reasoning About Generalization via Conditional Mutual Information</a><br />
Thomas Steinke, Lydia Zakynthinou</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.15429">Understanding Gradient Clipping in Private SGD: A Geometric Perspective</a><br />
Xiangyi Chen, Z. Steven Wu, Mingyi Hong</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.06605">Privacy Amplification via Random Check-Ins</a><br />
Borja Balle, Peter Kairouz, Brendan McMahan, Om Thakkar, Abhradeep Thakurta</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2010.12603">Permute-and-flip: a new mechanism for differentially-private selection</a><br />
Ryan McKenna, Daniel Sheldon</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.02923">Descent-to-Delete: Gradient-Based Methods for Machine Unlearning</a><br />
Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2008.06529">A Better Bound Gives a Hundred Rounds: Enhanced Privacy Guarantees via f-Divergences</a><br />
Shahab Asoodeh, Jiachun Liao, Flavio Calmon, Oliver Kosut, Lalitha Sankar</p>
</li>
<li>
<p><a href="https://sites.tufts.edu/vrdi/files/2020/07/Slides-DP-Bhushan-Suwal-JN-Matthews-et-al.pdf">Census TopDown and the Redistricting Use Case</a><br />
Aloni Cohen, Moon Duchin, JN Matthews, Bhushan Suwal, Peter Wayner</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1911.04014">Interaction is Necessary for Distributed Learning with Privacy or Communication Constraints</a><br />
Yuval Dagan, Vitaly Feldman</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.10783">Fisher information under local differential privacy</a><br />
Leighton Barnes, Wei-Ning Chen, Ayfer Ozgur</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.06830">Differentially Private Assouad, Fano, and Le Cam</a><br />
Jayadev Acharya, Ziteng Sun, Huanyu Zhang</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.13660">Learning discrete distributions: user vs item-level privacy</a><br />
Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1910.13659">Efficient Privacy-Preserving Stochastic Nonconvex Optimization</a><br />
Lingxiao Wang, Bargav Jayaraman, David Evans, Quanquan Gu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.03813">Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification</a><br />
Yingxue Zhou, Zhiwei Steven Wu, Arindam Banerjee</p>
</li>
<li>
<p>Differentially private partition selection<br />
Damien Desfontaines, Bryant Gipson, Chinmoy Mandayam, James Voss</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2008.08007">Differentially Private Clustering: Tight Approximation Ratios</a><br />
Badih Ghazi, Ravi Kumar, Pasin Manurangsi</p>
</li>
<li>
<p>Let’s not make a fuzz about it<br />
Elisabet Lobo Vesga, Alejandro Russo, Marco Gaboardi</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.12321">PAPRIKA: Private Online False Discovery Rate Control</a><br />
Wanrong Zhang, Gautam Kamath, Rachel Cummings</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2009.13689">Oblivious Sampling Algorithms for Private Data Analysis</a><br />
Sajin Sasy, Olga Ohrimenko</p>
</li>
<li>
<p>SOGDB-epsilon: Secure Outsourced Growing Database with Differentially Private Record Update<br />
Chenghong Wang, Kartik Nayak, Ashwin Machanavajjhala</p>
</li>
<li>
<p><a href="https://invertibleworkshop.github.io/accepted_papers/pdfs/41.pdf">Differentially Private Normalizing Flows for Privacy-Preserving Density Estimation</a><br />
Chris Waites, Rachel Cummings</p>
</li>
<li>
<p><a href="https://cs.uwaterloo.ca/~hsivasub/pub/TPDP2020.pdf">Differentially Private Sublinear Average Degree Approximation</a><br />
Harry Sivasubramaniam, Haonan Li, Xi He</p>
</li>
<li>
<p><a href="https://chong-l.github.io/MAPL_TNC_FL_ICML_2020.pdf">Revisiting Model-Agnostic Private Learning: Faster Rates and Active Learning</a><br />
Chong Liu, Yuqing Zhu, Kamalika Chaudhuri, Yu-Xiang Wang</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.06618">CoinPress: Practical Private Mean and Covariance Estimation</a><br />
Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.09481">Connecting Robust Shuffle Privacy and Pan-Privacy</a><br />
Victor Balcer, Albert Cheu, Matthew Joseph, Jieming Mao</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.11204">Differentially Private Variational Autoencoders with Term-wise Gradient Aggregation</a><br />
Tsubasa Takahashi, Shun Takagi, Hajime Ono, Tatsuya Komatsu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.07490">Understanding Unintended Memorization in Federated Learning</a><br />
Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Francoise Beaufays</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.10630">Near Instance-Optimality in Differential Privacy</a><br />
Hilal Asi, John Duchi</p>
</li>
<li>
<p><a href="https://drive.google.com/file/d/1okHAkjNENiS2WfSKdkUo8B29yE8-Qfof/view">Implementing differentially private integer partitions via the exponential mechanism</a> and <a href="https://drive.google.com/file/d/1OytgB24d1n-xPIWrrKCsVQQdS7rV3tjn/view">Implementing Sparse Vector</a><br />
Christina Ilvento</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.05157">Differentially Private Simple Linear Regression</a><br />
Audra McMillan, Daniel Alabi, Jayshree Sarathy, Adam Smith, Salil Vadhan</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.05453">New Oracle-Efficient Algorithms for Private Synthetic Data Release</a><br />
Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, Z. Steven Wu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.07749">General-Purpose Differentially-Private Confidence Intervals</a><br />
Cecilia Ferrando, Shufan Wang, Daniel Sheldon</p>
</li>
<li>
<p>Central Limit Theorem and Uncertainty Principles for Differentially Private Query Answering<br />
Jinshuo Dong, Linjun Zhang, Weijie Su</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.13501">Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds</a><br />
Yingxue Zhou, Xiangyi Chen, Mingyi Hong, Z. Steven Wu, Arindam Banerjee</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1905.10335">Minimax Rates of Estimating Approximate Differential Privacy</a><br />
Xiyang Liu, Sewoong Oh</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.07740">Really Useful Synthetic Data – A Framework to Evaluate the Quality of Differentially Private Synthetic Data</a><br />
Christian Arnold, Marcel Neunhoeffer</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1911.10541">PAC learning with stable and private predictions</a><br />
Yuval Dagan, Vitaly Feldman</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.04656">Computing Local Sensitivities of Counting Queries with Joins</a><br />
Yuchao Tao, Xi He, Ashwin Machanavajjhala, Sudeepa Roy</p>
</li>
<li>
<p>Efficient Reductions for Differentially Private Multi-objective Regression<br />
Julius Adebayo, Daniel Alabi</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2010.06667">The Pitfalls of Differentially Private Prediction in Healthcare</a><br />
Vinith Suriyakumar, Nicolas Papernot, Anna Goldenberg, Marzyeh Ghassemi</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.14191">Tempered Sigmoid Activations for Deep Learning with Differential Privacy</a><br />
Nicolas Papernot, Abhradeep Thakurta, Shuang Song, Steve Chien, Úlfar Erlingsson</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.10881">Revisiting Membership Inference Under Realistic Assumptions</a><br />
Bargav Jayaraman, Lingxiao Wang, David Evans, Quanquan Gu</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2009.04013">Attribute Privacy: Framework and Mechanisms</a><br />
Wanrong Zhang, Olga Ohrimenko, Rachel Cummings</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2010.10664">DuetSGX: Differential Privacy with Secure Hardware</a><br />
Phillip Nguyen, Alex Silence, David Darais, Joseph Near</p>
</li>
<li>
<p><a href="https://pdfs.semanticscholar.org/4319/65b3c5a47cf8bfd30f1c30cd044382e98d68.pdf">A Programming Framework for OpenDP</a><br />
Marco Gaboardi, Michael Hay, Salil Vadhan</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.09352">A One-Pass Private Sketch for Most Machine Learning Tasks</a><br />
Benjamin Coleman, Anshumali Shrivastava</p>
</li>
<li>
<p>Model-Agnostic Private Learning with Domain Adaptation<br />
Yuqing Zhu, Chong Liu, Yu-Xiang Wang</p>
</li>
</ul>
Thomas SteinkeWed, 28 Oct 2020 00:01:00 +0000
https://differentialprivacy.org/tpdp2020/
https://differentialprivacy.org/tpdp2020/Reconstruction Attacks in Practice<p>This is the second of two posts describing the theory and practice of reconstruction attacks. To read the first post, which covers the theoretical basis of such attacks, <a href="https://differentialprivacy.org/reconstruction-theory/">[click here]</a>.</p>
<hr />
<p>In the <a href="https://differentialprivacy.org/reconstruction-theory/">last post</a>, we discussed how an attacker can use noisy answers to questions about a database to reconstruct private information in the database. The reconstruction attack framework was:</p>
<ol>
<li>The attacker submits sufficiently random queries that link prior information (which the attacker already knows) to private data (which the attacker wants to learn).</li>
<li>The attacker receives noisy answers to these queries and writes them down as constraints for a linear program to solve for the private bits.</li>
<li>The attacker solves the linear program and rounds the result to recover most of the bits.</li>
</ol>
<p>Our last post discussed some of this attack’s nice theoretical guarantees, and this post matches that with real-world performance. More specifically, we’ll cover two successful applications of this attack against a piece of anonymizing SQL software called Diffix which, despite the name, is not differentially private.</p>
<h3 id="what-is-diffix">What is Diffix?</h3>
<p>Diffix is a system designed by the startup Aircloak for answering statistical queries over a private database. It is described by its creators as an “anonymizing SQL interface [that] sits in front of your data and enables you to conduct ad hoc analytics — fully privacy preserving and GDPR-compliant.”<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> Aircloak’s approach is to develop targeted defenses for known vulnerabilities, but to otherwise privilege utility over protecting against unknown vulnerabilities. They combine this approach with a serious effort to actually find vulnerabilities in Diffix through periodic bug bounties that offer monetary prizes for participants who mount successful attacks. While this post is critical of the design of Diffix itself, we commend Aircloak for their genuine openness to scrutiny. Indeed, the attacks described in this post were carried out as a part of these bug bounty programs and led to the discovery of several vulnerabilities in the software that have since been addressed. The first attack we describe was carried out by Aloni Cohen and Kobbi Nissim in the first bug bounty program in late 2017 and early 2018. The second was run by Travis Dick, Matthew Joseph, and Zachary Schutzman in the second bug bounty program during the summer of 2020.</p>
<p>Before diving into the details of the attacks, we’ll first introduce the basic functionality of Diffix and how it purports to defend against vulnerabilities, including linear reconstruction attacks. The goal of Diffix is to answer SQL queries, such as:</p>
<pre><code class="language-SQL">SELECT COUNT(*) FROM loans
WHERE loanStatus = 'C'
AND clientId BETWEEN 2000 AND 3000
</code></pre>
<p>on a database while preventing the disclosure of record-level data.<br />
A challenge for a system like Diffix is to answer such counting queries while preventing an adversarial user (the attacker) from learning record-level information. As you might remember from the last post, such a system must not provide exact answers to arbitrary queries. Otherwise the attacker could mount a <em>differencing attack</em>. For example, an attacker who knows that Billy Joel’s <code class="language-plaintext highlighter-rouge">clientId</code> is 2744 could learn the status of the singer’s loan by comparing the answer to the previous query with the answer to:</p>
<pre><code class="language-SQL">SELECT COUNT(*) FROM loans
WHERE loanStatus = 'C'
AND clientId BETWEEN 2000 AND 3000
AND clientId != 2744
</code></pre>
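<p>As a sketch of why exact answers are dangerous, the Python snippet below mimics the two queries on a hypothetical in-memory table (the table contents and the helper <code>count_query</code> are made up for illustration) and subtracts the exact counts.</p>

```python
def count_query(loans, low, high, exclude=None):
    """Exact count of loans with status 'C' and client id in [low, high]."""
    return sum(
        1
        for client_id, status in loans.items()
        if status == "C" and low <= client_id <= high and client_id != exclude
    )

# Hypothetical toy table: clientId -> loanStatus.
loans = {2001: "A", 2500: "C", 2744: "C", 2900: "B"}
difference = count_query(loans, 2000, 3000) - count_query(loans, 2000, 3000, exclude=2744)
print("target has status C" if difference == 1 else "target does not have status C")
```

<p>The difference is 1 exactly when the excluded client has status <code>C</code>, so two exact counts reveal one person's record.</p>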
<p>An intuitive defense is to add noise to the answer—say, Gaussian noise sampled from \(N(0,10)\).
Now the difference \(\Delta\) in the responses to the two queries is a random variable sampled from \(N(1,20)\) or \(N(0,20)\) depending on whether Joel’s <code class="language-plaintext highlighter-rouge">loanStatus</code> is or isn’t <code class="language-plaintext highlighter-rouge">C</code>.
With just one sample, the distributions are hard to distinguish.</p>
<p>Still, this scheme is easily thwarted by <em>averaging attacks</em>.
If the noise is sampled anew each time a query is made, then repeatedly making the same pair of queries generates many independent samples from \(N(1,20)\) or \(N(0,20)\), and enough queries would make it possible to distinguish these distributions easily.</p>
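<p>A quick simulation illustrates the averaging attack. It assumes, as in the example above, fresh Gaussian noise of variance 10 on each answer; the function names, the seed, and the number of repetitions are illustrative choices.</p>

```python
import math
import random

def noisy_difference(true_difference, rng, variance=10.0):
    """Difference of two query answers, each with fresh N(0, variance) noise.
    The shared part of the two counts cancels, so only the true difference remains."""
    sigma = math.sqrt(variance)
    with_target = true_difference + rng.gauss(0.0, sigma)
    without_target = rng.gauss(0.0, sigma)
    return with_target - without_target

def averaging_attack(true_difference, repetitions, rng):
    """Average many noisy differences and guess 1 if the mean exceeds 1/2."""
    total = sum(noisy_difference(true_difference, rng) for _ in range(repetitions))
    return 1 if total / repetitions > 0.5 else 0

rng = random.Random(0)  # fixed seed, for reproducibility only
guess_when_c = averaging_attack(true_difference=1, repetitions=20_000, rng=rng)
guess_when_not_c = averaging_attack(true_difference=0, repetitions=20_000, rng=rng)
print(guess_when_c, guess_when_not_c)
```

<p>With 20,000 repetitions the sample mean has standard deviation \(\sqrt{20/20000} \approx 0.03\), far smaller than the gap of 1 between the two hypotheses.</p>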
<p>As before, there is an intuitive defense: use the same noise for repeated queries. This defense invites new attacks of its own, in which the attacker makes many syntactically distinct but semantically equivalent queries. Those attacks in turn suggest new defenses, which suggest new attacks, and so on. Diffix is, in a sense, the result of this hypothetical arms race.</p>
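<p>One simple way to picture “the same noise for repeated queries” is to seed the noise with a hash of the query text. The sketch below is a hypothetical illustration of that idea and of its weakness, not a description of how Diffix works; the salt, the noise scale, and the function name <code>sticky_noise</code> are all assumptions.</p>

```python
import hashlib
import random

def sticky_noise(query_text, secret_salt, sigma=3.0):
    """Gaussian noise seeded by a hash of the query text, so repeats get the same noise."""
    digest = hashlib.sha256((secret_salt + query_text).encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed).gauss(0.0, sigma)

salt = "hypothetical-secret"
q1 = "SELECT COUNT(*) FROM loans WHERE clientId BETWEEN 2000 AND 3000"
q2 = "SELECT COUNT(*) FROM loans WHERE clientId >= 2000 AND clientId <= 3000"

same_text = sticky_noise(q1, salt) == sticky_noise(q1, salt)        # identical query text
equivalent_text = sticky_noise(q1, salt) == sticky_noise(q2, salt)  # same rows, different text
print(same_text, equivalent_text)
```

<p>Repeating <code>q1</code> always returns the same noise, so averaging it is useless, but the equivalent query <code>q2</code> hashes to a different seed and yields a new sample, which reopens the door to averaging.</p>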
<p>From a technical perspective, Diffix consists of three components, which together are intended to thwart these attacks. First, Diffix only accepts a limited subset of SQL and will categorically reject any query that does not fit this subset. These restrictions—including tight restrictions on <code class="language-plaintext highlighter-rouge">JOIN</code>s and on the number of mathematical functions in a single expression—limit the ability of an adversary to use the full power of SQL to access the database. The second component is a collection of data-dependent ad-hoc methods to prevent leaking information about individuals or very small subsets of users, including suppressing answers to queries about small numbers of users and flattening outliers.</p>
<p>The final component is Diffix’s layered noise. This noise is comprised of two individual noise terms added together: a <em>data-dependent</em> term whose variance is constant<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> and a <em>query-dependent</em> term whose variance depends on the complexity of the query. The data-dependent noise prevents naïve averaging attacks. It is a pseudorandom error where the seed of the pseudorandom function depends on individual data records that contribute to the query result. Semantically equivalent queries using different syntax will nonetheless share this error, so simply averaging the responses will not remove this noise.</p>
<p>The query-dependent noise prevents naïve Dinur–Nissim-style reconstruction attacks. A noise term of magnitude \(\Omega(1)\) is generated deterministically for each condition in the <code class="language-plaintext highlighter-rouge">WHERE</code> or <code class="language-plaintext highlighter-rouge">HAVING</code> clause of the SQL query, and the terms are added together. A Dinur–Nissim query is a random subset of the dataset that contains \(\Omega(n)\) records. The straightforward way of specifying such a query is to enumerate the subset record by record using \(\Omega(n)\) conditions:<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></p>
<pre><code class="language-SQL">SELECT COUNT(*) FROM loans
WHERE loanStatus = 'C'
AND (clientId = 2007
OR clientId = 2018
...
OR clientId = 2991)
</code></pre>
<p>A query with \(\Omega(n)\) conditions is answered with noise with standard deviation \(\Omega(\sqrt{n})\), enough to thwart efficient reconstruction algorithms.</p>
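<p>As a rough illustration of why one noise term per condition adds up to a standard deviation of about \(\sqrt{n}\), the following Python sketch sums \(n\) noise terms and measures the spread of the total. It assumes, purely for illustration, that each term is an independent unit-variance Gaussian; the function name <code class="language-plaintext highlighter-rouge">layered_noise</code> is hypothetical, and Diffix’s real noise is generated deterministically rather than drawn afresh.</p>
<pre><code class="language-python">import random
import statistics


def layered_noise(num_conditions, rng):
    # One unit-variance Gaussian draw per condition. This is an
    # illustrative assumption, not Diffix's actual noise generator.
    return sum(rng.gauss(0.0, 1.0) for _ in range(num_conditions))


rng = random.Random(0)
n = 100  # number of conditions in the query
samples = [layered_noise(n, rng) for _ in range(2000)]
empirical_sd = statistics.pstdev(samples)

# The empirical spread should be close to sqrt(n) = 10.
print(round(empirical_sd, 2), n ** 0.5)
</code></pre>
<p>Variances of independent terms add, so \(n\) terms of variance 1 give total variance \(n\) and standard deviation \(\sqrt{n}\).</p>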
<h3 id="carrying-out-reconstruction">Carrying out reconstruction</h3>
<p>The additional noise per SQL condition is the main obstacle to running a successful reconstruction attack on a database behind Diffix. As described above, the noise prevents the naïve implementation of the reconstruction algorithm from receiving accurate enough answers to reconstruct the database using a reasonable number of queries.<br />
A natural approach is to use very few SQL conditions—ideally, just one—to make random-enough queries, each identifying a subset of the records in the dataset.
So the challenge is to formulate a large family of such queries that are accepted by Diffix’s restricted subset of SQL, using as few conditions as possible.</p>
<h4 id="the-cohennissim-attack">The Cohen–Nissim Attack</h4>
<p>Instead of specifying each row with a separate condition, the Cohen–Nissim attack <a href="https://arxiv.org/abs/1810.05692">[CN18]</a> uses an ad hoc <em>hash function</em> to extract entropy from the data itself in order to systematically choose the needed subsets.<br />
Suppose we have a list of the values in the database’s <code class="language-plaintext highlighter-rouge">clientId</code> column, and we want to recover the <code class="language-plaintext highlighter-rouge">loanStatus</code> secret bit. Rather than explicitly enumerating the <code class="language-plaintext highlighter-rouge">clientId</code>s for a random subset of the rows to include in each query, we can write a boolean-valued function which evaluates to true on about half of the <code class="language-plaintext highlighter-rouge">clientId</code>s and ask Diffix to include only the rows for which the condition is true. In this way, instead of first choosing a subset of rows and then asking Diffix about those rows, we choose this function and use its evaluation to specify our random subset.</p>
<p>After some experimentation with the language restrictions, Cohen and Nissim settled on the following:</p>
<pre><code class="language-SQL">...
WHERE FLOOR(100 * ((clientId * 2)^0.7))
= FLOOR(100 * ((clientId * 2)^0.7) + 0.5)
</code></pre>
<p>Let’s see what this does. Let \(d=d_0.d_1 d_2 d_3 d_4 \dots \) be the decimal representation of the value \(d = (\mathtt{clientId}\cdot 2)^{0.7}\), which appears on both sides of the equality.
The expression is true if and only if \(d_3 < 5\).<br />
To see this, note that the left-hand side evaluates to \(d_{0}d_{1}d_{2} = \lfloor 100d \rfloor\). The right-hand side, \(\lfloor 100d + 0.5 \rfloor\), rounds \(100d\) to the nearest integer, so it evaluates to \(d_{0}d_{1}d_{2}\) if \(d_3 < 5\) or \(d_{0}d_{1}d_{2}+1\) if \(d_3 \geq 5\). In the former case, the equality condition evaluates to ‘true’, and in the latter case it evaluates to ‘false’. Replacing 100 with other powers of 10 changes which digit in the decimal expansion is checked.</p>
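<p>The same check can be mirrored outside of SQL. The Python sketch below is only an illustration: the helper names <code class="language-plaintext highlighter-rouge">rounds_down</code> and <code class="language-plaintext highlighter-rouge">in_subset</code> are made up, and floating-point arithmetic in Python need not agree digit for digit with the database’s own arithmetic.</p>
<pre><code class="language-python">import math


def rounds_down(d, scale=100):
    # True exactly when adding 0.5 does not push scale*d past the next
    # integer, i.e. when the digit after the kept ones is below 5.
    return math.floor(scale * d) == math.floor(scale * d + 0.5)


def in_subset(client_id, multiplier=2, exponent=0.7, scale=100):
    # Mirrors the WHERE condition: d = (clientId * 2)^0.7.
    return rounds_down((client_id * multiplier) ** exponent, scale)


print(rounds_down(1.234))  # third decimal digit is 4
print(rounds_down(1.236))  # third decimal digit is 6

ids = range(2000, 3001)
selected = [c for c in ids if in_subset(c)]
fraction = len(selected) / len(ids)
print(fraction)  # share of clientIds picked by this one condition
</code></pre>
<p>The first two calls print <code class="language-plaintext highlighter-rouge">True</code> and <code class="language-plaintext highlighter-rouge">False</code>, and the selected share should come out near one half, which is what makes a single condition behave like a random subset.</p>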
<p>By varying the constants in the SQL query, this single expression yields a whole family of conditions, albeit a very ad-hoc one. The hope was that, for different primes \(q\) and fractional exponents \(p\), the individual digits of the decimal representations of \((\mathtt{clientId}\cdot q)^p\) would be random enough for reconstruction to work.
The complete attack queries looked like this:</p>
<pre><code class="language-SQL">SELECT COUNT(clientId) FROM loans
WHERE FLOOR(100 * ((clientId * 2)^0.7))
= FLOOR(100 * ((clientId * 2)^0.7) + 0.5)
AND clientId BETWEEN 2000 AND 3000
AND loanStatus = 'C'
</code></pre>
<p>The range condition at the end simply selects a subset of the data which is small enough for the attack to run quickly on a personal computer but large enough to satisfy the requirements of the Diffix bounty program. This family of queries allows for a linear program to reconstruct the secret <code class="language-plaintext highlighter-rouge">loanStatus</code> bits with high accuracy.</p>
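<p>The family of queries obtained by varying the constants can be generated mechanically. The Python sketch below builds the query text for a few combinations of \(q\) and \(p\). The specific primes and exponents listed are hypothetical placeholders rather than the values used in the actual attack, and <code class="language-plaintext highlighter-rouge">attack_query</code> is a made-up helper name.</p>
<pre><code class="language-python">def attack_query(prime, exponent, scale=100, low=2000, high=3000):
    # The hashed expression appears on both sides of the equality.
    hashed = f"{scale} * ((clientId * {prime})^{exponent})"
    return (
        "SELECT COUNT(clientId) FROM loans\n"
        f"WHERE FLOOR({hashed})\n"
        f"= FLOOR({hashed} + 0.5)\n"
        f"AND clientId BETWEEN {low} AND {high}\n"
        "AND loanStatus = 'C'"
    )


primes = [2, 3, 5, 7]        # hypothetical choices of q
exponents = [0.5, 0.6, 0.7]  # hypothetical choices of p
queries = [attack_query(q, p) for q in primes for p in exponents]

print(len(queries))
print(attack_query(2, 0.7))
</code></pre>
<p>Each generated query contributes one row to the query matrix used by the linear program: the row marks which known <code class="language-plaintext highlighter-rouge">clientId</code>s satisfy that query’s condition.</p>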
<p>In the course of verifying the attack for the Diffix bounty program, reconstruction was carried out on 4 different ranges of <code class="language-plaintext highlighter-rouge">clientId</code>s containing 455 records. For each record, the attack correctly determined whether or not the corresponding <code class="language-plaintext highlighter-rouge">loanStatus</code> was <code class="language-plaintext highlighter-rouge">C</code>.</p>
<p>Aircloak’s response to this attack was to further restrict the queries allowed by Diffix. Columns like <code class="language-plaintext highlighter-rouge">clientId</code>, where most of the values correspond to a single user, are tagged as ‘isolating’, and mathematical functions can no longer be used on such columns. The hope was that this modification would prevent the extraction of entropy from an identifying column via hashing.</p>
<h4 id="the-dickjosephschutzman-attack">The Dick–Joseph–Schutzman Attack</h4>
<p>Without the ability to directly use a uniquely identifying column from the database itself, we need another way to single out rows of the database. We can use an idea that’s been around since the 1990s, when Latanya Sweeney showed<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> that almost 90 percent of Americans could be identified by the combination of date of birth, ZIP code, and gender, even though each of these attributes alone is nowhere near sufficient to isolate a single individual. Applying this idea, we can try to evade the modification to Diffix by choosing multiple non-isolating columns which, taken together, isolate rows in the database.</p>
<p>This modified attack uses the <code class="language-plaintext highlighter-rouge">pickup_latitude</code> column in the <code class="language-plaintext highlighter-rouge">taxi</code> data set as the source of entropy. This column is non-isolating, in part because there are a large number of rows where the value is recorded as zero. We can combine this column with the <code class="language-plaintext highlighter-rouge">trip_distance</code> column and run queries of the following form:</p>
<pre><code class="language-SQL">SELECT COUNT(*) FROM rides
WHERE FLOOR(pickup_latitude ^ 8.789 + 0.5)
= FLOOR(pickup_latitude ^ 8.789)
AND trip_distance IN (0.87, 1.97, 2.75)
AND payment_type = 'CSH'
</code></pre>
<p>This example query is part of an attack to recover the <code class="language-plaintext highlighter-rouge">payment_type</code> column, which (for the purposes of this attack) is a binary column containing two values: <code class="language-plaintext highlighter-rouge">CRD</code> (for credit card payments) and <code class="language-plaintext highlighter-rouge">CSH</code> (for cash payments). The <code class="language-plaintext highlighter-rouge">IN (0.87, 1.97, 2.75)</code> condition restricts the query to a subset of the data with about 450 rows, each with a distinct value for <code class="language-plaintext highlighter-rouge">pickup_latitude</code>. However, because very few rows across the whole database have a distinct value in this column, Diffix does not consider it ‘isolating’, and it can be used in the same way that Cohen–Nissim used <code class="language-plaintext highlighter-rouge">clientId</code>. The values in <code class="language-plaintext highlighter-rouge">pickup_latitude</code> are recorded to six decimal places of precision, and the least significant four of them are essentially random digits. Choosing an appropriate range for the exponent and using the same trick as in the Cohen–Nissim attack allows the construction of a Diffix-accepted query which includes around half of the rows in the targeted subset. Using different values for the exponent leads to a large family of queries, which allows the attack to be carried out as before with similarly high accuracy of over 95 percent.</p>
<p>Dick–Joseph–Schutzman additionally extends this attack to recover <em>numerical</em> rather than just binary secret data. By using queries of the form</p>
<pre><code class="language-SQL">SELECT SUM(passenger_count) FROM rides ...
</code></pre>
<p>Diffix will return noisy sums over the specified subset for a numeric column like <code class="language-plaintext highlighter-rouge">passenger_count</code>. Then, a similar linear program can reconstruct estimates for these values with high accuracy. For numeric columns like <code class="language-plaintext highlighter-rouge">passenger_count</code> which take on relatively few distinct values, the attack recovers the exact values with accuracy above 75 percent. Because the Diffix bounty program rules require perfect reconstruction of a value for it to be considered ‘accurate’, we didn’t evaluate the performance of the attack on numeric columns with richer values, such as <code class="language-plaintext highlighter-rouge">dropoff_latitude</code>.</p>
<p>Finally, this attack extends to reconstructing string data character by character. A U.S. social security number consists of a string formatted like <code class="language-plaintext highlighter-rouge">xxx-xx-xxxx</code> with nine unknown digits in three blocks separated by dashes. There are potentially one billion different strings that could appear in this column. However, by exploiting the structure of the data, a separate attack can be run to recover each digit individually using the summation attack, since there are only ten different values each digit could take. Queries of the form</p>
<pre><code class="language-SQL">SELECT SUM(CAST(SUBSTRING(ssn, 3, 1) AS integer)) FROM rides ...
</code></pre>
<p>can be used to recover the 3rd digit from each row’s social security number. Running this attack for each digit then aggregating the individual guesses to construct a guess for each user’s entire social security number allows the attack to achieve perfect reconstruction on about 90 percent of the values. A similar attack worked on the <code class="language-plaintext highlighter-rouge">pickup_datetime</code> and <code class="language-plaintext highlighter-rouge">dropoff_datetime</code> columns, with separate attacks on the value in the seconds position, the minutes position, and so on, and finally piecing these together to correctly reconstruct about 85 percent of the values.</p>
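<p>The bookkeeping for the per-digit attack can be sketched as follows. This Python snippet only illustrates which string positions hold digits and how per-digit guesses are glued back together. The helper names are hypothetical, and the trailing <code class="language-plaintext highlighter-rouge">...</code> in the query text stands for the subset-selecting conditions, which are left elided as in the query above.</p>
<pre><code class="language-python">SSN_TEMPLATE = "xxx-xx-xxxx"


def digit_positions(template=SSN_TEMPLATE):
    # SQL's SUBSTRING is 1-indexed, so positions start at 1.
    # Dashes are skipped: only the nine 'x' slots hold digits.
    return [i + 1 for i, ch in enumerate(template) if ch == "x"]


def digit_query(position):
    # One summation query family per digit position.
    return (
        f"SELECT SUM(CAST(SUBSTRING(ssn, {position}, 1) AS integer)) "
        "FROM rides ..."
    )


def assemble(digits, template=SSN_TEMPLATE):
    # Put nine reconstructed digits back into the dashed format.
    remaining = iter(digits)
    return "".join(str(next(remaining)) if ch == "x" else ch for ch in template)


print(digit_positions())                      # [1, 2, 3, 5, 6, 8, 9, 10, 11]
print(assemble([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 123-45-6789
</code></pre>
<p>Note that the fourth digit sits at string position 5, because the dashes occupy positions 4 and 7. A full value is reconstructed correctly only if all nine per-digit guesses are correct.</p>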
<p>Again, Aircloak’s response was to restrict the query language. Both of the successful attacks relied on the use of some arithmetic inside of a <code class="language-plaintext highlighter-rouge">FLOOR</code> function to check whether or not a row is included in a particular query. Diffix now forbids the use of arithmetic with <em>bucketing functions</em> such as <code class="language-plaintext highlighter-rouge">FLOOR</code>, <code class="language-plaintext highlighter-rouge">CEIL</code>, <code class="language-plaintext highlighter-rouge">ROUND</code>, etc. This defeats strategies which choose random-ish subsets via this kind of hashing, but does not necessarily preclude the extraction of entropy from the data in other ways.</p>
<h4 id="whats-next">What’s Next?</h4>
<p>We’d again like to thank Aircloak for opening their system to attacks and critiques through the Diffix bounty program. By being so willing to expose their product in this way, they have provided a test bed for us to bridge the gap between theory and application and demonstrate how a linear reconstruction attack might work in practice. Vulnerability to these and other attacks is a potential threat to any data privacy system, not just Diffix, that does not account for the cumulative threat to privacy that may result from many seemingly innocuous queries. The attacks we describe here only require that the attacker have access to some subset of the data with a sufficient amount of entropy. While more entropy allows for more complete reconstruction, it may be possible to use something potentially very accessible, like a list of users’ email addresses, to reconstruct a non-trivial amount of the secret data using queries against a system that adds independent noise to each query. Systems like this fall into the trap of the classic arms race, where a designer builds a system to protect against certain attacks, then a clever and determined adversary defeats the system, and the designer is forced to make revisions. This cycle may never terminate, leaving us perpetually unsure of when we can be confident that a system is secure enough to trust with sensitive data.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Descriptions of Diffix and Aircloak are based on <a href="https://www.aircloak.com">https://www.aircloak.com</a>, <a href="https://arxiv.org/pdf/1806.02075.pdf">https://arxiv.org/pdf/1806.02075.pdf</a>, <a href="https://demo.aircloak.com/docs/">https://demo.aircloak.com/docs/</a>, and the authors’ participation in the Aircloak bounty program. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>The variance is proportional to the largest effect any single user has on the output. For <code class="language-plaintext highlighter-rouge">COUNT</code> queries, this largest contribution is 1, and for <code class="language-plaintext highlighter-rouge">SUM</code> queries, it’s roughly the magnitude of the largest value in the column. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Note that Diffix’s syntax restrictions don’t allow disjunctions (using <code class="language-plaintext highlighter-rouge">OR</code>s). An equivalent way of writing this that is allowed by Diffix would use <code class="language-plaintext highlighter-rouge">...WHERE ... AND clientId IN (2007, 2018,...)</code>. For such conditions, Diffix adds a noise layer for each element of the <code class="language-plaintext highlighter-rouge">IN</code> condition. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Sweeney, Latanya. “Simple demographics often identify people uniquely.” Health (San Francisco) 671.2000 (2000): 1-34. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Aloni Cohen, Sasho Nikolov, Zachary Schutzman, Jonathan Ullman
Tue, 27 Oct 2020 00:11:38 -0400
https://differentialprivacy.org/diffix-attack/
The Theory of Reconstruction Attacks<p>We often see people asking whether or not differential privacy might be overkill. Why do we need strong privacy protections like differential privacy when we’re only releasing approximate, aggregate statistical information about a dataset? Is it really possible to extract information about specific users from releasing these statistics? The answer turns out to be a resounding yes! The textbook by Dwork and Roth <a href="https://www.cis.upenn.edu/~aaroth/privacybook.html">[DR14]</a> calls this phenomenon the Fundamental Law of Information Recovery:</p>
<blockquote>
<p>Giving overly accurate answers to too many questions will inevitably destroy privacy.</p>
</blockquote>
<p>So what does this fundamental law mean precisely, and how can we prove it? We can formalize and prove the law via <em>reconstruction attacks</em>, where an attacker can recover secret information from nearly every user in the dataset, simply by observing noisy answers to a modestly large number of (surprisingly simple) queries on the dataset. Reconstruction attacks were introduced in a seminal paper by Dinur and Nissim in 2003 <a href="https://dl.acm.org/doi/10.1145/773153.773173">[DN03]</a>. Although this paper predates differential privacy by a few years, the discovery of reconstruction attacks directly led to the definition of differential privacy, and shaped a lot of the early research on the topic. We now know that differentially private algorithms can, in some cases, match the limitations on accuracy implied by reconstruction attacks. When this is the case, we have a remarkably sharp transition from a blatant privacy violation when the accuracy is high enough to enable a reconstruction attack, to the strong protection given by differential privacy at the cost of only slightly lower accuracy.</p>
<p>Aside from the theoretical importance of reconstruction attacks, one may wonder whether they can be carried out in practice, or whether the attack model is unrealistic and can be avoided with some simple workarounds. In this series of posts, we argue that reconstruction attacks can be quite practical. In particular, we describe successful attacks by some of this post’s authors on a family of systems called <em>Diffix</em>, which attempt to prevent reconstruction without introducing as much noise as the reconstruction attacks suggest is necessary. To the best of our knowledge, these attacks represent the first successful attempt to reconstruct data from a commercial statistical-database system that is specifically designed to protect the privacy of the underlying data. A larger and much more significant demonstration of the practical power of reconstruction attacks was carried out by the US Census Bureau in 2018, motivating the Bureau’s adoption of differential privacy for data products derived from the 2020 decennial census <a href="https://queue.acm.org/detail.cfm?ref=rss&id=3295691">[GAM18]</a>.</p>
<p>This series will come in two parts: In this post, we will review the theory of reconstruction attacks, and present a model for reconstruction attacks that corresponds more directly to real attacks than the one that is typically presented. In the second post, we will describe attacks that were launched against various iterations of the <em>Diffix</em> system. \(
\newcommand{\uni}{\mathcal{X}} % The universe
\newcommand{\usize}{T} % Universe size
\newcommand{\elem}{x} % Generic universe element.
\newcommand{\pbs}{z} %Non-secret bits
\newcommand{\pbsuni}{\mathcal{Z}}
\renewcommand{\sb}{b} % Secret bit
\newcommand{\pds}{Z} %non-secret part of the data set
\newcommand{\ddim}{d} % Data dimension
\newcommand{\queries}{Q} % A set/workload of queries
\newcommand{\qmat}{\mat{Q}} % Query matrix
\newcommand{\qent}{w} % Entry of the query matrix
\newcommand{\hist}{h} % Histogram vector
\newcommand{\mech}{\mathcal{M}} % Generic Mechanism
\newcommand{\query}{q}
\newcommand{\queryfunc}{\varphi}
\newcommand{\ans}{a} % query answer
\newcommand{\qsize}{k}
\newcommand{\ds}{X}
\newcommand{\dsrow}{\elem} % same as elem above
\newcommand{\dsize}{n}
\newcommand{\priv}{\eps}
\newcommand{\privd}{\delta}
\newcommand{\acc}{\alpha}
\newcommand{\from}{:}
\newcommand{\set}[1]{\left\{#1\right\}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\var}{\mathrm{Var}}
\newcommand{\I}{\mathbb{I}}
\newcommand{\tr}{\mathrm{Tr}}
\newcommand{\eps}{\varepsilon}
\newcommand{\pmass}{\mathbbm{1}}
\newcommand{\zo}{\{0,1\}}
\newcommand{\mat}[1]{#1} % matrix notation: for now nothing
\)</p>
<h3 id="a-model-of-reconstruction-attacks">A Model of Reconstruction Attacks</h3>
<p>This part presents the basic theory of reconstruction attacks. We’ll introduce a model of reconstruction attacks that is a little different from what you would see if you read the papers, and then describe the main results of Dinur and Nissim. At the end we will briefly mention some variations that have been considered in the nearly two decades since.</p>
<p>Let us fix a dataset model, so that we can describe the attack precisely. (These attacks are very flexible and the ideas can usually be adapted to new models, as we’ll see at the end of this part.) We take the dataset to be a collection of \(\dsize\) records \(\ds = \{\elem_1,\dots,\elem_n\}\), each corresponding to the data of a single person. The attacker’s goal is to learn some piece of secret information about as many individuals as possible, so we think of each record as having the form \(\elem_i = (\pbs_i,\sb_i)\) where \(\pbs_i\) is some identifying information, and \(\sb_i \in \zo\) is some secret. We assume that the secret is binary, although this aspect of the model can be generalized. We can visualize such a dataset as a matrix \([\pds \mid \sb]\) with two blocks as follows:
\[ \left[ \begin{array}{c|c} \pbs_1 & \sb_1 \\ \vdots & \vdots \\ \pbs_n & \sb_n \end{array} \right] \]
For a concrete example, suppose each element in the dataset contains \(d\) binary attributes, and the attacker’s goal is to learn the last attribute of each user. In this case we would write each element as a pair \((\pbs_i, \sb_i)\) where \(\pbs_i \in \zo^{d-1}\) and \(\sb_i \in \zo\).</p>
<p>Note that this distinction between \(\pbs_i\) and \(\sb_i\) is only in the mind of the attacker, who has some prior information about the users, but is trying to learn some specific secret information. In order to make the attack simpler to describe, we will also assume that the attacker knows \(\pbs_1,\dots,\pbs_\dsize\), which is everything about the dataset except the secret bits, although this assumption can also be relaxed to a large extent. As a shorthand, we will refer to \(\pbs_1, \ldots, \pbs_\dsize\) as the prior information, and to \(\sb_1, \ldots,\sb_\dsize\) as the secret bits.</p>
<p>Our goal is to understand whether asking aggregate queries defined by the prior information can allow an attacker to learn non-trivial information about the secret bits. Perhaps the most basic type of aggregate query we can ask is a <em>counting query</em>, which asks how many of the data points satisfy a given property. The Dinur-Nissim attacks assume that the attacker can get approximate answers to counting queries of a particular type: those that ask how many data points satisfy some property defined in terms of the prior information and also have the secret bit set to \(1\). Let us use the notation \(\pbsuni\) for the set of all possible values that the prior information can take. For the purposes of the attack, each query \(\query\) will be specified by a function \(\queryfunc \from \pbsuni \to \zo\) and have the specific form
\[
\query(\ds) = \sum_{j=1}^{\dsize} \queryfunc(\pbs_j) \cdot \sb_j.
\]
This is a good time to make one absolutely crucial point about this model, which is that</p>
<blockquote>
<p>all the users are treated completely symmetrically by the queries, and the attacker cannot issue a query that targets a specific user \(x_i\) by name or a specific subset of users. The different users are distinguished only by their data. Nonetheless, we will see how to learn information about specific users from the answers to these queries.</p>
</blockquote>
<p>Returning to our example with binary attributes, consider the very natural set of queries that asks for the inner product of the secret bits with each attribute in the prior information, which is a measure of the correlation between these two attributes. Then each query takes the form \(\query_i(\ds) = \sum_{j=1}^{n} \pbs_{j,i} \cdot \sb_{j}\).</p>
<p>The nice thing about this type of query is that we can express the answers to a set of queries \(\{\query_1,\dots,\query_\qsize\}\) defined by \(\queryfunc_1, \ldots, \queryfunc_\qsize\) as the following matrix-vector product \(\qmat_{\pds}\cdot \mat{b}\):
\[ \left[ \begin{array}{c}\query_1(\ds) \\ \vdots \\ \query_\qsize(\ds) \end{array} \right] = \left[ \begin{array}{ccc} \queryfunc_1(\pbs_1) & \dots & \queryfunc_1(\pbs_\dsize) \\ \vdots & \ddots & \vdots \\ \queryfunc_\qsize(\pbs_1) & \dots & \queryfunc_k(\pbs_\dsize) \end{array} \right] \left[ \begin{array}{c} \sb_1 \\ \vdots \\ \sb_n \end{array} \right]
\]
so we can study this model using tools from linear algebra.</p>
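<p>To make the matrix view concrete, here is a tiny Python sketch with four users, two public binary attributes each, and three predicates. All of the data values and the choice of predicates are made up for illustration.</p>
<pre><code class="language-python"># Prior information z_j: two public binary attributes per user (made up).
prior = [(1, 0), (0, 1), (1, 1), (0, 0)]
# Secret bits b_j, hidden from the attacker.
secret = [1, 0, 1, 1]

# Each predicate phi_i maps a user's prior information to 0 or 1.
predicates = [
    lambda z: z[0],         # first attribute is set
    lambda z: z[1],         # second attribute is set
    lambda z: z[0] * z[1],  # both attributes are set
]

# Row i of the query matrix is (phi_i(z_1), ..., phi_i(z_n)).
query_matrix = [[phi(z) for z in prior] for phi in predicates]

# Exact answers q_i(X) = sum_j phi_i(z_j) * b_j, i.e. the product Q_Z . b.
answers = [sum(w * b for w, b in zip(row, secret)) for row in query_matrix]

print(query_matrix)  # [[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0]]
print(answers)       # [2, 1, 1]
</code></pre>
<p>The attacker can compute the matrix on their own, since it depends only on the prior information. Only the vector of secret bits is unknown.</p>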
<h3 id="an-inefficient-attack">An Inefficient Attack</h3>
<p>Exact answers to such queries are clearly revealing, because the attacker can use the predicates \[ \queryfunc_i(z) = \begin{cases} 1 & \textrm{if } \pbs = \pbs_i \\ 0 & \textrm{otherwise} \end{cases} \] to single out a specific user and receive their bit \(\sb_i\). It is less obvious, however, that an attacker can learn a lot about the private bits even given noisy answers to the queries.</p>
<p>The first Dinur-Nissim attack shows that this is indeed possible—if the attacker can ask an unbounded number of counting queries, and each query is answered with, for example, 5% error, then the attacker can reconstruct 80% of the secret bits. This attack requires exponentially many queries to run, making it somewhat impractical, but it is a proof of concept that an attack can reconstruct a large amount of private information even from very noisy statistics. Later we will see how to scale down the attack to use fewer queries at the cost of requiring more accurate answers.</p>
<p>The attack itself is quite simple:</p>
<ul>
<li>
<p>For simplicity, assume all the \(\pbs_1, \ldots, \pbs_\dsize\) are distinct so that each user is uniquely identified by the prior information.</p>
</li>
<li>
<p>The attacker chooses the queries \(\query_1, \ldots, \query_\qsize\) so that the matrix \(\qmat_\pds\) has as its rows all of \(\zo^\dsize\). Namely, \(\qsize=2^\dsize\) and the functions \(\queryfunc_1, \ldots, \queryfunc_\qsize\) defining the queries take all possible values on \(\pbs_1, \ldots, \pbs_\dsize\).</p>
</li>
<li>
<p>The attacker receives a vector \(\ans\) of noisy answers to the queries, where \( |\query_{i}(\ds) - \ans_{i}| \leq \acc \dsize \) for each query \( \query_i \). In matrix notation, this means \[ \max_{i = 1}^\qsize |(\qmat_\pds\cdot {\sb})_i -\ans_i|= \| \qmat_\pds \cdot \sb -\ans\|_\infty \leq \alpha \dsize. \]
Note that, for \(\{0,1\}\)-valued queries, the answers range from \(0\) to \(\dsize\), so answers with additive error \(\pm 5\%\) correspond to \(\acc = 0.05\).</p>
</li>
<li>
<p>Finally, the attacker outputs any guess \(\hat{\sb} = (\hat{\sb}_{1}, \ldots, \hat{\sb}_{n})\) of the private bits vector that is consistent with the answers and the additive error bound \(\acc\). In other words, \(\hat{\sb}\) just needs to satisfy \[\max_{i = 1}^\qsize |\ans_i - (\qmat_\pds\cdot \hat{\sb})_i|= \| \qmat_\pds \cdot \hat\sb - a \|_{\infty} \leq \alpha \dsize \]
Note that a solution always exists, since the true private bits \(\sb\) will do.</p>
</li>
</ul>
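<p>The steps above can be tried out by brute force on a toy instance. The Python sketch below uses \(n = 8\) users and \(\alpha = 0.125\), so each answer may be off by up to \(\alpha n = 1\). The secret bits and the noise are randomly generated stand-ins, and uniform noise is an arbitrary choice: any noise within the bound would do.</p>
<pre><code class="language-python">import itertools
import random


def dot(row, bits):
    return sum(r * b for r, b in zip(row, bits))


def exhaustive_attack(answers, rows, n, bound):
    # Return the first candidate consistent with every noisy answer.
    for candidate in itertools.product((0, 1), repeat=n):
        violated = any(
            abs(a - dot(row, candidate)) > bound
            for a, row in zip(answers, rows)
        )
        if not violated:
            return candidate
    return None


random.seed(0)
n, alpha = 8, 0.125
bound = alpha * n  # maximum error per answer

secret = [random.randint(0, 1) for _ in range(n)]
# All 2^n rows of the query matrix: every subset of users.
rows = list(itertools.product((0, 1), repeat=n))
answers = [dot(row, secret) + random.uniform(-bound, bound) for row in rows]

guess = exhaustive_attack(answers, rows, n, bound)
mistakes = sum(g != s for g, s in zip(guess, secret))
print(mistakes, "wrong bits; the guarantee is at most", 4 * alpha * n)
</code></pre>
<p>The search loops over all \(2^n\) candidates and checks each against all \(2^n\) answers, which is why this version of the attack is only feasible for very small \(n\).</p>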
<p>Our claim is that any such guess \(\hat{b}\) in fact agrees with the true private bits \(b\) for all but \(4\acc \dsize\) of the users. The reason is that if \(\hat{\sb}\) disagreed with more than \(4\acc \dsize\) of the secret bits, then the answer to some query would have eliminated \(\hat{\sb}\) from contention. To see this, fix some \(\hat{\sb}\in \zo^\dsize\), and let \[ S_{01} = \{j: \hat{\sb}_j = 0, \sb_j = 1\} \textrm{ and } S_{10} = \{j: \hat{\sb}_j = 1, \sb_j = 0\}\]
If \(\hat{\sb}\) and \(\sb\) disagree on more than \(4\acc \dsize\) bits, then at least one of these two sets has size larger than \(2\acc \dsize\). Let us assume that this set is \(S_{01}\), and we’ll deal with the other case by symmetry. Suppose that the \(i\)-th row of \(\qmat_\pds\) is the indicator vector of \(S_{01}\), i.e., \[(\qmat_\pds)_{i,j} = 1 \iff j \in S_{01}.\] We then have
\[
|(\qmat_{\pds}\cdot {\sb})_i - (\qmat_{\pds}\cdot \hat{\sb})_i|= |S_{01}| > 2 \acc \dsize,
\]
but, at the same time, if \(\hat{\sb}\) were output by the attacker, we would have
\[
|(\qmat_{\pds}\cdot {\sb})_i - (\qmat_{\pds}\cdot \hat{\sb})_i| \le |\ans_i - (\qmat_\pds\cdot \hat{\sb})_i| + |(\qmat_\pds \cdot \sb)_i - \ans_{i}| \le 2\acc \dsize, \]
which is a contradiction. An important point to note is that the attacker does not need to know the set \(S_{01}\), or the corresponding \(i\)-th row of \(\qmat_\pds\) and query \(\query_i\). Since the attacker asks all possible queries determined by the prior information, we can be sure \(\query_i\) is one of these queries, and an accurate answer to it rules out this particular bad choice of \(\hat{\sb}\). To give you something concrete to cherish, we can summarize this discussion in the following theorem.</p>
<blockquote>
<p><strong>Theorem <a href="https://dl.acm.org/doi/10.1145/773153.773173">[DN03]</a>:</strong> There is a reconstruction attack that issues \(2^n\) queries to a dataset of \(n\) users, obtains answers with error \(\alpha n\), and reconstructs the secret bits of all but \(4 \alpha n\) users.</p>
</blockquote>
<h3 id="an-efficient-attack">An Efficient Attack</h3>
<p>The exponential Dinur-Nissim attack is quite powerful, as it recovers 80% of the secret bits even from answers with 5% error, but it has the drawback that it requires asking \(2^\dsize\) queries to a dataset with \(\dsize\) users. Note that this is inherent to some extent. Suppose we randomly subsample 50% of the dataset and answer the queries using only this subset, rescaling the counts appropriately. Although this random subsampling does not guarantee any meaningful privacy, clearly no attacker can reconstruct 75% of the secret bits, since some of them are effectively deleted. At the same time, the guarantees of random sampling tell us that any set of \(\qsize\) queries will be answered with maximum error \( \acc n = O(\sqrt{n \log \qsize})\), so we can answer \( 2^{\Omega(n)} \) queries with \(5\%\) error while provably preventing this sort of reconstruction.</p>
<p>However, Dinur and Nissim showed that if we obtain <em>highly accurate</em> answers—still noisy, but with error smaller than the sampling error—then we can reconstruct the dataset to high accuracy. We can also make the reconstruction process computationally efficient by using linear programming to replace the exhaustive search over all \(2^\dsize\) possible vectors of secrets. Specifically, we change the attack as follows:</p>
<ul>
<li>
<p>The attacker now picks \(\qsize\) functions \( \varphi_i \from \pbsuni \to \{0,1\} \) <em>at random</em>, for a much smaller \(\qsize = O(\dsize) \).</p>
</li>
<li>
<p>Upon receiving an answer vector \(\ans\), the attacker now searches for a <em>real-valued</em> \( \tilde{b} \in [0,1]^{\dsize} \) such that \( \| \ans - \qmat_\pds \cdot \tilde{b} \|_{\infty} \leq \acc n \). Note that this vector can be found efficiently via linear programming. The attacker then rounds each \( \tilde{b}_{i} \) to the nearest \( \hat{b}_{i} \in \{0,1\}\).</p>
</li>
</ul>
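<p>Written out explicitly, the linear program in the second step is a pure feasibility problem in the variables \(\tilde{b}_1,\dots,\tilde{b}_n\). The formulation below is a sketch in the notation of this post, with \(\varphi_i(z_j)\) denoting the \((i,j)\) entry of the query matrix. Any standard LP solver could be used, and breaking ties at \(1/2\) upwards is an arbitrary choice:
\[ \begin{array}{ll} \textrm{find} & \tilde{b} \in \mathbb{R}^n \\ \textrm{such that} & a_i - \alpha n \le \sum_{j=1}^{n} \varphi_i(z_j)\,\tilde{b}_j \le a_i + \alpha n \quad \textrm{for } i = 1,\dots,k, \\ & 0 \le \tilde{b}_j \le 1 \quad \textrm{for } j = 1,\dots,n. \end{array} \]
The rounding step then sets \(\hat{b}_j = 1\) if \(\tilde{b}_j \ge 1/2\) and \(\hat{b}_j = 0\) otherwise. The program has \(n\) variables and \(2k + 2n\) linear constraints, so for \(k = O(n)\) it can be solved in time polynomial in \(n\). It is always feasible, because the true secret bits satisfy every constraint.</p>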
<p>It’s now much trickier to analyze this attack and show that it achieves low reconstruction error, and we won’t go into details in this post. However, the key idea is that, because the queries are chosen randomly, \( \qmat_\pds \) is a random matrix with entries in \( \{0,1\} \), and we can use the statistical properties of this random matrix to argue that, with high probability,
\[
\|\qmat_\pds \cdot \sb - \qmat_\pds \cdot \tilde{\sb}\|_\infty^2 \gtrsim |\{i: \sb_i \neq \hat{\sb}_i\}|.
\]
By the way we chose \(\tilde{\sb}\), we have
\[
\|\qmat_\pds \cdot \sb - \qmat_\pds \cdot \tilde{\sb}\|_\infty \le \|\qmat_\pds \cdot \sb - \ans\|_\infty + \| \ans - \qmat_\pds \cdot \tilde{b} \|_{\infty} \leq 2\acc n,
\]
so, by combining the inequalities, we get that the number of incorrectly reconstructed bits is \( O(\alpha^2 n^2) \). Note that, in order to reconstruct 80% of the secret bits using this attack, we now need the error to be \( \alpha n \ll \sqrt{n} \), but as long as this condition on the error is satisfied, we will have a highly accurate reconstruction. Let’s add this theorem to your goodie bags:</p>
<blockquote>
<p><strong>Theorem <a href="https://dl.acm.org/doi/10.1145/773153.773173">[DN03]</a>:</strong> There is an efficient reconstruction attack that issues \(O(n)\) random queries to a dataset of \(n\) users, obtains answers with error \(\alpha n\), and, with high probability, reconstructs the secret bits of all but \( O(\alpha^2 n^2)\) users.</p>
</blockquote>
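<p>To see where the condition \(\alpha n \ll \sqrt{n}\) comes from, here is the arithmetic spelled out. We write \(c > 0\) for the unspecified constant hidden in the \(\gtrsim\) above; this constant is an assumption of the sketch, and its value depends on the details of the analysis. Chaining the two inequalities gives
\[ c \cdot |\{i : b_i \neq \hat{b}_i\}| \le \|Q_Z \cdot b - Q_Z \cdot \tilde{b}\|_\infty^2 \le (2\alpha n)^2 = 4\alpha^2 n^2, \]
so the number of wrong bits is at most \(4\alpha^2 n^2 / c\). This is at most \(0.2\,n\), meaning at least 80% of the bits are correct, whenever
\[ \alpha n \le \sqrt{\frac{c\,n}{20}}, \]
that is, whenever the error \(\alpha n\) is a sufficiently small constant multiple of \(\sqrt{n}\).</p>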
<p>Although we modeled the queries, and thus the matrix \(\qmat_\pds\) as uniformly random, it’s important to note that we really only relied on the fact that
\[
\|\qmat_\pds \cdot \sb - \qmat_\pds \cdot \tilde{\sb}\|_\infty^2 \gtrsim
|\{i: \sb_i \neq \hat{\sb}_i\}|,
\]
and we can reconstruct while tolerating the same \(\Omega(\sqrt{n})\) error for any family of queries that gives rise to a matrix with this property. Intuitively, any <em>random-enough</em> family of queries will have this property. More specifically, the property is satisfied by any matrix with no small singular values <a href="https://dl.acm.org/doi/10.1007/978-3-540-85174-5_26">[DY08]</a> or with large discrepancy <a href="https://arxiv.org/abs/1203.5453">[MN12]</a>. There is a large body of work showing that many specific families of queries lead to reconstruction. For example, we can perform reconstruction using <em>conjunction queries</em> that ask for the marginal distribution of small subsets of the attributes <a href="https://dl.acm.org/doi/abs/10.1145/1806689.1806795">[KLSU10]</a>. That is, queries of the form “count the number of people with blue eyes and brown hair and a birthday in August.” In fairness, there are also families of queries that do not satisfy the property, or only satisfy quantitatively weaker versions of it, such as histograms and threshold queries, and for these queries it is indeed possible to achieve differential privacy with \( \ll \sqrt{n} \) error.</p>
<h3 id="conclusion">Conclusion</h3>
<p>This is going to be the end of our technical discussion, but before signing off, let’s mention some of the important extensions of this theorem that have been developed over the years:</p>
<ul>
<li>
<p>We can allow the secret information \(\sb\) to be integers or real numbers, rather than bits. The queries still return \(\qmat_\pds\cdot \sb\). The exponential attack then guarantees that, given answers with error \(\acc n\), the reconstruction \(\hat{\sb}\) satisfies \(\|\hat{\sb}-\sb\|_1 \le 4\acc n\). This means, for example, that the reconstructed secrets of all but \(4\alpha n\) users are within \(\pm 1\) of the true secrets. The efficient attack guarantees that \(\|\hat{\sb}-\sb\|_2^2 \le O(\acc^2 n^2)\), which means that the reconstructed secrets are within \(\pm 1\) for all but \(O(\acc^2 n^2)\) users.</p>
</li>
<li>
<p>It’s not crucial that <em>every</em> query be answered with error \( \ll \sqrt{n} \). If we are willing to settle
for an inefficient attack, then we can reconstruct even if only 51% of the queries have small error. If at least 75% have small error, then we can reconstruct efficiently <a href="https://dl.acm.org/doi/10.1145/1250790.1250804">[DMT07]</a>.</p>
</li>
<li>
<p>The reconstruction attacks still apply to the seemingly more general data model in which the private
dataset \(\ds\) is a subset of some arbitrary (but public) data universe \(\uni\). To see this, note that we can take \(\uni = \{\pbs_1, \ldots, \pbs_\dsize\}\), and we can interpret the secret bits \(\sb_i\) to indicate whether \(\pbs_i\) is an element of \(\ds\). Then the reconstruction attacks allow us to determine, up to some error, which elements of \(\uni\) are contained in \(\ds\). In this setting, the attack is sometimes called <em>membership inference</em>.</p>
</li>
<li>
<p>The fact that the efficient Dinur-Nissim reconstruction attack fails when the error is \( \gg \sqrt{n} \)
does not mean it’s easy to achieve privacy with error of that magnitude. As we mentioned earlier, we can achieve non-trivial error guarantees for a large number of queries simply by using a random subsample of half of the dataset, which is not a private algorithm in any reasonable sense of the word, as it can reveal everything about the chosen subset. As this example shows,</p>
<blockquote>
<p>preventing reconstruction attacks does not mean preserving privacy.</p>
</blockquote>
<p>In particular, there are membership-inference attacks that succeed in violating privacy even when the queries are answered with \( \gg \sqrt{n}\) error. We refer the reader to the survey <a href="https://privacytools.seas.harvard.edu/publications/exposed-survey-attacks-private-data">[DSSU17]</a> for a somewhat more in-depth treatment of reconstruction and membership-inference attacks.</p>
</li>
</ul>
<p>Many types of queries give rise to the conditions under which reconstruction is possible. Stay tuned for our next post, where we show how to generate those types of queries in practice against a family of systems known as <em>Diffix</em> that are specifically designed to thwart reconstruction.</p>
Aloni Cohen, Sasho Nikolov, Zachary Schutzman, Jonathan Ullman
Wed, 21 Oct 2020 12:30:00 -0400
https://differentialprivacy.org/reconstruction-theory/
Conference Digest - NeurIPS 2020
<p><a href="https://neurips.cc/Conferences/2020">NeurIPS 2020</a> is the biggest conference on machine learning, with tons of content on differential privacy in many different forms.
We were able to find two workshops, a competition, and 31 papers.
This was just going off the preliminary <a href="https://nips.cc/Conferences/2020/AcceptedPapersInitial">accepted papers list</a>, so it’s possible that we might have missed some papers on differential privacy – please let us know!
We will update this post later, once all the conference material (papers and videos) is publicly available.</p>
<h2 id="workshops">Workshops</h2>
<ul>
<li>
<p><a href="https://ppml-workshop.github.io/">Privacy Preserving Machine Learning - PriML and PPML Joint Edition</a></p>
</li>
<li>
<p><a href="http://icfl.cc/SpicyFL/2020">International Workshop on Scalability, Privacy, and Security in Federated Learning (SpicyFL 2020)</a></p>
</li>
</ul>
<h2 id="competitions">Competitions</h2>
<ul>
<li><a href="https://www.vanderschaar-lab.com/privacy-challenge/">Hide-and-Seek Privacy Challenge: Synthetic Data Generation vs. Patient Re-identification with Clinical Time-series Data</a></li>
</ul>
<h2 id="papers">Papers</h2>
<ul>
<li>
<p><a href="https://arxiv.org/abs/2007.05665">A Computational Separation between Private Learning and Online Learning</a><br />
<a href="https://cs-people.bu.edu/mbun/">Mark Bun</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.05975">Adversarially Robust Streaming Algorithms via Differential Privacy</a><br />
<a href="http://u.cs.biu.ac.il/~avinatan/">Avinatan Hasidim</a>, <a href="http://www.cs.tau.ac.il/~haimk/">Haim Kaplan</a>, <a href="https://www.tau.ac.il/~mansour/">Yishay Mansour</a>, <a href="https://research.google/people/YossiMatias/">Yossi Matias</a>, <a href="https://www.uri.co.il/">Uri Stemmer</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.07709">Auditing Differentially Private Machine Learning: How Private is Private SGD?</a><br />
<a href="https://www.ccis.northeastern.edu/home/jagielski/">Matthew Jagielski</a>, <a href="https://www.ccs.neu.edu/home/jullman/">Jonathan Ullman</a>, <a href="https://www.ccs.neu.edu/home/alina/">Alina Oprea</a></p>
</li>
<li>
<p><a href="https://papers.nips.cc/paper/2020/hash/ab452534c5ce28c4fbb0e102d4a4fb2e-Abstract.html">Bayesian Pseudocoresets</a><br />
<a href="https://www.cl.cam.ac.uk/~dm754/">Dionysis Manousakas</a>, <a href="https://www.stat.ubc.ca/users/zuheng-david-xu">Zuheng Xu</a>, <a href="https://www.cl.cam.ac.uk/~cm542/">Cecilia Mascolo</a>, <a href="https://trevorcampbell.me/">Trevor Campbell</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.11707">Breaking the Communication-Privacy-Accuracy Trilemma</a><br />
<a href="https://web.stanford.edu/~wnchen/index.html">Wei-Ning Chen</a>, <a href="https://kairouzp.github.io/">Peter Kairouz</a>, <a href="https://web.stanford.edu/~aozgur/">Ayfer Ozgur</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.06618">CoinPress: Practical Private Mean and Covariance Estimation</a><br />
<a href="https://sravb.github.io/">Sourav Biswas</a>, <a href="https://yihedong.me/">Yihe Dong</a>, <a href="http://www.gautamkamath.com/">Gautam Kamath</a>, <a href="https://www.ccs.neu.edu/home/jullman/">Jonathan Ullman</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2008.08007">Differentially Private Clustering: Tight Approximation Ratios</a><br />
<a href="https://sites.google.com/view/badihghazi/home">Badih Ghazi</a>, <a href="https://sites.google.com/site/ravik53/">Ravi Kumar</a>, <a href="https://pasin30055.github.io/">Pasin Manurangsi</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2010.11425">Differentially-Private Federated Linear Bandits</a><br />
<a href="http://web.mit.edu/dubeya/www/">Abhimanyu Dubey</a>, <a href="https://www.media.mit.edu/people/sandy/overview/">Alex Pentland</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2010.14658">Faster Differentially Private Samplers via Rényi Divergence Analysis of Discretized Langevin MCMC</a><br />
<a href="https://people.eecs.berkeley.edu/~arunganesh/">Arun Ganesh</a>, <a href="http://kunaltalwar.org/">Kunal Talwar</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.08265">GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators</a><br />
<a href="https://cispa.de/en/people/dingfan.chen">Dingfan Chen</a>, <a href="https://tribhuvanesh.github.io/">Tribhuvanesh Orekondy</a>, <a href="https://cispa.saarland/group/fritz/">Mario Fritz</a></p>
</li>
<li>
<p><a href="https://papers.nips.cc/paper/2020/hash/e9bf14a419d77534105016f5ec122d62-Abstract.html">Improving Sparse Vector Technique with Renyi Differential Privacy</a><br />
<a href="https://jeremy43.github.io/">Yuqing Zhu</a>, <a href="https://sites.cs.ucsb.edu/~yuxiangw/">Yu-Xiang Wang</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.10630">Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms</a><br />
<a href="http://web.stanford.edu/~asi/">Hilal Asi</a>, <a href="https://web.stanford.edu/~jduchi/">John Duchi</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.13660">Learning discrete distributions: user vs item-level privacy</a><br />
<a href="https://www.ece.cornell.edu/research/grad-students/yuhan-liu">Yuhan Liu</a>, <a href="http://theertha.info/">Ananda Theertha Suresh</a>, <a href="http://felixyu.org/">Felix Xinnan Yu</a>, <a href="https://research.google/people/author11555/">Sanjiv Kumar</a>, <a href="https://research.google/people/author125/">Michael D Riley</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2008.00331">Learning from Mixtures of Private and Public Populations</a><br />
<a href="https://sites.google.com/view/rbassily">Raef Bassily</a>, <a href="http://www.cs.technion.ac.il/~shaymrn/">Shay Moran</a>, <a href="http://web.cse.ohio-state.edu/~nandi.10/">Anupama Nandi</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.00701">Locally Differentially Private (Contextual) Bandits Learning</a><br />
<a href="https://scholar.google.com/citations?user=Bw-WdyUAAAAJ">Kai Zheng</a>, <a href="https://tianle.website/">Tianle Cai</a>, <a href="https://www.weiranhuang.com/">Weiran Huang</a>, <a href="http://www.ee.columbia.edu/~zgli/">Zhenguo Li</a>, <a href="http://www.liweiwang-pku.com/">Liwei Wang</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.12601">Locally private non-asymptotic testing of discrete distributions is faster using interactive mechanisms</a><br />
<a href="https://warwick.ac.uk/fac/sci/statistics/staff/academic-research/berrett/">Thomas Berrett</a>, <a href="http://cbutucea.perso.math.cnrs.fr/">Cristina Butucea</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.01980">On the Equivalence between Online and Private Learnability beyond Binary Classification</a><br />
<a href="https://scholar.google.com/citations?user=ajqlbHUAAAAJ">Young Hun Jung</a>, <a href="https://scholar.google.com/citations?user=5xt0ba0AAAAJ&hl=en">Baekjin Kim</a>, <a href="https://ambujtewari.github.io/">Ambuj Tewari</a></p>
</li>
<li>
<p><a href="https://proceedings.neurips.cc/paper/2020/hash/21d144c75af2c3a1cb90441bbb7d8b40-Abstract.html">Optimal Private Median Estimation under Minimal Distributional Assumptions</a><br />
<a href="https://tzamos.com/">Christos Tzamos</a>, <a href="http://www.cs.columbia.edu/~emvlatakis/">Emmanouil-Vasileios Vlatakis-Gkaragkounis</a>, <a href="http://www.mit.edu/~izadik/">Ilias Zadik</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2010.12603">Permute-and-Flip: A new mechanism for differentially-private selection</a><br />
<a href="https://people.cs.umass.edu/~rmckenna/">Ryan McKenna</a>, <a href="https://people.cs.umass.edu/~sheldon/">Daniel Sheldon</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.06605">Privacy Amplification via Random Check-Ins</a><br />
<a href="https://borjaballe.github.io/">Borja Balle</a>, <a href="https://kairouzp.github.io/">Peter Kairouz</a>, <a href="https://scholar.google.com/citations?user=iKPWydkAAAAJ">Brendan McMahan</a>, <a href="https://scholar.google.com/citations?user=iKPWydkAAAAJ">Om Thakkar</a>, <a href="https://athakurta.squarespace.com/">Abhradeep Thakurta</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1905.11947">Private Identity Testing for High-Dimensional Distributions</a><br />
<a href="http://www.cs.columbia.edu/~ccanonne/">Clement Canonne</a>, <a href="http://www.gautamkamath.com/">Gautam Kamath</a>, <a href="https://audramarymcmillan.wixsite.com/mysite">Audra McMillan</a>, <a href="https://www.ccs.neu.edu/home/jullman/">Jonathan Ullman</a>, <a href="https://www.ccs.neu.edu/home/lydiazak/">Lydia Zakynthinou</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.07839">Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity</a><br />
<a href="http://www.cs.tau.ac.il/~haimk/">Haim Kaplan</a>, <a href="https://www.tau.ac.il/~mansour/">Yishay Mansour</a>, <a href="https://www.uri.co.il/">Uri Stemmer</a>, <a href="https://dblp.org/pid/146/9658.html">Eliad Tsfadia</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.10129">Smoothed Analysis of Online and Differentially Private Learning</a><br />
<a href="https://www.cs.cornell.edu/~nika/">Nika Haghtalab</a>, <a href="http://timroughgarden.org/">Tim Roughgarden</a>, <a href="https://ashettyv.github.io/">Abhishek Shetty</a></p>
</li>
<li>
<p><a href="https://proceedings.neurips.cc/paper/2020/hash/a0dc078ca0d99b5ebb465a9f1cad54ba-Abstract.html">Smoothly Bounding User Contributions in Differential Privacy</a><br />
<a href="https://www.epasto.org/">Alessandro Epasto</a>, <a href="https://research.google/people/MohammadMahdian/">Mohammad Mahdian</a>, <a href="https://sites.google.com/view/jieming-mao">Jieming Mao</a>, <a href="https://people.csail.mit.edu/mirrokni/Welcome.html">Vahab Mirrokni</a>, <a href="https://www.linkedin.com/in/lijie-ren-57162633/">Lijie Ren</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.06914">Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses</a><br />
<a href="https://sites.google.com/view/rbassily">Raef Bassily</a>, <a href="http://vtaly.net/">Vitaly Feldman</a>, <a href="https://sites.google.com/view/cguzman/">Cristobal Guzman</a>, <a href="http://kunaltalwar.org/">Kunal Talwar</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1902.03468">Synthetic Data Generators – Sequential and Private</a><br />
<a href="https://research.google/people/OlivierBousquet/">Olivier Bousquet</a>, <a href="https://www.tau.ac.il/~rlivni/">Roi Livni</a>, <a href="http://www.cs.technion.ac.il/~shaymrn/">Shay Moran</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.00010">The Discrete Gaussian for Differential Privacy</a><br />
<a href="http://www.cs.columbia.edu/~ccanonne/">Clement Canonne</a>, <a href="http://www.gautamkamath.com/">Gautam Kamath</a>, <a href="http://www.thomas-steinke.net/">Thomas Steinke</a></p>
</li>
<li>
<p><a href="https://proceedings.neurips.cc/paper/2020/hash/e3019767b1b23f82883c9850356b71d6-Abstract.html">The Flajolet-Martin Sketch Itself Preserves Differential Privacy: Private Counting with Minimal Space</a><br />
<a href="https://cs-people.bu.edu/ads22/">Adam Smith</a>, <a href="https://shs037.github.io/">Shuang Song</a>, <a href="https://athakurta.squarespace.com/">Abhradeep Thakurta</a></p>
</li>
<li>
<p><a href="https://proceedings.neurips.cc/paper/2020/hash/08fb104b0f2f838f3ce2d2b3741a12c2-Abstract.html">Towards Better Generalization of Adaptive Gradient Methods</a><br />
<a href="https://sites.google.com/umn.edu/zhou0877/home">Yingxue Zhou</a>, <a href="https://belhalk.github.io/">Belhal Karimi</a>, <a href="https://www.linkedin.com/in/jinxingyu/">Jinxing Yu</a>, <a href="https://scholar.google.com/citations?user=5SZ4NjAAAAAJ&hl=en">Zhiqiang Xu</a>, <a href="https://www.stat.rutgers.edu/home/pingli/">Ping Li</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.08598">Towards practical differentially private causal graph discovery</a><br />
<a href="https://wanglun1996.github.io/">Lun Wang</a>, Qi Pang, <a href="https://people.eecs.berkeley.edu/~dawnsong/">Dawn Song</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.15429">Understanding Gradient Clipping in Private SGD: A Geometric Perspective</a><br />
<a href="https://scholar.google.com/citations?user=M0ki5ZgAAAAJ">Xiangyi Chen</a>, <a href="https://zstevenwu.com/">Steven Wu</a>, <a href="https://people.ece.umn.edu/~mhong/mingyi.html">Mingyi Hong</a></p>
</li>
</ul>
Gautam Kamath
Wed, 07 Oct 2020 12:30:00 -0400
https://differentialprivacy.org/neurips2020/
Open Problem - Avoiding the Union Bound for Multiple Queries
<p><strong>Background:</strong> Perhaps the best-studied problem in differential privacy is answering multiple counting queries.
The standard approach is to add independent, appropriately-calibrated (Laplace or Gaussian) noise to each query result and apply a composition theorem.
To bound the maximum error over the query answers, one takes a union bound over the independent noise samples.
However, this is <em>not</em> optimal.
The problem is to identify the optimal method (up to constant factors).</p>
<p><strong>Problem 1:</strong> Is there a randomized algorithm \(M : \{0,1\}^{n \times k} \rightarrow \mathbb{R}^k\) that is differentially private<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> and satisfies
\[
\forall x \in \{0, 1\}^{n \times k} \quad \mathbb{E}\left[ \left\|M(x) - \sum_{i=1}^n x_i \right\|_\infty \right] \leq c \sqrt{k}
\]
for some constant \( c > 0\) depending only on the privacy parameters?<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
<p>Adding independent Gaussian noise to each coordinate/query yields \(c \sqrt{k \log k}\) in place of \(c \sqrt{k}\) above.
Steinke and Ullman [<a href="https://arxiv.org/abs/1501.06095">SU17</a>] showed that the union bound can <em>almost</em> be avoided and obtained \(c \sqrt{k \log \log k}\) using correlated noise.
The algorithm is nonetheless based on independent Gaussian noise, with the added step of using the exponential mechanism to identify high-error answers and correct them.</p>
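<p>The extra \(\sqrt{\log k}\) factor from the union bound is easy to observe numerically. The sketch below is an illustration under our own assumptions (the function names, the number of trials, and the seed are made up for this example): it estimates the expected largest absolute value among \(k\) independent Gaussian samples with standard deviation \(\sigma\) and compares it with the standard bound \(\sigma\sqrt{2\ln(2k)}\). The calibration of \(\sigma\) to the privacy parameters is not modelled here; \(\sigma\) is simply a parameter.</p>

```python
import math
import random


def mean_max_abs_noise(k, sigma=1.0, trials=200, seed=0):
    """Monte Carlo estimate of E[max_i |Z_i|] for k i.i.d. N(0, sigma^2) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(abs(rng.gauss(0.0, sigma)) for _ in range(k))
    return total / trials


def union_bound(k, sigma=1.0):
    """Standard bound E[max_i |Z_i|] <= sigma * sqrt(2 ln(2k))."""
    return sigma * math.sqrt(2.0 * math.log(2.0 * k))


# Print k, the simulated expected maximum, and the union-bound value.
for k in (10, 100, 1000):
    print(k, round(mean_max_abs_noise(k), 2), round(union_bound(k), 2))
```

<p>The simulated maximum grows slowly with \(k\) even though each individual sample has the same scale, which is the gap that Problem 1 asks to remove.</p>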
<p>Note that an \(\Omega(\sqrt{k})\) lower bound is known [<a href="https://arxiv.org/abs/1311.3158">BUV18</a>, <a href="https://arxiv.org/abs/1501.06095">SU17</a>]. By [<a href="http://www.cs.utah.edu/~bhaskara/files/privacy.pdf">BDKT12</a>] it suffices to consider mechanisms \(M\) that add <em>instance-independent noise</em>. That is, \(M(x) = \sum_{i=1}^n x_i + Z\) where \(Z\) is some fixed noise distribution over \(\mathbb{R}^k\) that is independent of \(x\).</p>
<p><strong>Reward:</strong> For a positive solution, an all-you-can-eat sushi dinner at a sushi restaurant of your choice.
If the solution is an efficiently-sampleable distribution with a closed-form density, alcohol will be included.
For a negative solution, alcohol only.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></p>
<p><strong>Other related work:</strong> [<a href="https://privacytools.seas.harvard.edu/files/privacytools/files/robust.pdf">DSSUV15</a>, <a href="https://privacytools.seas.harvard.edu/files/privacytools/files/complexityprivacy_1.pdf">Vad17</a>, <a href="https://arxiv.org/abs/1801.09236">AS18</a>].
A very recent work by Ganesh and Zhao [<a href="https://people.eecs.berkeley.edu/~arunganesh/papers/generalizedgaussians.pdf">GZ20</a>] improves the best upper bound from \(c\sqrt{k \log \log k}\) to \(c\sqrt{k \log \log \log k}\).</p>
<p><em>Submitted by <a href="http://www.thomas-steinke.net/">Thomas Steinke</a> and <a href="https://www.ccs.neu.edu/home/jullman/">Jonathan Ullman</a> on April 9, 2019.</em></p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Specifically, \(M\) is either 1-zCDP [<a href="https://arxiv.org/abs/1605.02065">BS16</a>] with \(c\) an absolute constant or, for every \(\delta > 0\), there is an \(M_\delta\) that is \((1, \delta)\)-DP with \(c = c(\delta) = c' \cdot \sqrt{\log (1/\delta)}\) for an absolute constant \(c'\). <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Here \(x_i \in \{0, 1\}^k \subset \mathbb{R}^k\) denotes the data vector of individual \(i\) and \(x = (x_1, x_2, \dots, x_n) \in \{0,1\}^{n \times k}\). For simplicity, we only consider expected error; high-probability error bounds are an immediate consequence. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Restaurant need not necessarily be all-you-can-eat. Maximum redeemable value US$500. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Gautam Kamath
Sat, 03 Oct 2020 19:00:00 -0400
https://differentialprivacy.org/open-problem-avoid-union/
Differentially Private PAC Learning
<p>The study of differentially private PAC learning runs all the way from
its introduction in 2008 <a href="https://arxiv.org/abs/0803.0924" title="Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What Can We Learn Privately? FOCS 2008"><strong>[KLNRS08]</strong></a> to a best paper award at the
Symposium on Foundations of Computer Science (FOCS) this year <a href="https://arxiv.org/abs/2003.00563" title="Mark Bun, Roi Livni, and Shay Moran. An equivalence between private classification and online prediction. FOCS 2020"><strong>[BLM20]</strong></a>.
In this post, we’ll recap the history of this line of work, aiming for
enough detail for a rough understanding of the results and methods.</p>
<p>Before we get to the “what” and “how” of private PAC learning, it’s
worth thinking about the “why”. One motivation for this line of work is
that it neatly captures a fundamental question: does privacy in machine
learning come at a price? Machine learning is now sufficiently
successful and widespread for this question to have real import. But to
even start to address this question, we need a formalization of machine
learning that allows us to reason about possible trade-offs in a
rigorous way. Statistical learning theory, and its computational
formalization as PAC learning, provide one such clean and well-studied
model. We can therefore use PAC learning as a testbed whose insights we
might carry to other less idealized forms of learning.</p>
<p>With this motivation in mind, the rest of this post is structured as
follows. The first section covers the basics of the PAC model, and
subsequent sections gradually build up a chronology of results. When
possible, we give short sketches of the accompanying techniques.</p>
<h1 id="pac-learning">PAC Learning</h1>
<p>We’ll start with a brief overview of PAC learning absent any privacy
restrictions. Readers familiar with PAC learning can probably skip this
section while noting that</p>
<ol>
<li>
<p>(the cardinality version of) Occam’s razor is a baseline learner
using \(O(\log|\mathcal{H}|)\) samples,</p>
</li>
<li>
<p>VC dimension characterizes non-private PAC learning,</p>
</li>
<li>
<p>we’ll focus on the sample complexity of realizable PAC learning,</p>
</li>
<li>
<p>we’ll usually omit dependencies on accuracy and success probability
parameters, and</p>
</li>
<li>
<p>we’ll usually ignore computational efficiency.</p>
</li>
</ol>
<p>For readers needing a refresher on PAC learning, the basic element of
the “probably approximately correct” (PAC) framework <a href="https://dl.acm.org/doi/10.1145/1968.1972" title="Leslie G Valiant. A theory of the learnable. Communications of the ACM, 1984"><strong>[Val84]</strong></a> is a
<em>hypothesis</em>. Each hypothesis is a function
\(h \colon \mathcal{X}\to \{-1,1\}\) mapping <em>examples</em> from some space
\(\mathcal{X}\) to binary labels. A collection of hypotheses is a
<em>hypothesis class</em> \(\mathcal{H}\), e.g., thresholds (a.k.a. perceptrons),
rectangles, conjunctions, and so on. In the <em>realizable</em> setting, a
learner receives examples drawn from some unknown distribution and
labeled by an unknown \(h^\ast \in \mathcal{H}\). The learner’s goal is to
output, with high probability (“probably”), a hypothesis that mostly
matches the labels of \(h^\ast\) on future examples from the unknown example
distribution (“approximately correct”). In the <em>agnostic</em> setting,
examples are not necessarily labeled by any \(h
\in \mathcal{H}\), and the goal is only to output a hypothesis that
approximates the best error of any hypothesis from \(\mathcal{H}\). As
mentioned above, we focus on the realizable setting unless otherwise
specified. In the <em>proper</em> setting, the learner must output a hypothesis
from \(\mathcal{H}\) itself. In the <em>improper</em> setting, this requirement
is removed.</p>
<p>In general, we say an algorithm \((\alpha,\beta)\)-PAC learns
\(\mathcal{H}\) with sample complexity \(n\) if \(n\) samples suffice for it
to obtain, with probability at least \(1-\beta\), error at most \(\alpha\)
over new examples from the distribution. For the purposes of this post,
we generally omit these dependencies on \(\alpha\) and \(\beta\), as they
typically vary little or not at all when switching between non-private
and private PAC learning.</p>
<p>Fortunately, we always have a simple baseline learner based on empirical
risk minimization: given a set of labeled examples, iterate over all
hypotheses \(h \in \mathcal{H}\), check how many of the labeled examples
each \(h\) mislabels, and output a hypothesis that mislabels the fewest
examples. Using this learner, which is sometimes called “Occam’s razor,”
\(O(\log|\mathcal{H}|)\) samples suffice to PAC learn \(\mathcal{H}\).</p>
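<p>As a concrete illustration, here is a minimal sketch of this baseline learner. The helper names and the toy hypothesis class (thresholds over the points \(1, \ldots, 16\)) are our own choices for the example, not part of any library.</p>

```python
def empirical_error(h, sample):
    """Number of labeled examples (x, y) that hypothesis h mislabels."""
    return sum(1 for x, y in sample if h(x) != y)


def occam_erm(hypotheses, sample):
    """Return a hypothesis with the fewest mistakes on the sample."""
    return min(hypotheses, key=lambda h: empirical_error(h, sample))


def threshold(x_star):
    """1-dimensional threshold: label -1 for x <= x_star, and 1 otherwise."""
    return lambda x: -1 if x <= x_star else 1


# Toy realizable instance: the unknown target is the threshold at 5.
T = 16
hypotheses = [threshold(t) for t in range(1, T + 1)]
target = threshold(5)
sample = [(x, target(x)) for x in (1, 3, 5, 6, 9, 14)]
learned = occam_erm(hypotheses, sample)
```

<p>Because the sample is labeled by a hypothesis in the class, the learner always finds a hypothesis with zero empirical error; how well it generalizes is what the sample complexity bound controls.</p>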
<p>At the same time, \(|\mathcal{H}|\) is a pretty coarse measure of
hypothesis class complexity, as it would immediately rule out learning
any infinite hypothesis class (of which there are many). Thus, as you
might expect, we can do better. We do so using <em>VC dimension</em>.
\(\mathsf{VCD}\left(\mathcal{H}\right)\) is the size of the largest
possible collection of examples such that, for every labeling of the
examples, \(\mathcal{H}\) contains a hypothesis with that labeling. With
VC dimension, we can essentially swap \(\log|\mathcal{H}|\) with
\(\mathsf{VCD}\left(\mathcal{H}\right)\) in the Occam’s razor bound and
PAC learn with \(O(\mathsf{VCD}\left(\mathcal{H}\right))\) samples. In
fact, the “Fundamental Theorem of Statistical Learning” says that PAC
learnability (realizable or agnostic) is equivalent to finite VC
dimension. In this sense, \(\mathsf{VCD}\left(\mathcal{H}\right)\) is a
good measure of how hard it is to PAC learn \(\mathcal{H}\). As a
motivating example that will re-appear later, note that for the
hypothesis class of 1-dimensional thresholds over \(T\) points,
\(\log |\mathcal{H}| = \log T\), while
\(\mathsf{VCD}\left(\mathcal{H}\right)\) is only 1.</p>
<p><img src="/images/thresh.png" width="400" alt="Example: a one-dimensional threshold function" style="margin:auto;display: block;" />
An illustration of 1-dimensional thresholds. A given threshold is determined by some point \(x^\ast \in [T]\): any example \(x \leq x^\ast\) receives label \(-1\), and any example \(x > x^\ast\) receives label 1.</p>
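<p>For small classes, the definition of VC dimension can be checked by brute force. The sketch below uses hypothetical helper names of our own and takes exponential time, so it is only meant for toy examples; it confirms that thresholds over \(T = 8\) points have VC dimension 1 even though the class contains 8 hypotheses.</p>

```python
from itertools import combinations


def shatters(hypotheses, points):
    """True if every +/-1 labeling of `points` is realized by some hypothesis."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)


def vc_dimension(hypotheses, domain):
    """Largest d such that some d-subset of `domain` is shattered (brute force)."""
    best = 0
    for d in range(1, len(domain) + 1):
        if any(shatters(hypotheses, pts) for pts in combinations(domain, d)):
            best = d
        else:
            break  # subsets of a shattered set are shattered, so we can stop
    return best


T = 8
domain = list(range(1, T + 1))
# Threshold at t: label -1 for x <= t and 1 for x > t.
thresholds = [(lambda x, t=t: -1 if x <= t else 1) for t in range(1, T + 1)]
vcd = vc_dimension(thresholds, domain)
```

<p>No pair \(x < x'\) can be shattered, because no threshold labels \(x\) as 1 and \(x'\) as \(-1\).</p>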
<h1 id="a-simple-private-pac-learner">A Simple Private PAC Learner</h1>
<p>It is straightforward to add a differential privacy constraint to the
PAC framework: the hypothesis output by the learner must be a
differentially private function of the labeled examples
\((x_1, y_1), \ldots, (x_n, y_n)\). That is, changing any one of the
examples — even to one with an inconsistent label — must not affect
the distribution over hypotheses output by the learner by too much.</p>
<p>Since we haven’t talked about any other PAC learner, we may as well
start with the empirical risk minimization-style Occam’s razor discussed
in the previous section, which simply selects a hypothesis that
minimizes empirical error. A private version becomes easy if we view
this algorithm in the right light. All it is doing is assigning a score
to each possible output (the hypothesis’ empirical error) and outputting
one with the best (lowest) score. This makes it a good candidate for
privatization by the <em>exponential mechanism</em> <a href="https://dl.acm.org/doi/10.1109/FOCS.2007.41" title="Frank McSherry, Kunal Talwar. Mechanism Design via Differential Privacy. FOCS 2007."><strong>[MT07]</strong></a>.</p>
<p>Recall that the exponential mechanism uses a scoring function over
outputs to release better outputs with higher probability, subject to
the privacy constraint. More formally, the exponential mechanism
requires a scoring function \(u(X,h)\) mapping (database, output) pairs to
real-valued scores and then selects a given output \(h\) with probability
proportional to \(\exp\left(\tfrac{\varepsilon
u(X,h)}{2\Delta(u)}\right)\). Thus a lower \(\varepsilon\) (stricter
privacy requirement) and larger \(\Delta(u) := \sup_h \sup_{X \sim X'} u(X,h) - u(X',h) \) (scoring function more sensitive to changing one element in the database \(X\) to make \(X'\)) both lead to a more uniform (more
private) output distribution.</p>
<p>Fortunately for our PAC learning setting, empirical error is not a very
sensitive scoring function: changing one sample changes the empirical
error by at most 1. We can therefore use (negative) empirical error as our
scoring function \(u(X,h)\), apply the exponential mechanism, and get a
“private Occam’s razor.” This was exactly what Kasiviswanathan, Lee,
Nissim, Raskhodnikova, and Smith <a href="https://arxiv.org/abs/0803.0924" title="Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What Can We Learn Privately? FOCS 2008"><strong>[KLNRS08]</strong></a> did when they introduced
differentially private PAC learning in 2008. The resulting sample
complexity bounds differ from the generic Occam’s razor only by an
\(\varepsilon\) factor in the denominator, and
\(O(\log|\mathcal{H}|/\varepsilon)\) samples suffice to privately PAC
learn \(\mathcal{H}\).</p>
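<p>A minimal sketch of this “private Occam’s razor” for threshold hypotheses follows. The function names and the toy dataset are assumptions made for illustration; the sampling rule is the exponential mechanism as described above, with the negative number of mistakes as the score and sensitivity 1.</p>

```python
import math
import random


def exponential_mechanism(outputs, score, epsilon, sensitivity, rng=random):
    """Sample an output with probability proportional to
    exp(epsilon * score(output) / (2 * sensitivity))."""
    scores = [score(o) for o in outputs]
    top = max(scores)  # subtracting the max only rescales the weights
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(outputs, weights=weights, k=1)[0]


def private_occam(thresholds, sample, epsilon, rng=random):
    """Pick a threshold t; score = minus the number of mislabeled examples."""
    def score(t):
        return -sum(1 for x, y in sample if (-1 if x <= t else 1) != y)
    return exponential_mechanism(thresholds, score, epsilon, 1.0, rng)


# Toy data: 20 copies of each point in 1..16, labeled by the threshold at 5.
sample = [(x, -1 if x <= 5 else 1) for x in range(1, 17) for _ in range(20)]
rng = random.Random(0)
draws = [private_occam(list(range(1, 17)), sample, 1.0, rng) for _ in range(50)]
```

<p>With this much data the mechanism concentrates on the correct threshold, since every other threshold makes at least 20 mistakes and so has its weight reduced by a factor of at least \(e^{10}\). Shrinking the sample or \(\varepsilon\) flattens the output distribution, which is the \(1/\varepsilon\) cost in the sample complexity bound.</p>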
<p>Of course, our experience with non-private PAC learning suggests that we
shouldn’t be satisfied with this \(\log
|\mathcal{H}|\) dependence. Maybe VC dimension characterizes private PAC
learning, too?</p>
<h1 id="characterizing-pure-private-pac-learning">Characterizing Pure Private PAC Learning</h1>
<p>As it turns out, answering this question will take some time. We start
with a partial negative answer. Specifically, we’ll see a class with VC
dimension 1 and (a restricted form of) private sample complexity
arbitrarily larger than 1. We’ll also cover the first in a line of
characterization results for private PAC learning.</p>
<p>We first consider learners that satisfy <em>pure</em> privacy. Recall that pure
\((\varepsilon,0)\)-differential privacy forces output distributions that
may only differ by a certain \(e^\varepsilon\) multiplicative factor (like
the exponential mechanism above). The strictly weaker notion of
approximate \((\varepsilon,\delta)\)-differential privacy also allows a
small additive \(\delta\) term. Second, we restrict ourselves to
<em>proper</em> learners, which may only output hypotheses from the learned
class \(\mathcal{H}\).</p>
<p>With these assumptions in place, in 2010, Beimel, Kasiviswanathan, and
Nissim <a href="https://dl.acm.org/doi/10.1007/978-3-642-11799-2_26" title="Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. TCC 2010"><strong>[BKN10]</strong></a> studied a hypothesis class called \(\mathsf{Point}_d\).
\(\mathsf{Point}_d\) consists of \(2^d\) hypotheses, one for each vector in
\(\{0,1\}^d\). Taking the set of examples \(\mathcal{X}\) to be \(\{0,1\}^d\)
as well, we define each hypothesis in \(\mathsf{Point}_d\) to label only
its associated vector as 1, and the remaining \(2^d-1\) examples as
-1. <a href="https://dl.acm.org/doi/10.1007/978-3-642-11799-2_26" title="Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. TCC 2010"><strong>[BKN10]</strong></a> showed that the hypothesis class \(\mathsf{Point}_d\) requires
\(\Omega(d)\) samples for proper pure private PAC learning. In contrast,
\(\mathsf{VCD}\left(\mathsf{Point}_d\right) = 1\), so this \(\Omega(d)\)
lower bound shows us that VC dimension does <em>not</em> characterize proper
pure private PAC learning.</p>
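<p>The claim \(\mathsf{VCD}\left(\mathsf{Point}_d\right) = 1\) is easy to check by brute force for small \(d\). The sketch below uses helper names of our own; it enumerates the class and searches for the largest shattered set, which is only feasible for tiny domains.</p>

```python
from itertools import combinations, product

def point_class(d):
    """All 2^d point functions over {0,1}^d: each labels one vector +1, the rest -1."""
    domain = list(product((0, 1), repeat=d))
    hypotheses = [lambda x, p=p: 1 if x == p else -1 for p in domain]
    return domain, hypotheses

def is_shattered(points, hypotheses):
    """True if the hypotheses realise all 2^|points| labelings of the points."""
    realised = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realised) == 2 ** len(points)

def vc_dimension(domain, hypotheses):
    """Brute-force VC dimension: size of the largest shattered subset of the domain."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(s, hypotheses) for s in combinations(domain, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so we can stop here
    return best

domain, hypotheses = point_class(3)
print(len(hypotheses), vc_dimension(domain, hypotheses))  # prints "8 1"
```

<p>No pair of points can be shattered because no point function labels two distinct vectors as 1, while the class itself grows as \(2^d\).</p>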
<p>This result uses the classic “packing” lower bound method, which powers
many lower bounds for pure differential privacy. The general packing
method is to first construct a large collection of databases which are
all “close enough” to each other but nonetheless all have different
“good” outputs. Once we have such a collection, we use <em>group privacy</em>.
Group privacy is a corollary of differential privacy that requires
databases differing in \(k\) elements to have \(k\varepsilon\)-close output
distributions. Because of group privacy, if we start with a collection
of databases that are close together, then the output distributions for
any two databases in the collection cannot be too different. This
creates a tension: utility forces the algorithm to produce different
output distributions for different databases, but privacy forces
similarity. The packing argument comes down to arguing that, unless the
databases are large, privacy wins out, and when privacy wins out then
there is some database where the algorithm probably produces a bad
output.</p>
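<p>The group privacy step can be written as a short chain of inequalities. As a sketch in notation we introduce here, let \(D_0, D_1, \ldots, D_k\) be databases in which each consecutive pair differs in one element, and let \(M\) be \(\varepsilon\)-differentially private. Applying the definition once per step gives, for any set of outputs \(S\),</p>
<p>\[ \Pr[M(D_0) \in S] \leq e^{\varepsilon} \Pr[M(D_1) \in S] \leq \cdots \leq e^{k\varepsilon} \Pr[M(D_k) \in S]. \]</p>
<p>In particular, two databases of size \(m\) differ in at most \(m\) elements, so their output probabilities are within a factor \(e^{m\varepsilon}\) of each other no matter how different the databases are.</p>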
<p>For \(\mathsf{Point}_d\), we sketch the resulting argument as follows.
Suppose we have an \(\varepsilon\)-private PAC learner that uses \(m\)
samples. Then we can define a collection of different databases of size
\(m\), one for each hypothesis in \(\mathsf{Point}_d\). By group privacy,
the output distribution for our private PAC learner changes by at most
\(e^{m\varepsilon}\) between any two of the databases in this collection.
Thus we can pick any \(h \in \mathsf{Point}_d\) and know that the
probability of outputting the wrong hypothesis is at least roughly
\(2^d \cdot e^{-m\varepsilon}\). Since we need this probability to be
small, rearranging implies \(m =
\Omega(d/\varepsilon)\).</p>
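<p>To spell out the rearrangement, here is one way to write the calculation. It is a sketch under simplifying assumptions of ours: \(D_h\) denotes the database built for hypothesis \(h\), the learner \(M\) outputs \(h\) on \(D_h\) with probability at least \(1/2\), and its failure probability on any one database must be at most a constant \(\beta\). Fix some \(h_0 \in \mathsf{Point}_d\). Group privacy across the \(m\) records gives, for every \(h \neq h_0\),</p>
<p>\[ \Pr[M(D_{h_0}) = h] \geq e^{-m\varepsilon} \Pr[M(D_h) = h] \geq \tfrac{1}{2} e^{-m\varepsilon}. \]</p>
<p>Summing over the \(2^d - 1\) wrong hypotheses and comparing with the allowed failure probability:</p>
<p>\[ (2^d - 1) \cdot \tfrac{1}{2} e^{-m\varepsilon} \leq \Pr[M(D_{h_0}) \neq h_0] \leq \beta \quad\Longrightarrow\quad m \geq \frac{1}{\varepsilon} \ln\frac{2^d - 1}{2\beta} = \Omega(d/\varepsilon). \]</p>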
<p><a href="https://dl.acm.org/doi/10.1007/978-3-642-11799-2_26" title="Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. TCC 2010"><strong>[BKN10]</strong></a> then contrasted this result with an <em>improper</em> pure private PAC
learner. Applying the exponential mechanism to a class
\(\mathsf{Point}_d'\) of hypotheses derived from \(\mathsf{Point}_d\),
but <em>not</em> necessarily a subset of \(\mathsf{Point}_d\), gives an
improper pure private PAC learner with sample complexity
\(O(\log d)\). Since this learner is improper, it circumvents the “one database
per hypothesis” step of the packing lower bound. Moreover, <a href="https://dl.acm.org/doi/10.1007/978-3-642-11799-2_26" title="Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. TCC 2010"><strong>[BKN10]</strong></a> gave a
still more involved improper pure private PAC learner requiring only
\(O(1)\) samples. This separates proper pure private PAC learning from
improper pure private PAC learning. In contrast, the sample complexities
of proper and improper PAC learning absent privacy are the same up to
logarithmic factors in \(\alpha\) and \(\beta\).</p>
<p>In 2013, Beimel, Nissim, and Stemmer <a href="https://arxiv.org/abs/1402.2224" title="Amos Beimel, Kobbi Nissim, and Uri Stemmer. Characterizing the sample complexity of private learners. ITCS 2013"><strong>[BNS13]</strong></a> proved a more general
result. They gave the first characterization of pure (improper) private
PAC learning by defining a new hypothesis class measure called the
<em>representation dimension</em>, \(\mathsf{REPD}\left(\mathcal{H}\right)\).
Roughly, the representation dimension considers the collection of all
distributions \(\mathcal{D}\) over sets of hypotheses, not necessarily
from \(\mathcal{H}\), that “cover” \(\mathcal{H}\). By “cover,” we mean that
for any \(h
\in \mathcal{H}\), with high probability a set drawn from covering
distribution \(\mathcal{D}\) includes a hypothesis that mostly produces
labels that agree with \(h\). With this collection of distributions
defined, \(\mathsf{REPD}\left(\mathcal{H}\right)\) is the minimum over all
such covering distributions of the logarithm of the size of the largest
set in its support. Thus a hypothesis class that can be covered by a
distribution over small sets of hypotheses will have a small
representation dimension. With the notion of representation dimension in
hand, <a href="https://arxiv.org/abs/1402.2224" title="Amos Beimel, Kobbi Nissim, and Uri Stemmer. Characterizing the sample complexity of private learners. ITCS 2013"><strong>[BNS13]</strong></a> gave the following result:</p>
<blockquote>
<p><strong>Theorem 1</strong> (<a href="https://arxiv.org/abs/1402.2224" title="Amos Beimel, Kobbi Nissim, and Uri Stemmer. Characterizing the sample complexity of private learners. ITCS 2013"><strong>[BNS13]</strong></a>). The sample complexity to pure private PAC learn \(\mathcal{H}\) is \(\Theta(\mathsf{REPD}\left(\mathcal{H}\right))\).</p>
</blockquote>
<p>Representation dimension may seem like a strange definition, but a
sketch of the proof of this result helps illustrate the connection to
private learning. Recall from our private Occam’s razor, and the
improper pure private PAC learner above, that if we can find a good and
relatively small set of hypotheses to choose from, then we can apply the
exponential mechanism and call it a day. It is exactly this kind of
“good set of hypotheses” that representation dimension aims to capture.
A little more formally, given an upper bound on
\(\mathsf{REPD}\left(\mathcal{H}\right)\), we know there is some covering
distribution whose largest hypothesis set is not too big. That means we
can construct a learner that draws a hypothesis set from this covering
distribution and applies the exponential mechanism to it. Just as we
picked up a \(\log|\mathcal{H}|\) sample complexity dependence using
private Occam’s razor, since \(\mathsf{REPD}\left(\mathcal{H}\right)\)
measures the logarithm of the size of the largest hypothesis set in the
support, this pure private learner picks up a
\(\mathsf{REPD}\left(\mathcal{H}\right)\) sample complexity dependence
here. This gives us one direction of
Theorem 1.</p>
<p>This logic works in the other direction as well. To go from a pure
private PAC learner with sample complexity \(m\) to an upper bound on
\(\mathsf{REPD}\left(\mathcal{H}\right)\), we return to the group privacy
trick used by <a href="https://dl.acm.org/doi/10.1007/978-3-642-11799-2_26" title="Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. TCC 2010"><strong>[BKN10]</strong></a>. Suppose we fix a database of size \(m\) and pass it
to the learner. By group privacy and the learner’s accuracy guarantee,
if we fix some concept \(c\), the learner has probability at least roughly
\(e^{-m}\) of outputting a hypothesis that mostly agrees with \(c\). Thus if
we repeat this process roughly \(e^{m}\) times, we probably get at least
one hypothesis that mostly agrees with \(c\). In other words, this
repeated calling of the learner on the arbitrary database yields a
covering distribution for \(\mathcal{H}\). Since we called the learner
approximately \(e^m\) times, the logarithm of this is \(m\), and we get our
upper bound on \(\mathsf{REPD}\left(\mathcal{H}\right)\).</p>
<p>To recap, we now know that proper pure private PAC learning is strictly
harder than improper pure private PAC learning, which is characterized
by representation dimension. A picture sums it up. Note the dotted line,
since we don’t yet have any evidence separating finite representation
dimension and finite VC dimension.</p>
<p><img src="/images/private_pac_1.png" width="400" alt="Landscape of Private PAC, take 1" style="margin:auto;display: block;" /></p>
<h1 id="separating-pure-and-approximate-private-pac-learning">Separating Pure and Approximate Private PAC Learning</h1>
<p>So far, we’ve focused only on pure privacy. In this section, we move on
to the first separations between pure and approximate private PAC
learning, as well as the first connection between private learning and
<em>online</em> learning.</p>
<p>Our source is a pair of interconnected papers from around 2014. Among
other things, Feldman and Xiao <a href="https://arxiv.org/abs/1402.6278" title="Vitaly Feldman and David Xiao. Sample complexity bounds on differentially private learning via communication complexity. COLT 2014"><strong>[FX14]</strong></a> introduced <em>Littlestone
dimension</em> to private PAC learning. By connecting representation
dimension to results from communication complexity, and those in turn to
Littlestone dimension, they proved the following:</p>
<blockquote>
<p><strong>Theorem 2</strong> (<a href="https://arxiv.org/abs/1402.6278" title="Vitaly Feldman and David Xiao. Sample complexity bounds on differentially private learning via communication complexity. COLT 2014"><strong>[FX14]</strong></a>). The sample complexity to pure private PAC learn \(\mathcal{H}\) is \(\Omega(\mathsf{LD}\left(\mathcal{H}\right))\).</p>
</blockquote>
<p>Littlestone dimension \(\mathsf{LD}\left(\mathcal{H}\right)\) is, roughly,
the maximum number of mistakes an adversary can force an <em>online</em>
PAC-learning algorithm to make <a href="https://link.springer.com/article/10.1023/A:1022869011914" title="Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine learning, 1988"><strong>[Lit88]</strong></a>. We always have
\(\mathsf{VCD}\left(\mathcal{H}\right) \leq \mathsf{LD}\left(\mathcal{H}\right) \leq \log|\mathcal{H}|\),
but these inequalities can be strict. For example, denoting by
\(\mathsf{Thresh_T}\) the class of thresholds over \(\{1, 2, \ldots,
T\}\), since an adversary can force \(\Theta(\log T)\) wrong answers from
an online learner binary searching over \(\{1,2, \ldots, T\}\),
\(\mathsf{LD}\left(\mathsf{Thresh_T}\right) = \Omega(\log T)\). In
contrast, \(\mathsf{VCD}\left(\mathsf{Thresh_T}\right) = 1\).</p>
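<p>The adversary argument for thresholds is short enough to simulate. In the sketch below (function names are ours), the adversary always queries the midpoint of the thresholds still consistent with its earlier answers and then declares the learner’s prediction wrong. Each round at most halves the consistent set, so any learner is forced into at least \(\lfloor \log_2 T \rfloor\) mistakes; the default learner shown is a simple majority vote.</p>

```python
def majority_learner(lo, hi, x):
    """Predict the label of x by majority vote over thresholds t in [lo, hi],
    where threshold t labels x as +1 exactly when x >= t."""
    votes_plus = x - lo + 1   # thresholds t <= x say +1
    votes_minus = hi - x      # thresholds t > x say -1
    return 1 if votes_plus >= votes_minus else -1

def forced_mistakes(T, learner=majority_learner):
    """Number of mistakes the midpoint adversary forces on thresholds over {1, ..., T}."""
    lo, hi = 1, T             # thresholds still consistent with all past answers
    mistakes = 0
    while lo < hi:
        x = (lo + hi) // 2    # query that splits the consistent thresholds in two
        truth = -learner(lo, hi, x)  # adversary contradicts whatever was predicted
        mistakes += 1
        if truth == 1:
            hi = x            # only thresholds t <= x remain consistent
        else:
            lo = x + 1        # only thresholds t > x remain consistent
    return mistakes

for T in (2, 16, 1024, 10**6):
    print(T, forced_mistakes(T))
```

<p>Because the adversary’s answers always remain consistent with at least one threshold, the mistakes are forced within the realizable setting.</p>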
<p>At first glance it’s not obvious what
Theorem 2 adds over
Theorem 1. After all,
Theorem 1 gives an equivalence, not just a lower bound. One
advantage of
Theorem 2 is that Littlestone dimension is a known
quantity that has already been studied in its own right. We can now
import results like the lower bound on
\(\mathsf{LD}\left(\mathsf{Thresh_T}\right)\), whereas bounds on
\(\mathsf{REPD}\left(\cdot\right)\) are not common. A second advantage is
that Littlestone dimension conceptually connects private learning and
online learning: we now know that pure private PAC learning is no easier
than online PAC learning.</p>
<p>A second paper by Beimel, Nissim, and Stemmer <a href="https://arxiv.org/abs/1407.2674" title="Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. APPROX-RANDOM 2013"><strong>[BNS13b]</strong></a> contrasted this
\(\Omega(\log T)\) lower bound for pure private learning of thresholds
with a \(2^{O(\log^\ast T)}\) upper bound for <em>approximate</em> private PAC
learning \(\mathsf{Thresh_T}\). Here \(\log^\ast\) denotes the very
slow-growing iterated logarithm, the number of times we must take the
logarithm of the argument to bring it \(\leq 1\). (We’re not kidding about
“very slow-growing” either:
\(\log^\ast(\text{number of atoms in universe}) \approx
4\).) With Feldman and Xiao’s result, this separates pure private PAC
learning from approximate private PAC learning. It also shows that
representation dimension does <em>not</em> characterize approximate private PAC
learning.</p>
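<p>For readers who want to check the “very slow-growing” claim, here is a small sketch of the iterated logarithm as defined above. The function name and the choice of natural logarithm as the default base are ours; using base 2 instead shifts the count by about one.</p>

```python
import math

def log_star(x, base=math.e):
    """Iterated logarithm: how many times log must be applied to bring x down to at most 1."""
    count = 0
    while x > 1:
        x = math.log(x, base)
        count += 1
    return count

atoms_in_universe = 10**80  # a commonly quoted rough estimate
print(log_star(atoms_in_universe))          # natural logarithm
print(log_star(atoms_in_universe, base=2))  # base-2 logarithm
```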
<p>At the same time, Feldman and Xiao observed that the connection between
pure private PAC learning and Littlestone dimension is imperfect. Again
borrowing results from communication complexity, they observed that the
hypothesis class \(\mathsf{Line_p}\) (which we won’t define here) has
\(\mathsf{LD}\left(\mathsf{Line_p}\right) = 2\) but
\(\mathsf{REPD}\left(\mathsf{Line_p}\right)
= \Theta(\log(p))\). In contrast, they showed that an <em>approximate</em>
private PAC learner can learn \(\mathsf{Line_p}\) using
\(O\left(\tfrac{\log(1/\beta)}{\alpha}\right)\) samples. Since this
entails no dependence on \(p\) at all, it improves the separation between
pure and approximate private PAC learning given by <a href="https://arxiv.org/abs/1407.2674" title="Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. APPROX-RANDOM 2013"><strong>[BNS13b]</strong></a>.</p>
<p>Let’s pause to recap what’s happened so far. We learned in the last
section that representation dimension characterizes pure private PAC
learning <a href="https://arxiv.org/abs/1402.2224" title="Amos Beimel, Kobbi Nissim, and Uri Stemmer. Characterizing the sample complexity of private learners. ITCS 2013"><strong>[BNS13]</strong></a>. We learned in this section that Littlestone dimension
gives lower bounds for pure private PAC learning but, as shown by
\(\mathsf{Line_p}\), these bounds are sometimes quite loose <a href="https://arxiv.org/abs/1402.6278" title="Vitaly Feldman and David Xiao. Sample complexity bounds on differentially private learning via communication complexity. COLT 2014"><strong>[FX14]</strong></a>.
\(\mathsf{Thresh_T}\) shows that representation dimension does not
characterize approximate private PAC learning <strong>[<a href="https://arxiv.org/abs/1402.6278" title="Vitaly Feldman and David Xiao. Sample complexity bounds on differentially private learning via communication complexity. COLT 2014">FX14</a>
; <a href="https://arxiv.org/abs/1407.2674" title="Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. APPROX-RANDOM 2013">BNS13b</a>]</strong>, and we
still have no privacy-specific lower bounds for approximate private
learners. So the picture now looks like this:</p>
<p><img src="/images/private_pac_2.png" width="400" alt="Landscape of Private PAC, take 2" style="margin:auto;display: block;" /></p>
<p>In particular, we might still find that VC dimension characterizes
approximate private PAC learning!</p>
<h1 id="lower-bounds-for-approximate-private-pac-learning">Lower Bounds for Approximate Private PAC Learning</h1>
<p>We now dash this hope. In 2015, Bun, Nissim, Stemmer, and
Vadhan <a href="https://arxiv.org/abs/1504.07553" title="Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil Vadhan. Differentially private release and learning of threshold functions. FOCS 2015"><strong>[BNSV15]</strong></a> gave the first nontrivial lower bound for approximate
private PAC learning. They showed that learning \(\mathsf{Thresh_T}\) has
<em>proper</em> approximate private sample complexity between \(\Omega(\log^\ast(T))\) and
\(O(2^{\log^\ast(T)})\).</p>
<p>We’ll at least try to give some intuition for the presence of \(\log^\ast\)
in the lower bound. Informally, the lower bound relies on an inductive
construction of a sequence of hard problems for databases of size
\(n=1, 2,
\ldots\). The \(k^{th}\) hard problem relies on a distribution over
databases of size \(k\) whose data universe is of size exponential in
the size of the data universe for the \((k-1)^{th}\) distribution. The
base case is the uniform distribution over the two singleton databases
\(\{0\}\) and \(\{1\}\), and they show how to inductively construct
successive problems such that a solution for the \(k^{th}\) problem
implies a solution for the \((k-1)^{th}\) problem. Unraveling the
recursive relationship between the problem domain sizes implies a
general lower bound of roughly \(\log^\ast|X|\) for domain \(X\).</p>
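<p>To see where the iterated logarithm comes from, consider the idealised recursion below. The exact form \(|X_k| = 2^{|X_{k-1}|}\) with \(|X_1| = 2\) is a simplifying assumption of ours rather than the precise construction, and logarithms are base 2.</p>
<p>\[ |X_k| = 2^{|X_{k-1}|} \quad\Longrightarrow\quad |X_k| = \underbrace{2^{2^{\cdot^{\cdot^{2}}}}}_{k \text{ twos}} \quad\Longrightarrow\quad \log^\ast |X_k| = k. \]</p>
<p>So the problem that is hard for databases of size \(k\) lives on a universe whose iterated logarithm is \(k\), which matches the roughly \(\log^\ast|X|\) shape of the bound.</p>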
<p>The inclusion of \(\log^\ast\) makes this an extremely mild lower bound.
However, \(\log^\ast(T)\) can still be arbitrarily larger than 1, so this is
the first definitive evidence that proper approximate privacy introduces
a cost over non-private PAC learning.</p>
<p>In 2018, Alon, Livni, Malliaris, and Moran <a href="https://arxiv.org/abs/1806.00949" title="Noga Alon, Roi Livni, Maryanthe Malliaris, and Shay Moran. Private PAC learning implies finite Littlestone dimension. STOC 2019"><strong>[ALMM19]</strong></a> extended this
\(\Omega(\log^\ast T)\) lower bound for \(\mathsf{Thresh_T}\) to <em>improper</em>
approximate privacy. More generally, they gave concrete evidence for the
importance of thresholds, which have played a seemingly outsize role in
the work so far. They did so by relating a class’ Littlestone dimension
to its ability to “contain” thresholds. Here, we say \(\mathcal{H}\)
“contains” \(m\) thresholds if there exist \(m\) (unlabeled) examples
\(x_1,\ldots,x_m\) and hypotheses \(h_1, \ldots, h_m \in \mathcal{H}\) such
that the hypotheses “behave like” thresholds on the \(m\) examples, i.e.,
\(h_i(x_j) = 1 \Leftrightarrow j \geq
i\). With this language, they imported a result from model theory to show
that any hypothesis class \(\mathcal{H}\) contains
\(\log(\mathsf{LD}\left(\mathcal{H}\right))\) thresholds. This implies
that learning \(\mathcal{H}\) is at least as hard as learning
\(\mathsf{Thresh_T}\) with
\(T = \log(\mathsf{LD}\left(\mathcal{H}\right))\). Since
\(\log^\ast(\log(\mathsf{LD}\left(\mathcal{H}\right)))
= \Omega(\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))\), combining these
two results puts the following limit on private PAC learning:</p>
<blockquote>
<p><strong>Theorem 3</strong> (<a href="https://arxiv.org/abs/1806.00949" title="Noga Alon, Roi Livni, Maryanthe Malliaris, and Shay Moran. Private PAC learning implies finite Littlestone dimension. STOC 2019"><strong>[ALMM19]</strong></a>). The sample complexity to approximate private PAC learn \(\mathcal{H}\) is \(\Omega(\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))\).</p>
</blockquote>
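<p>The notion of a class “containing” thresholds, used in the argument leading to Theorem 3, can be checked mechanically for given hypotheses and examples. The sketch below uses names of our own and only verifies the condition \(h_i(x_j) = 1 \Leftrightarrow j \geq i\) for ordered lists supplied by the caller; it does not search for such lists.</p>

```python
def behaves_like_thresholds(hypotheses, examples):
    """Check that h_i(x_j) == 1 exactly when j >= i, for equal-length ordered lists."""
    if len(hypotheses) != len(examples):
        return False
    return all(
        (h(x) == 1) == (j >= i)
        for i, h in enumerate(hypotheses)
        for j, x in enumerate(examples)
    )

# Hypothetical toy classes with labels in {-1, +1}.
def make_threshold(t):
    return lambda x: 1 if x >= t else -1

def make_point(p):
    return lambda x: 1 if x == p else -1

xs = [1, 2, 3, 4, 5]
print(behaves_like_thresholds([make_threshold(t) for t in xs], xs))     # True
print(behaves_like_thresholds([make_point(1), make_point(2)], [1, 2]))  # False
```

<p>The second call fails because a point function labels only one example as 1, so two point functions cannot behave like two thresholds on two distinct examples.</p>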
<p>Littlestone dimension characterizes online PAC learning, so we now know
that online PAC learnability is necessary for private PAC learnability.
Sufficiency, however, remains an open question. This produces the
following picture, where the dotted line captures the question of
sufficiency.</p>
<p><img src="/images/private_pac_3.png" width="400" alt="Landscape of Private PAC, take 3" style="margin:auto;display: block;" /></p>
<h1 id="characterizing-approximate-private-pac-learning">Characterizing Approximate Private PAC Learning</h1>
<p>Spurred by this question, several advances in private PAC learning have
appeared in the last year. First, Gonen, Hazan, and Moran strengthened
Theorem 3 by giving a constructive method for converting
<em>pure</em> private learners to online learners <a href="https://arxiv.org/abs/1905.11311" title="Alon Gonen, Elad Hazan, and Shay Moran. Private learning implies online learning: An efficient reduction. NeurIPS 2019"><strong>[GHM19]</strong></a>. Their result
reaches back to the 2013 characterization of pure private learning in
terms of representation dimension by using the covering distribution to
generate a collection of “experts” for online learning. Again revisiting
\(\mathsf{Thresh_T}\), Kaplan, Ligett, Mansour, Naor, and
Stemmer <a href="https://arxiv.org/abs/1911.10137" title="Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, and Uri Stemmer. Privately learning thresholds: Closing the exponential gap. COLT 2020"><strong>[KLMNS20]</strong></a> significantly reduced the \(O(2^{\log^\ast(T)})\) upper
bound of <a href="https://arxiv.org/abs/1504.07553" title="Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil Vadhan. Differentially private release and learning of threshold functions. FOCS 2015"><strong>[BNSV15]</strong></a> to just \(O((\log^\ast(T))^{1.5})\). And Alon, Beimel,
Moran, and Stemmer <a href="https://arxiv.org/abs/2003.04509" title="Noga Alon, Amos Beimel, Shay Moran, and Uri Stemmer. Closure properties for private classification and online prediction. COLT 2020"><strong>[ABMS20]</strong></a> justified this post’s focus on realizable
private PAC learning by giving a transformation from a realizable
approximate private PAC learner to an agnostic one at the cost of
slightly worse privacy and sample complexity. This built on an earlier
transformation that only applied to <em>proper</em> learners <a href="https://arxiv.org/abs/1407.2662" title="Amos Beimel, Kobbi Nissim, and Uri Stemmer. Learning privately with labeled and unlabeled examples. SODA 2015"><strong>[BNS15]</strong></a>.</p>
<p>Finally, Bun, Livni, and Moran <a href="https://arxiv.org/abs/2003.00563" title="Mark Bun, Roi Livni, and Shay Moran. An equivalence between private classification and online prediction. FOCS 2020"><strong>[BLM20]</strong></a> answered the open question posed
by <a href="https://arxiv.org/abs/1806.00949" title="Noga Alon, Roi Livni, Maryanthe Malliaris, and Shay Moran. Private PAC learning implies finite Littlestone dimension. STOC 2019"><strong>[ALMM19]</strong></a>:</p>
<blockquote>
<p><strong>Theorem 4</strong> (<a href="https://arxiv.org/abs/2003.00563" title="Mark Bun, Roi Livni, and Shay Moran. An equivalence between private classification and online prediction. FOCS 2020"><strong>[BLM20]</strong></a>). The sample complexity to approximate private PAC learn \(\mathcal{H}\) is \(2^{O({\mathsf{LD}\left(\mathcal{H}\right)})}\).</p>
</blockquote>
<p>To prove this, <a href="https://arxiv.org/abs/2003.00563" title="Mark Bun, Roi Livni, and Shay Moran. An equivalence between private classification and online prediction. FOCS 2020"><strong>[BLM20]</strong></a> introduced the notion of a <em>globally stable</em>
learner and showed how to convert an online learner to a globally stable
learner, and a globally stable learner to a private learner. Thus, combined with the result of <a href="https://arxiv.org/abs/1806.00949" title="Noga Alon, Roi Livni, Maryanthe Malliaris, and Shay Moran. Private PAC learning implies finite Littlestone dimension. STOC 2019"><strong>[ALMM19]</strong></a>,
we now know that the sample complexity of private PAC learning any
\(\mathcal{H}\) is at least
\(\Omega(\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))\) and at most
\(2^{O({\mathsf{LD}\left(\mathcal{H}\right)})}\). In this sense, online
learnability characterizes private learnability.</p>
<p><img src="/images/private_pac_4.png" width="400" alt="Landscape of Private PAC, final take" style="margin:auto;display: block;" /></p>
<p>Narrowing the gap between the lower and upper bounds above is an open
question. Note that we cannot hope to close the gap completely. For the
lower bound, the current \(\mathsf{Thresh_T}\) upper bound implies that no
general lower bound can be stronger than
\(\Omega((\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))^{1.5})\). For the
upper bound, there exist hypothesis classes \(\mathcal{H}\) with
\(\mathsf{VCD}\left(\mathcal{H}\right) = \mathsf{LD}\left(\mathcal{H}\right)\)
(e.g., \(\mathsf{VCD}\left(\mathsf{Point}_d\right) = \mathsf{LD}\left(\mathsf{Point}_d\right)= 1\)), so since non-private PAC learning requires
\(\Omega(\mathsf{VCD}\left(\mathcal{H}\right))\) samples, the best
possible private PAC learning upper bound is
\(O(\mathsf{LD}\left(\mathcal{H}\right))\). Nevertheless, proving either
bound remains open.</p>
<h1 id="conclusion">Conclusion</h1>
<p>This concludes our post, and with it our discussion of this fundamental
question: the price of privacy in machine learning. We now know that in
the PAC model, proper pure private learning, improper pure private
learning, approximate private learning, and non-private learning are all
strongly separated. By the connection to Littlestone dimension, we also
know that approximate private learnability is equivalent to online
learnability. However, many questions about computational efficiency and
tight sample complexity bounds remain open.</p>
<p>As mentioned in the introduction, we focused on the clean yet widely
studied and influential model of PAC learning. Having characterized how
privacy enters the picture in PAC learning, we can hopefully convey this
understanding to other models of learning, and now approach these
questions from a rigorous and grounded point of view.</p>
<p>Congratulations to Mark Bun, Roi Livni, and Shay Moran on their best
paper award — and to the many individuals who paved the way before
them!</p>
<h1 id="acknowledgments">Acknowledgments</h1>
<p>Thanks to Kareem Amin and Clément Canonne for helpful feedback while
writing this post.</p>
Matthew Joseph, Wed, 16 Sep 2020 14:00:00 -0400
https://differentialprivacy.org/private-pac/
Conference Digest - ICML 2020
<p><a href="https://icml.cc/virtual/2020">ICML 2020</a> is one of the premier venues in machine learning, and generally features a lot of great work in differentially private machine learning.
This year is no exception: the relevant papers are listed below to the best of our ability, including links to the full versions of papers, as well as the conference pages (which contain slides and 15-minute videos for each paper).
As always, please inform us if we overlooked any papers on differential privacy.</p>
<h2 id="papers">Papers</h2>
<ul>
<li>
<p><a href="https://arxiv.org/abs/1909.12732">Alleviating Privacy Attacks via Causal Learning</a> (<a href="https://icml.cc/virtual/2020/poster/6346">page</a>)<br />
<a href="https://www.microsoft.com/en-us/research/people/shtople/">Shruti Tople</a>, <a href="http://www.amitsharma.in/">Amit Sharma</a>, <a href="https://www.microsoft.com/en-us/research/people/adityan/">Aditya Nori</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1805.10341">An end-to-end Differentially Private Latent Dirichlet Allocation Using a Spectral Algorithm</a> (<a href="https://icml.cc/virtual/2020/poster/6240">page</a>)<br />
<a href="https://github.com/dpeng817">Chris DeCarolis</a>, <a href="https://twitter.com/exsidius">Mukul Ram</a>, <a href="https://www.cs.umd.edu/people/sesmaeil">Seyed Esmaeili</a>, <a href="https://sites.cs.ucsb.edu/~yuxiangw/">Yu-Xiang Wang</a>, <a href="http://furong-huang.com/">Furong Huang</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1901.09697">Bayesian Differential Privacy for Machine Learning</a> (<a href="https://icml.cc/virtual/2020/poster/6547">page</a>)<br />
<a href="https://scholar.google.com/citations?user=BCWx7iQAAAAJ">Aleksei Triastcyn</a>, <a href="https://people.epfl.ch/boi.faltings">Boi Faltings</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1911.03030">Certified Data Removal from Machine Learning Models</a> (<a href="https://icml.cc/virtual/2020/poster/5895">page</a>)<br />
<a href="https://sites.google.com/view/chuanguo">Chuan Guo</a>, <a href="https://www.cs.umd.edu/~tomg/">Tom Goldstein</a>, <a href="https://awnihannun.com/">Awni Hannun</a>, <a href="https://lvdmaaten.github.io/">Laurens van der Maaten</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1911.00038">Context Aware Local Differential Privacy</a> (<a href="https://icml.cc/virtual/2020/poster/5775">page</a>)<br />
<a href="https://people.ece.cornell.edu/acharya/">Jayadev Acharya</a>, <a href="https://research.google/people/105175/">Kallista Bonawitz</a>, <a href="https://kairouzp.github.io/">Peter Kairouz</a>, <a href="https://research.google/people/106777/">Daniel Ramage</a>, <a href="http://www.zitengsun.com/">Ziteng Sun</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1905.12813">Data-Dependent Differentially Private Parameter Learning for Directed Graphical Models</a> (<a href="https://icml.cc/virtual/2020/poster/6262">page</a>)<br />
<a href="https://scholar.google.com/citations?user=lWWAZ4YAAAAJ">Amrita Roy Chowdhury</a>, <a href="http://pages.cs.wisc.edu/~thodrek/">Theodoros Rekatsinas</a>, <a href="http://pages.cs.wisc.edu/~jha/">Somesh Jha</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.09745">Differentially Private Set Union</a> (<a href="https://icml.cc/virtual/2020/poster/6541">page</a>)<br />
<a href="https://www.microsoft.com/en-us/research/people/sigopi/">Sivakanth Gopi</a>, <a href="https://www.linkedin.com/in/pankajgulhane/">Pankaj Gulhane</a>, <a href="https://www.microsoft.com/en-us/research/people/jakul/">Janardhan Kulkarni</a>, <a href="https://heyyjudes.github.io/">Judy Hanwen Shen</a>, <a href="https://www.microsoft.com/en-us/research/people/milads/">Milad Shokouhi</a>, <a href="http://www.yekhanin.org/">Sergey Yekhanin</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.11651">Fair Learning with Private Demographic Data</a> (<a href="https://icml.cc/virtual/2020/poster/6499">page</a>)<br />
<a href="https://husseinmozannar.github.io/">Hussein Mozannar</a>, <a href="https://sites.google.com/site/mesrob/home/">Mesrob Ohannessian</a>, <a href="https://ttic.uchicago.edu/~nati/">Nathan Srebro</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.15744">Fast and Private Submodular and \(k\)-Submodular Functions Maximization with Matroid Constraint</a> (<a href="https://icml.cc/virtual/2020/poster/6365">page</a>)<br />
<a href="https://dblp.org/pid/166/1694.html">Akbar Rafiey</a>, <a href="http://research.nii.ac.jp/~yyoshida/">Yuichi Yoshida</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2006.00706">(Locally) Differentially Private Combinatorial Semi-Bandits</a> (<a href="https://icml.cc/virtual/2020/poster/6315">page</a>)<br />
<a href="https://scholar.google.com/citations?user=sioumZAAAAAJ">Xiaoyu Chen</a>, <a href="https://scholar.google.com/citations?user=Bw-WdyUAAAAJ">Kai Zheng</a>, <a href="https://twitter.com/zixinjackzhou">Zixin Zhou</a>, <a href="https://scholar.google.com/citations?user=m8m9nD0AAAAJ">Yunchang Yang</a>, <a href="https://www.microsoft.com/en-us/research/people/weic/">Wei Chen</a>, <a href="http://www.liweiwang-pku.com/">Liwei Wang</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2007.05453">New Oracle-Efficient Algorithms for Private Synthetic Data Release</a> (<a href="https://icml.cc/virtual/2020/poster/5814">page</a>)<br />
<a href="https://sites.google.com/umn.edu/giuseppe-vietri/home">Giuseppe Vietri</a>, <a href="https://scholar.google.com/citations?user=dDVIyEQAAAAJ">Grace Tian</a>, <a href="https://cs-people.bu.edu/mbun/">Mark Bun</a>, <a href="http://www.thomas-steinke.net/">Thomas Steinke</a>, <a href="https://zstevenwu.com/">Zhiwei Steven Wu</a></p>
</li>
<li>
<p><a href="https://proceedings.icml.cc/static/paper_files/icml/2020/1190-Paper.pdf">On Differentially Private Stochastic Convex Optimization with Heavy-tailed Data</a> (<a href="https://icml.cc/virtual/2020/poster/5948">page</a>)<br />
<a href="http://www.acsu.buffalo.edu/~dwang45/">Di Wang</a>, <a href="https://scholar.google.com/citations?user=e3ZhEDEAAAAJ">Hanshen Xiao</a>, <a href="https://people.csail.mit.edu/devadas/">Srinivas Devadas</a>, <a href="https://cse.buffalo.edu/~jinhui/">Jinhui Xu</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1909.13830">Optimal Differential Privacy Composition for Exponential Mechanisms</a> (<a href="https://icml.cc/virtual/2020/poster/6687">page</a>)<br />
<a href="https://www.math.upenn.edu/~jinshuo/">Jinshuo Dong</a>, <a href="https://dblp.org/pid/155/9794.html">David Durfee</a>, <a href="https://scholar.google.com/citations?user=jr7gGB4AAAAJ">Ryan Rogers</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1909.01783">Oracle Efficient Private Non-Convex Optimization</a> (<a href="https://icml.cc/virtual/2020/poster/5815">page</a>)<br />
<a href="https://sethneel.com/">Seth Neel</a>, <a href="https://www.cis.upenn.edu/~aaroth/">Aaron Roth</a>, <a href="https://sites.google.com/umn.edu/giuseppe-vietri/home">Giuseppe Vietri</a>, <a href="https://zstevenwu.com/">Zhiwei Steven Wu</a></p>
</li>
<li>
<p><a href="https://proceedings.icml.cc/static/paper_files/icml/2020/2341-Paper.pdf">Private Counting from Anonymous Messages: Near-Optimal Accuracy with Vanishing Communication Overhead</a> (<a href="https://icml.cc/virtual/2020/poster/6134">page</a>)<br />
<a href="https://sites.google.com/view/badihghazi/home">Badih Ghazi</a>, <a href="https://sites.google.com/site/ravik53/">Ravi Kumar</a>, <a href="https://pasin30055.github.io/">Pasin Manurangsi</a>, <a href="https://www.itu.dk/people/pagh/">Rasmus Pagh</a></p>
</li>
<li>
<p><a href="https://proceedings.icml.cc/static/paper_files/icml/2020/6298-Paper.pdf">Private Outsourced Bayesian Optimization</a> (<a href="https://icml.cc/virtual/2020/poster/6783">page</a>)<br />
<a href="https://scholar.google.com/citations?user=7_2XTQ8AAAAJ">Dmitrii Kharkovskii</a>, <a href="https://daizhongxiang.github.io/">Zhongxiang Dai</a>, <a href="https://www.comp.nus.edu.sg/~lowkh/research.html">Bryan Kian Hsiang Low</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2004.10941">Private Query Release Assisted by Public Data</a> (<a href="https://icml.cc/virtual/2020/poster/6329">page</a>)<br />
<a href="https://sites.google.com/view/rbassily">Raef Bassily</a>, <a href="https://www.ccs.neu.edu/home/albertcheu/">Albert Cheu</a>, <a href="http://www.cs.technion.ac.il/~shaymrn/">Shay Moran</a>, <a href="http://www.cs.toronto.edu/~anikolov/">Aleksandar Nikolov</a>, <a href="https://www.ccs.neu.edu/home/jullman/">Jonathan Ullman</a>, <a href="https://zstevenwu.com/">Zhiwei Steven Wu</a></p>
</li>
<li>
<p><a href="https://proceedings.icml.cc/static/paper_files/icml/2020/2453-Paper.pdf">Private Reinforcement Learning with PAC and Regret Guarantees</a> (<a href="https://icml.cc/virtual/2020/poster/6152">page</a>)<br />
<a href="https://sites.google.com/umn.edu/giuseppe-vietri/home">Giuseppe Vietri</a>, <a href="https://borjaballe.github.io/">Borja Balle</a>, <a href="https://people.cs.umass.edu/~akshay/">Akshay Krishnamurthy</a>, <a href="https://zstevenwu.com/">Zhiwei Steven Wu</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1910.01327">Privately Detecting Changes in Unknown Distributions</a> (<a href="https://icml.cc/virtual/2020/poster/5854">page</a>)<br />
<a href="https://sites.gatech.edu/rachel-cummings/">Rachel Cummings</a>, <a href="https://sites.google.com/view/skrehbiel/home">Sara Krehbiel</a>, <a href="https://scholar.google.com/citations?user=ayasb_wAAAAJ">Yuliia Lut</a>, <a href="https://wanrongz.github.io/">Wanrong Zhang</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.09463">Privately Learning Markov Random Fields</a> (<a href="https://icml.cc/virtual/2020/poster/5776">page</a>)<br />
<a href="https://huanyuzhang.github.io/">Huanyu Zhang</a>, <a href="http://www.gautamkamath.com/">Gautam Kamath</a>, <a href="https://www.microsoft.com/en-us/research/people/jakul/">Janardhan Kulkarni</a>, <a href="https://zstevenwu.com/">Zhiwei Steven Wu</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1903.09822">Scalable Differential Privacy with Certified Robustness in Adversarial Learning</a> (<a href="https://icml.cc/virtual/2020/poster/6401">page</a>)<br />
<a href="https://sites.google.com/site/ihaiphan/">NhatHai Phan</a>, <a href="https://www.cise.ufl.edu/~mythai/">My T. Thai</a>, <a href="https://scholar.google.com/citations?user=OgXtPDIAAAAJ">Han Hu</a>, <a href="http://www.cs.kent.edu/~jin/">Ruoming Jin</a>, <a href="https://research.adobe.com/person/tong-sun/">Tong Sun</a>, <a href="https://ix.cs.uoregon.edu/~dou/">Dejing Dou</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2003.04493">Sharp Composition Bounds for Gaussian Differential Privacy via Edgeworth Expansion</a> (<a href="https://icml.cc/virtual/2020/poster/6734">page</a>)<br />
<a href="https://enosair.github.io/">Qinqing Zheng</a>, <a href="https://www.math.upenn.edu/~jinshuo/">Jinshuo Dong</a>, <a href="https://www.med.upenn.edu/apps/faculty/index.php/g275/p8939931">Qi Long</a>, <a href="http://www-stat.wharton.upenn.edu/~suw/">Weijie J. Su</a></p>
</li>
</ul>
Gautam Kamath, Mon, 31 Aug 2020 14:00:00 -0400
https://differentialprivacy.org/icml2020/
Conference Digest - COLT 2020
<p><a href="https://www.learningtheory.org/colt2020/">COLT 2020</a> was held online in July, and featured nine papers on differential privacy, as well as a keynote talk by Salil Vadhan.
While differential privacy has always had a home in the COLT community, it seems like this year was truly exceptional in terms of the number of results.
We link all the content below, including pointers to the papers, videos on YouTube, and the page on the conference website.
Please let us know if we missed any papers on differential privacy, either in the comments below or by email.</p>
<h2 id="keynote">Keynote</h2>
<ul>
<li><a href="http://www.learningtheory.org/colt2020/virtual/speaker_1.html">The Theory and Practice of Differential Privacy</a> (<a href="https://www.youtube.com/watch?v=4bpFDpT1t7I">video</a>)<br />
<a href="https://salil.seas.harvard.edu/">Salil Vadhan</a></li>
</ul>
<h2 id="papers">Papers</h2>
<ul>
<li>
<p><a href="https://arxiv.org/abs/1907.08743">Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit</a> (<a href="https://www.youtube.com/watch?v=dgGdARyU6oY">video</a>, <a href="https://www.learningtheory.org/colt2020/virtual/papers/paper_96.html">page</a>)<br />
<a href="https://people.ece.cornell.edu/acharya/">Jayadev Acharya</a>, <a href="http://www.cs.columbia.edu/~ccanonne/">Clément L. Canonne</a>, <a href="https://web.stanford.edu/~yjhan/">Yanjun Han</a>, <a href="http://www.zitengsun.com/">Ziteng Sun</a>, <a href="https://ece.iisc.ac.in/~htyagi/">Himanshu Tyagi</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2003.04509">Closure Properties for Private Classification and Online Prediction</a> (<a href="https://www.youtube.com/watch?v=U9hqJH6sEyY">video</a>, <a href="https://www.learningtheory.org/colt2020/virtual/papers/paper_320.html">page</a>)<br />
<a href="https://web.math.princeton.edu/~nalon/">Noga Alon</a>, <a href="https://www.cs.bgu.ac.il/~beimel/">Amos Beimel</a>, <a href="http://www.cs.technion.ac.il/~shaymrn/">Shay Moran</a>, <a href="https://www.uri.co.il/">Uri Stemmer</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1911.01452">Pan-Private Uniformity Testing</a> (<a href="https://www.youtube.com/watch?v=yMXAjEGDXdI">video</a>, <a href="https://www.learningtheory.org/colt2020/virtual/papers/paper_141.html">page</a>)<br />
<a href="http://amin.kareemx.com/">Kareem Amin</a>, <a href="https://www.majos.net/">Matthew Joseph</a>, <a href="https://sites.google.com/view/jieming-mao">Jieming Mao</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.01100">Efficient, Noise-Tolerant, and Private Learning via Boosting</a> (<a href="https://www.youtube.com/watch?v=xCh7oZKcINs">video</a>, <a href="https://www.learningtheory.org/colt2020/virtual/papers/paper_304.html">page</a>)<br />
<a href="https://cs-people.bu.edu/mbun/">Mark Bun</a>, <a href="https://marco.ntime.org/">Marco L. Carmosino</a>, <a href="http://cseweb.ucsd.edu/~jlsorrel/">Jessica Sorrell</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1911.10541">PAC Learning with Stable and Private Predictions</a> (<a href="https://www.youtube.com/watch?v=jZlgmBUQ4nU">video</a>, <a href="https://www.learningtheory.org/colt2020/virtual/papers/paper_37.html">page</a>)<br />
<a href="https://yuvaldagan.wordpress.com/">Yuval Dagan</a>, <a href="http://vtaly.net/">Vitaly Feldman</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.09465">Locally Private Hypothesis Selection</a> (<a href="https://www.youtube.com/watch?v=MGeBYQ7lJYw">video</a>, <a href="https://www.learningtheory.org/colt2020/virtual/papers/paper_5.html">page</a>)<br />
<a href="https://www.microsoft.com/en-us/research/people/sigopi/">Sivakanth Gopi</a>, <a href="http://www.gautamkamath.com/">Gautam Kamath</a>, <a href="https://www.microsoft.com/en-us/research/people/jakul/">Janardhan D. Kulkarni</a>, <a href="http://www.cs.toronto.edu/~anikolov/">Aleksandar Nikolov</a>, <a href="https://zstevenwu.com/">Zhiwei Steven Wu</a>, <a href="https://huanyuzhang.github.io/">Huanyu Zhang</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2002.09464">Private Mean Estimation of Heavy-Tailed Distributions</a> (<a href="https://www.youtube.com/watch?v=6NVuAZqxrSE">video</a>, <a href="https://www.learningtheory.org/colt2020/virtual/papers/paper_6.html">page</a>)<br />
<a href="http://www.gautamkamath.com/">Gautam Kamath</a>, <a href="http://www.ccs.neu.edu/home/vikrantsinghal/">Vikrant Singhal</a>, <a href="https://www.ccs.neu.edu/home/jullman/">Jonathan Ullman</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1911.10137">Privately Learning Thresholds: Closing the Exponential Gap</a> (<a href="https://www.youtube.com/watch?v=uGTfJsJAkh0">video</a>, <a href="https://www.learningtheory.org/colt2020/virtual/papers/paper_219.html">page</a>)<br />
<a href="http://www.cs.tau.ac.il/~haimk/">Haim Kaplan</a>, <a href="https://www.cs.huji.ac.il/~katrina/">Katrina Ligett</a>, <a href="https://www.tau.ac.il/~mansour/">Yishay Mansour</a>, <a href="http://www.wisdom.weizmann.ac.il/~naor/">Moni Naor</a>, <a href="https://www.uri.co.il/">Uri Stemmer</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2001.09122">Reasoning About Generalization via Conditional Mutual Information</a> (<a href="https://www.youtube.com/watch?v=c5fzeqiTwWk">video</a>, <a href="https://www.learningtheory.org/colt2020/virtual/papers/paper_98.html">page</a>)<br />
<a href="http://www.thomas-steinke.net/">Thomas Steinke</a>, <a href="http://www.ccs.neu.edu/home/lydiazak/">Lydia Zakynthinou</a></p>
</li>
</ul>
Gautam Kamath, Tue, 25 Aug 2020 10:00:00 -0400
https://differentialprivacy.org/colt2020/
Why Privacy Needs Composition
<p>We’re back! In our last <a href="\average-case-dp">post</a> we discussed some of the subtle pitfalls of formulating the assumptions underlying average-case relaxations of differential privacy. This time we’re going to look at the composition property of differential privacy, that is, the fact that running two independent differentially private algorithms on your data and combining their outputs is still differentially private. This is a key property of differential privacy and is actually closely related to the worst-case nature of differential privacy.</p>
<p>Composition is really the crucial property that has made differential privacy successful. Data analysis doesn’t happen in a vacuum, and the greatest threat to privacy comes from combining multiple pieces of information. These pieces of information can come from a single source that releases detailed statistics, or they could come from separate sources. So it’s critical to understand how the composition of multiple pieces of information can affect privacy.</p>
<p>In this post we’ll give some examples to illustrate why we need composition, and why composition is challenging for average-case relaxations of differential privacy. Composition is what allows you to design sophisticated differentially private algorithms out of simple building blocks, and it’s what allows one organization to release differentially private statistics without having to understand the entire ecosystem of related information that has been or will be released. As we’ll see, the challenges of composing average-case privacy guarantees are also very closely related to the subtleties that arise in thinking about the adversary’s beliefs.</p>
<h3 id="differencing-attacks">Differencing Attacks</h3>
<p>Let’s start with a simple example of composition that was alluded to in our last post.</p>
<p>You’ve just started a new job and signed up for the health insurance provided by your employer. Thus, your employer is able to obtain aggregated data from the insurance provider. In particular, your employer can ask “How many of our employees have submitted claims for condition X?” However, your employer should not be able to find out whether or not <em>you</em> have condition X.
For concreteness, condition X could be a mental health condition, drug addiction, being pregnant, terminal cancer, or an expensive chronic illness. Each of these could result in some kind of employment discrimination.</p>
<p>The employer may find out that 417 employees have condition X. That’s OK; on its own, this number reveals very little about whether or not <em>you</em> have condition X, as long as your employer is uncertain about how many employees <em>other than you</em> have condition X. We can formalize this as some kind of average-case or Bayesian privacy guarantee. Thus the health-insurance company is comfortable releasing this number exactly. But, yesterday, before you started your job, it also seemed reasonable to allow your employer to ask the exact same question, and yesterday the answer was 416. Thus your employer concludes that you have condition X.</p>
<p>In this example, we see how two pieces of information (the count before you started and the count after you started), each of which seems innocuous on its own, can be combined to reveal private information. This is a simple example of a <em>differencing attack</em>, and composition is important in part because it prevents these attacks.</p>
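<p>The differencing attack above can be sketched in a few lines. This is only an illustration: the roster, the names <code>count_with_condition</code>, <code>yesterday</code> and <code>today</code>, and the total of 1000 employees are made up; only the counts 416 and 417 come from the example.</p>

```python
def count_with_condition(records):
    """Exact count query: how many people in `records` have condition X?"""
    return sum(1 for has_condition in records.values() if has_condition)


# Hypothetical roster: 1000 employees, of whom 416 have condition X.
yesterday = {f"employee_{i}": i < 416 for i in range(1000)}

# Today you join and, in this scenario, you have condition X.
today = dict(yesterday)
today["you"] = True

count_yesterday = count_with_condition(yesterday)
count_today = count_with_condition(today)

# Neither count is revealing on its own, but their difference is exactly
# the new employee's private bit.
you_have_condition = (count_today - count_yesterday) == 1
print(count_yesterday, count_today, you_have_condition)
```

<p>The attacker never looks at an individual record; two aggregate answers are enough.</p>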
<p>This example involves only two pieces of information. However, an attack could combine many pieces of information. For example, the counts could be broken down by sex, race/ethnicity, age, location, and tobacco use.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> Additional data may also be obtained from other sources, such as public records, social media, voluntary disclosures, healthcare providers, financial records, employment records, or even illicit sources. The possibilities for attacks grow rapidly as more information is made available. And an employer is only one example of a potential privacy adversary.</p>
<p>The point of this example is that it’s easy to argue that one piece of information is harmless to privacy by making plausible-looking assumptions about the adversary. But this intuition rapidly breaks down once you consider the bigger picture where there are many pieces of information that can complete the puzzle. That’s why we need rigorous methods for understanding privacy and its composition.</p>
<h3 id="quantifying-composition">Quantifying Composition</h3>
<p>How does differential privacy prevent a differencing attack like the one we just discussed? The simplest way is to add a little bit of random noise to each answer. On the first day, instead of releasing the exact count 416, we could release a noisy count, say, 420. Then on the second day, instead of releasing the true count 417, we release another noisy count, say, 415. More precisely, it is common to add noise drawn from a Laplace or Gaussian distribution to each count. These figures are still close enough to the true values to be useful, but the difference of 1 is now obscured by the noise, so your privacy is protected.</p>
<p>Since the noise is unknown to <em>any</em> potential adversary, it introduces uncertainty that protects the contribution that an individual makes to the count. Taking the difference of two independent noisy counts results in something that is still noisy. However, we must be careful to quantify this privacy guarantee, particularly when it comes to composition.</p>
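<p>As a minimal sketch of noisy counting, assuming Laplace noise and a privacy parameter \(\varepsilon = 0.5\) chosen purely for illustration: a count changes by at most 1 when one person is added or removed, so Laplace noise of scale \(1/\varepsilon\) makes a single release \(\varepsilon\)-differentially private. The helper names <code>laplace_noise</code> and <code>noisy_count</code> are hypothetical.</p>

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Release a count plus Laplace noise of scale 1/epsilon.

    One person changes a count by at most 1, so this scale gives
    epsilon-differential privacy for a single release.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # seeded only so the sketch is reproducible
epsilon = 0.5           # assumed privacy parameter, for illustration

day1 = noisy_count(416, epsilon, rng)
day2 = noisy_count(417, epsilon, rng)

# The difference of two independent noisy counts is itself noisy, so it no
# longer pins down whether the true counts differ by exactly 1.
print(round(day1), round(day2), round(day2 - day1, 2))
```

<p>The typical error of each release is about \(1/\varepsilon = 2\), small next to a count in the hundreds but large next to the difference of 1 that the attacker is after.</p>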
<p>So, how much noise do we need to add? Let’s go back to the example and suppose the insurance company provides noisy answers where the noise has mean zero and some fixed variance. Your employer could simply ask the same question again and again and each time receive a different noisy answer. Averaging these noisy answers will effectively reduce the variance of the added noise and allow the true answer to be discerned. That leaves us back where we started.</p>
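<p>The averaging attack can be sketched as follows, assuming Laplace noise with a fixed scale of 2 (an arbitrary choice) and fresh independent noise on every answer. The function name <code>averaging_attack</code> is hypothetical.</p>

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def averaging_attack(true_count, scale, num_queries, rng):
    """Ask the same count query repeatedly and average the noisy answers.

    Each answer gets fresh, independent noise of the same fixed scale.
    """
    answers = [true_count + laplace_noise(scale, rng) for _ in range(num_queries)]
    return sum(answers) / num_queries


rng = random.Random(0)
scale = 2.0  # fixed noise scale that does NOT grow with the number of queries

# The variance of the average is 2 * scale**2 / k, so it shrinks as k grows.
for k in (1, 100, 10_000):
    print(k, round(averaging_attack(417, scale, k, rng), 2))

recovered = round(averaging_attack(417, scale, 10_000, rng))
print("rounded average of 10,000 answers:", recovered)
```

<p>With 10,000 repetitions the standard deviation of the average is roughly 0.03, so rounding it recovers the exact count with overwhelming probability.</p>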
<p>The moral of this revised example is that the scale of the noise must increase if we allow more access to the data, so more questions means more noise in each answer.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>
Asking the same question again and again may seem silly. There are easy ways to defend against this and some similar attacks, for example by returning the same answer each time instead of generating fresh noise. Unfortunately, the underlying phenomenon cannot be circumvented. One of the seminal works that led to differential privacy <a href="https://dl.acm.org/doi/10.1145/773153.773173" title="Irit Dinur, Kobbi Nissim. Revealing Information While Preserving Privacy. PODS 2003"><strong>[DN03]</strong></a> showed that there is an inherent tradeoff between the number of questions to be answered and the amount of noise that needs to be added to protect privacy. The general attack is simple: instead of asking the same query again and again, the attacker asks “random” queries.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> This attack only requires basic linear algebra and, importantly, has been demonstrated on real systems <a href="https://arxiv.org/abs/1810.05692" title="Aloni Cohen, Kobbi Nissim. Linear Program Reconstruction in Practice. 2018."><strong>[CN18]</strong></a>.</p>
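<p>To give a feel for such a reconstruction attack, here is a toy sketch. It is not the decoding procedure of [DN03] or [CN18]; it is a simpler correlation decoder that works when the queries far outnumber the people. All specifics are assumptions made for illustration: the data is a vector of \(\pm 1\) values, each query is a random \(\pm 1\)-weighted sum, and the noise is uniform on \([-3, 3]\).</p>

```python
import random


def reconstruction_attack(data, num_queries, noise_bound, rng):
    """Guess every entry of `data` (a list of +1/-1 values) from noisy
    answers to random signed-sum queries, using a correlation decoder."""
    n = len(data)
    scores = [0.0] * n
    for _ in range(num_queries):
        query = [rng.choice((-1, 1)) for _ in range(n)]
        true_answer = sum(q * x for q, x in zip(query, data))
        answer = true_answer + rng.uniform(-noise_bound, noise_bound)
        # Each answer "votes" on the sign of every coordinate: the terms
        # from other coordinates and from the noise average out over
        # many queries, while the term from coordinate i adds up.
        for i in range(n):
            scores[i] += query[i] * answer
    return [1 if s > 0 else -1 for s in scores]


rng = random.Random(0)
n = 20
secret = [rng.choice((-1, 1)) for _ in range(n)]
guess = reconstruction_attack(secret, num_queries=2000, noise_bound=3.0, rng=rng)
num_correct = sum(g == s for g, s in zip(guess, secret))
print(f"recovered {num_correct} of {n} entries")
```

<p>No single answer reveals any one person's bit, yet with these parameters the combined answers should recover essentially the whole dataset.</p>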
<h3 id="adaptive-composition">Adaptive Composition</h3>
<p>There are actually two kinds of composition to consider. There is <strong>non-adaptive composition</strong>, where the questions to be asked are pre-specified and thus independent of the data, and there is <strong>adaptive composition</strong>, where the questions may themselves depend on the results of prior access to the data. Adaptive composition arises in an interactive system where queries are submitted one-by-one and each answer is returned before the next query is submitted. So far, we have really only considered non-adaptive composition.</p>
<p>Any interactive system must take adaptive composition into account. A natural algorithm which asks adaptive questions is gradient descent for minimizing a function that is determined by private data (e.g., for logistic regression on medical records). At each step, the algorithm asks for a gradient of the function, which depends on the private data, at the current point. Then the point is updated according to the reported gradient and the process repeats. Since the updated point depends on the previous answer, the next gradient computation is adaptive.</p>
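<p>Here is a minimal sketch of such an adaptive algorithm, under simplifying assumptions that are not from the post: the data are numbers in \([0,1]\), the function being minimized is the one-dimensional squared loss \(f(\theta) = \frac{1}{2n}\sum_i (\theta - x_i)^2\) rather than logistic regression, and each gradient is released with Laplace noise. The name <code>private_gradient_descent</code> and all parameter values are hypothetical.</p>

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_gradient_descent(data, steps, epsilon_per_step, learning_rate, rng):
    """Minimize f(theta) = mean((theta - x)^2) / 2 over data in [0, 1],
    touching the data only through noisy gradient queries."""
    n = len(data)
    data_mean = sum(data) / n
    theta = 0.0
    for _ in range(steps):
        # The gradient at theta is theta - mean(data). Changing one record
        # moves it by at most 1/n, so Laplace noise of scale
        # 1 / (n * epsilon_per_step) makes this one answer
        # epsilon_per_step-differentially private.
        gradient = theta - data_mean
        noisy_gradient = gradient + laplace_noise(1.0 / (n * epsilon_per_step), rng)
        # Adaptivity: the next query point depends on the previous noisy answer.
        theta = min(1.0, max(0.0, theta - learning_rate * noisy_gradient))
    return theta


rng = random.Random(0)
data = [1.0] * 3000 + [0.0] * 7000  # made-up dataset with mean 0.3
theta = private_gradient_descent(
    data, steps=20, epsilon_per_step=0.05, learning_rate=0.5, rng=rng
)
print(round(theta, 3))
```

<p>Under basic composition, the 20 adaptive queries at \(\varepsilon = 0.05\) each give a total privacy parameter of at most \(20 \times 0.05 = 1\). The tighter accounting methods in the further reading would give better bounds.</p>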
<p>The good news is that differential privacy can handle adaptive composition just fine. However, to handle adaptive composition, it’s really important that you have a worst-case privacy definition like differential privacy. As we will see below, average-case variants of differential privacy cannot handle adaptive composition. Intuitively, the problem is that whatever distributional assumption you might make about the data or query a priori is unlikely to hold when you condition on past interactions with the same data or related data.</p>
<p>Here’s a technical example that shows the difficulty of adaptive composition. Our data \(x \in \{-1,+1\}^n\) is a vector of \(n\) bits, one bit per person.<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> Because we’re considering average-case differential privacy, we’ll model this vector as uniformly random. Consider the following slightly odd algorithm \(M_2(x,v)\)—it takes a vector \(v \in \{-1,+1\}^n\) from the user, and, if the correlation \(\langle x, v \rangle / n\) between \(v\) and \(x\) is smaller than \(\varepsilon/2\), the query returns \(\emptyset\), but, if the correlation between \(v\) and \(x\) is larger than \(\varepsilon/2\), the query returns the dataset \(x\). In isolation this algorithm satisfies an average-case version of differential privacy, because if \(n\) is large enough and \(x\) is uniformly random, then it’s very unlikely that the user can guess a vector \(v\) that causes this algorithm to output anything other than \(\emptyset\). This algorithm may seem contrived; it is a simple stand-in for any algorithm that behaves very well most of the time, but fails completely on some rare inputs.</p>
<p>Now, consider another, more familiar differentially private algorithm called randomized response <a href="https://www.jstor.org/stable/2283137?seq=1" title="Stanley Warner. Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. Journal of the American Statistical Association 1965."><strong>[W65]</strong></a>. For those not familiar, this algorithm \(M_1(x)\) outputs a vector \(y \in \{-1,+1\}^n\), where \(y_i\) is slightly more likely to be \(x_i\) than \(-x_i\). Specifically, we set \(y_i = x_i\) with probability \((1+\varepsilon)/2\) and \(y_i = - x_i\) otherwise. This satisfies \(\log(\frac{1+\varepsilon}{1-\varepsilon})\)-differential privacy or, roughly, \(2\varepsilon\)-differential privacy. The upshot is that we obtain a vector \(y\) where the correlation between \(x\) and \(y\) is about \(\varepsilon\), i.e. \(\langle x , y \rangle / n \approx \varepsilon\).</p>
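<p>Both claims can be checked directly from the definition of \(M_1\). Fix a position \(i\) and an output value \(b \in \{-1,+1\}\). The two possible values of \(x_i\) give output probabilities \((1+\varepsilon)/2\) and \((1-\varepsilon)/2\), and the other coordinates are unaffected, so
\[ \frac{\Pr[y_i = b \mid x_i = b]}{\Pr[y_i = b \mid x_i = -b]} = \frac{(1+\varepsilon)/2}{(1-\varepsilon)/2} = \frac{1+\varepsilon}{1-\varepsilon}, \qquad \log\left(\frac{1+\varepsilon}{1-\varepsilon}\right) = 2\varepsilon + \frac{2\varepsilon^3}{3} + \cdots \approx 2\varepsilon \]
for small \(\varepsilon\). For the correlation, each product \(x_i y_i\) equals \(+1\) with probability \((1+\varepsilon)/2\) and \(-1\) otherwise, so
\[ \mathbb{E}[x_i y_i] = \frac{1+\varepsilon}{2} - \frac{1-\varepsilon}{2} = \varepsilon, \qquad \mathbb{E}\left[\frac{\langle x, y \rangle}{n}\right] = \varepsilon, \]
and the average concentrates around \(\varepsilon\) when \(n\) is large.</p>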
<p>OK, so \(M_1\) and \(M_2\) both satisfy strong average-case versions of differential privacy when the data is uniform, but what about their composition? Well, the bad news is that running \(y = M_1(x)\) followed by \(M_2(x,y)\) is going to return the dataset \(x\) with probability approaching 100%! That’s because \(y\) was designed precisely to be a vector with correlation about \(\varepsilon\) with \(x\), and this is exactly the key that gets \(M_2\) to unlock the dataset.</p>
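<p>The failure of composition for \(M_1\) and \(M_2\) is easy to simulate. The sketch below assumes \(n = 10{,}000\) and \(\varepsilon = 0.2\) (arbitrary values), uses <code>None</code> to stand in for the output \(\emptyset\), and uses the hypothetical function names <code>randomized_response</code> and <code>correlation_gate</code> for \(M_1\) and \(M_2\).</p>

```python
import random


def randomized_response(x, epsilon, rng):
    """M_1: keep each bit with probability (1 + epsilon) / 2, else flip it."""
    keep = (1 + epsilon) / 2
    return [xi if rng.random() < keep else -xi for xi in x]


def correlation_gate(x, v, epsilon):
    """M_2: return the whole dataset if v has correlation above epsilon / 2
    with x, and None (standing in for the empty output) otherwise."""
    correlation = sum(xi * vi for xi, vi in zip(x, v)) / len(x)
    return x if correlation > epsilon / 2 else None


rng = random.Random(0)
n, epsilon = 10_000, 0.2
x = [rng.choice((-1, 1)) for _ in range(n)]  # uniformly random dataset

# In isolation: a user who has to guess v blindly gets nothing out of M_2.
blind_guess = [rng.choice((-1, 1)) for _ in range(n)]
print("blind guess unlocks the data:", correlation_gate(x, blind_guess, epsilon) is not None)

# Composed: the output of M_1 is exactly the key that unlocks M_2.
y = randomized_response(x, epsilon, rng)
print("M_1 output unlocks the data:", correlation_gate(x, y, epsilon) == x)
```

<p>With these parameters, a blind guess has correlation within about \(\pm 0.01\) of zero, far below the threshold \(\varepsilon/2 = 0.1\), while the output of \(M_1\) has correlation close to \(0.2\), far above it.</p>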
<p>What went wrong here is that, even if \(x\) really is uniformly random, it’s very far from it when conditioned on the output \(y=M_1(x)\). To analyze \(M_2(x,y)\) we must look at the distribution of \(x\) conditioned on \(y\). This distribution is going to be messy and may as well be a worst-case distribution, which means we must leave the realm of average-case privacy.</p>
<h3 id="conclusion">Conclusion</h3>
<p>Composition guarantees that, as long as each part of your system is differentially private, then the overall system is too. It would be difficult to build sophisticated systems without this property. And it’s what allows one organization to release differentially private statistics without having to worry about what other information might be out there. In short, composition is what allows differential privacy to deal with the complexities of the real world.</p>
<p>It is unlikely that differential privacy would have taken off as a field of research without this composition property. Any proposal for an alternative approach to privacy-preserving data analysis should first be evaluated in terms of how it handles composition.</p>
<p>This post only scratches the surface. In particular, we haven’t talked about the quantitative aspects of composition; that’s where the fun really begins. We will leave you with some pointers to further reading on the topic:</p>
<ul>
<li><a href="http://www.annualreviews.org/eprint/E84vbD3Yzw4ff7YPAjnv/full/10.1146/annurev-statistics-060116-054123" title="Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman. Exposed! A Survey of Attacks on Private Data. Annual Review of Statistics and Its Application 2017."><strong>[DSSU17]</strong></a> This is a survey of attacks which explains quantitatively the relationship between noise, number of questions, and privacy risks.</li>
<li><a href="https://arxiv.org/abs/1311.0776" title="Peter Kairouz, Sewoong Oh, Pramod Viswanath. The Composition Theorem for Differential Privacy. ICML 2015."><strong>[KOV15]</strong></a> <a href="https://arxiv.org/abs/1507.03113" title="Jack Murtagh, Salil Vadhan. The Complexity of Computing the Optimal Composition of Differential Privacy. TCC 2016"><strong>[MV15]</strong></a> <a href="https://arxiv.org/abs/1603.01887" title="Cynthia Dwork, Guy Rothblum. Concentrated Differential Privacy. 2016."><strong>[DR16]</strong></a> <a href="https://arxiv.org/abs/1605.02065" title="Mark Bun, Thomas Steinke. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. TCC 2016."><strong>[BS16]</strong></a> <a href="https://arxiv.org/abs/1702.07476" title="Ilya Mironov. Renyi Differential Privacy. CSF 2017."><strong>[M17]</strong></a> <a href="https://arxiv.org/abs/1905.02383" title="Jinshuo Dong, Aaron Roth, Weijie Su. Gaussian Differential Privacy. Journal of the Royal Statistical Society: Series B. 2020"><strong>[DRS19]</strong></a> On the positive side, these papers analyze how differential privacy composes, yielding sharp quantitative bounds.</li>
</ul>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>A good rule of thumb is that, if the number of released values is much larger than the number of people, then a privacy attack is probably possible. This is analogous to the rule from algebra that, if the number of constraints (released values) is greater than the number of unknown variables (people’s data), then the unknowns can be worked out. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Exactly quantifying how much noise is needed as the number of questions grows leads to the concept of a “privacy budget.” That is, we must precisely quantify how differential privacy degrades under composition. This is a very deep topic and is something we hope to discuss in future posts. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>The queries do not need to be random <strong><a href="https://iacr.org/archive/crypto2008/51570469/51570469.pdf" title="Cynthia Dwork, Sergey Yekhanin. New Efficient Attacks on Statistical Disclosure Control Mechanisms. CRYPTO 2008">[DY08]</a></strong>. The queries simply need to be “sufficiently distinct”, which can be formulated precisely as being nearly orthogonal vectors. Random, or even pseudorandom queries (e.g., hash functions), will almost certainly satisfy this property. In general, it is fairly likely that a set of queries will have this property and allow a reconstruction attack; that is, it is hard to <em>avoid</em> this phenomenon. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>This representation of the dataset as a vector of bits \(x \in \{-1,+1\}^n \) is an abstraction. The dataset would actually be something like a set of pairs \( ( u_i, x_i ) \) for \(i = 1, \ldots, n \), where \(u_i\) is information that identifies the individual concerned (name, address, race, date of birth, etc.). <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Thomas Steinke, Jonathan Ullman, Sun, 16 Aug 2020 10:00:00 -0400
https://differentialprivacy.org/privacy-composition/