Differential Privacy
Website for the differential privacy research community
https://differentialprivacy.org
Open Problem: Private All-Pairs Distances
<p><strong>Background:</strong> Suppose we are interested in computing the distance between two vertices in a graph. Under edge or node differential privacy, this problem is not promising because the removal of a single edge can make distances change from 1 to \(n - 1\) or can even disconnect the graph. However, a different setting that makes sense to consider is that of a weighted graph \((G, w)\) whose topology \(G = (V, E)\) is publicly known but whose edge weight function \(w : E \to \mathbb{R}^+\) must be kept private. (For instance, consider transit times on a road network. The topology of the road network may be publicly available as a map, but the edge weights corresponding to transit times may be based on private GPS locations of individual cars.)</p>
<p>Suppose that two weight functions \(w\) and \(w'\) of the same graph \(G\) are considered to be neighbors if they differ by at most 1 in \(\ell^1\) norm. Then the length of any fixed path is sensitivity-one, so the distance between any pair of vertices is also sensitivity-one and can be released privately via the Laplace mechanism. But what if we want to release all \(\Theta(n^2)\) distances between all pairs of vertices in \(G\)? We can do this with accuracy roughly \(O(n \log n )/\varepsilon\) by adding noise to each edge, or roughly \(O(n \sqrt{\log(1/\delta)}/\varepsilon)\) using composition theorems. Both of these are roughly \(n/\varepsilon\). But is this linear dependence on \(n\) inherent, or is it possible to release all-pairs distances with error sublinear in \(n\)?</p>
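<p>The "add noise to each edge" baseline can be sketched as follows. This is a minimal illustration, not the construction from [S16]: the function names, the dictionary representation of the graph, and the clamping of noisy weights at zero (a post-processing step that keeps Dijkstra's algorithm valid) are all assumptions made here.</p>

```python
import heapq
import math
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dijkstra(n, adj, src):
    """Single-source shortest paths for non-negative edge weights."""
    dist = [math.inf] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def private_all_pairs(n, edges, eps, rng):
    """edges: {(u, v): weight} for an undirected public graph on vertices 0..n-1.

    Neighboring weight functions differ by at most 1 in l1 norm, so the
    weight vector has l1-sensitivity 1 and Laplace(1/eps) noise per edge
    is eps-differentially private. Everything after that is post-processing.
    """
    # Assumption of this sketch: clamp at 0 so shortest paths stay well defined.
    noisy = {e: max(0.0, w + laplace(1.0 / eps, rng)) for e, w in edges.items()}
    adj = [[] for _ in range(n)]
    for (u, v), w in noisy.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    return [dijkstra(n, adj, s) for s in range(n)]
```

<p>For instance, <code>private_all_pairs(4, {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 3.0}, 0.5, random.Random())</code> returns a 4-by-4 table of noisy distances. A path can contain up to \(n-1\) noisy edges, which is where the roughly linear dependence on \(n\) in the error comes from.</p>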
<p>This setting and question were considered in [<a href="https://arxiv.org/abs/1511.04631">S16</a>].</p>
<p><strong>Problem 1:</strong> Let \(G\) be an arbitrary public graph, and \(w : E \to \mathbb{R}^+\) be an edge weight function. Can we release approximate all-pairs distances in \((G, w)\) with accuracy sublinear in \(n\) while preserving the privacy of the edge weight function, where two weight functions \(w, w'\) are neighbors if \(\|w - w'\|_1 \le 1\)? Or can we show that any private algorithm must have error \(\Omega(n)\)? A weaker (but nontrivial) lower bound would also be nice.</p>
<p><strong>Reward:</strong> A bar of chocolate.</p>
<p><strong>Other related work:</strong> [<a href="https://arxiv.org/abs/1511.04631">S16</a>] provided algorithms with better error for two special cases: trees and graphs of a priori bounded weight. For trees, it is possible to release all-pairs distances with error roughly \(O(\log^{1.5} n)/\varepsilon\), while for arbitrary graphs with edge weights restricted to the interval \([0,M]\), it is possible to release all-pairs distances with error roughly \(O( \sqrt{nM\varepsilon^{-1}\log(1/\delta)})\).</p>
<p><em>Submitted by <a href="http://www.mit.edu/~asealfon/">Adam Sealfon</a> on April 9, 2019.</em></p>
Audra McMillan
Wed, 05 Aug 2020 14:00:00 -0400
https://differentialprivacy.org/open-problem-all-pairs/
The Pitfalls of Average-Case Differential Privacy
<p>Differential privacy protects against extremely strong adversaries, even ones who know the entire dataset except for one bit of information about one individual. Since its inception, people have considered ways to relax the definition to assume a more realistic adversary. A natural way to do so is to incorporate some distributional assumptions. That is, rather than considering a worst-case dataset, assume the dataset is drawn from some distribution and provide some form of “average-case” or “Bayesian” privacy guarantee with respect to this distribution. This is especially tempting as it is common for statistical analysis to work under distributional assumptions.</p>
<p>In this post and in a planned follow-up post, we will discuss some pitfalls of average-case or Bayesian versions of differential privacy. To avoid keeping you in suspense:</p>
<ul>
<li>The average-case assumptions in relaxations of differential privacy are qualitatively different to and much more brittle than the typical assumptions made about how the data is generated.</li>
<li>Average-case relaxations do not satisfy the strong composition properties that have made differential privacy so successful.</li>
<li>It is safer to use distributional assumptions in the accuracy analysis instead of the privacy analysis. That is, we can provide average-case utility and worst-case privacy. Recent work has shown that this model can capture most of the advantages of distributional assumptions.</li>
</ul>
<p>We will show some illustrative examples for each of these points, but we will be purposefully vague as to exactly which alternative definition we are considering, as these issues arise in a wide variety of definitions. Our hope is not to shut down discussion of these relaxations, or to single out specific definitions as flawed. There are specific concrete applications where average-case differential privacy might be useful, and our goal is to highlight some issues that must be carefully considered in each application.</p>
<h3 id="assumptions-about-nature-vs-assumptions-about-the-adversary">Assumptions about nature vs. assumptions about the adversary?</h3>
<p>In any reasonable definition of privacy, we have to think about whom we are hiding sensitive information from. This person, “the adversary,” could be a stranger, a close friend, a relative, a corporation we do business with, or the government, and who they are affects what information they have access to and what defenses are appropriate. How the adversary can access the private system defines the <a href="https://differentialprivacy.org/trustmodels/">trust model</a>. Distributional assumptions correspond to the adversary’s side information. Our key point is:</p>
<blockquote>
<p>Assumptions incorporated into the definition of privacy are assumptions about the adversary and these are qualitatively different from assumptions about “nature,” which is the process that generates the data.</p>
</blockquote>
<p>For example, suppose an employer learns that two of its employees have expensive medical conditions. On its own, this information does not identify those employees, and this privacy intuition could be formalized via distributional assumptions. But these distributional assumptions will break if the employer later receives some side information. For example, the other, healthy employees may voluntarily disclose their medical status, or the employer may recall that the number was only one before a particular employee was hired. (Incidentally, this is an example of a failure of composition, which we will discuss in another post.)</p>
<p>This example illustrates how assumptions about the adversary that might seem reasonable in a vacuum can be invalidated by context. Plus, assumptions about the adversary can be invalidated by <em>future</em> side information, and you can’t retract a privacy leak once it happens the way you can a medical study. So assumptions about the adversary are much less future-proof than assumptions about nature.</p>
<h3 id="all-models-are-wrong-but-some-are-useful">All models are wrong, but some are useful</h3>
<p>One justification for incorporating distributional assumptions into the privacy definition is that the person using the data is often making these assumptions anyway—for example, that the data is i.i.d. Gaussian, or that two variables have some underlying linear relationship to be discovered. So, if the assumption were false, wouldn’t we already be in trouble? Not really.</p>
<blockquote>
<p>It’s important to remember the old saw “all models are wrong, but some are useful.” Some models have proven themselves useful for statistical purposes, but that does not mean they are useful as a basis for privacy.</p>
</blockquote>
<p>For example, our methods may be robust to the relatively friendly ways that nature deviates from the model, but we can’t trust adversaries to be as friendly.</p>
<p>For a toy example, suppose we model our data as coming from a normal distribution \( N(\mu,\sigma^2) \), but actually the data is collected at two different testing centers, one of which rounds its measurements to the nearest integer and the other of which provides two decimal places of precision. This rounding makes the model wrong, but won’t significantly affect our estimate of the mean. However, just looking at the estimate of the mean might reveal that someone in the dataset went to the second testing center, potentially compromising that person’s privacy.</p>
<p>A more natural setting where this issue arises is in dealing with <em>outliers</em> or other extreme examples, which we will discuss in the next section.</p>
<h3 id="privacy-for-outliers">Privacy for outliers</h3>
<p>The usual worst-case definition of differential privacy provides privacy for everyone, including outliers. Although there are lots of ways to achieve differential privacy, in order to compare definitions, it will help to restrict attention to the basic approach based on calibrating noise to sensitivity:</p>
<p>Suppose we have a private dataset \( x \in \mathcal{X}^n \) containing the data of \( n \) individuals, and some real-valued query \( q : \mathcal{X}^n \to \mathbb{R} \). The standard way to release an estimate of \( q(x) \) is to compute
\[
M(x) = q(x) + Z \cdot \sup_{\textrm{neighboring}~x',x''} |q(x') - q(x'')|
\]
where
\(
\sup_{\textrm{neighboring}~x',x''} |q(x') - q(x'')|
\)
is called the “worst-case sensitivity” of \( q \) and \( Z \) is some noise, commonly drawn from a Laplace or Gaussian distribution.</p>
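<p>As a concrete instance of this recipe, here is a minimal sketch for the mean, assuming every value is known in advance to lie in an interval \([\mathrm{lo}, \mathrm{hi}]\) (values outside it are clipped). The function names and the clipping step are assumptions of this sketch; \( Z \) is taken to be Laplace noise with scale \( 1/\varepsilon \).</p>

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(xs, lo, hi, eps, rng):
    """Release q(x) + Z * sensitivity for the mean of values clipped to [lo, hi]."""
    n = len(xs)
    clipped = [min(max(x, lo), hi) for x in xs]
    q = sum(clipped) / n
    # Replacing one individual's value moves the clipped mean by at most this much.
    sensitivity = (hi - lo) / n
    z = laplace(1.0 / eps, rng)  # Z ~ Laplace(1/eps)
    return q + z * sensitivity
```

<p>For example, <code>private_mean([0.2, 0.9, 0.4], 0.0, 1.0, 1.0, random.Random())</code> returns the mean plus Laplace noise of scale \( 1/3 \). Without the bound \([\mathrm{lo}, \mathrm{hi}]\) the sensitivity, and hence the noise, would not be finite.</p>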
<p>Unfortunately, the worst-case sensitivity may be large or even infinite for basic statistics of interest, such as the mean \( q(x) = \frac{1}{n} \sum_{i} x_i \) of unbounded real values. There are a variety of differentially private algorithms for addressing this problem,<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> but that is not what this post is about. It’s tempting to, instead, try to scale the noise to some notion of “average-case sensitivity,” with the goal of satisfying some average-case version of differential privacy. For example, suppose the data is drawn from some normal distribution \( N(\mu,\sigma^2) \) and the neighboring datasets \( x', x'' \) are each \( n \) i.i.d. samples from this distribution, but differing on exactly one random sample. Then the worst-case sensitivity of the mean is infinite:
\[
\sup_{\textrm{neighboring}~x',x''} |q(x') - q(x'')| = \infty,
\]
but the average-case sensitivity is proportional to \(1/n\):
\[
\mathbb{E}_{\textrm{neighboring}~x', x''}(|q(x') - q(x'')|) \approx \frac{\sigma}{n}.
\]
Thus, under an average-case privacy guarantee, we can estimate the mean with very little noise.</p>
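<p>To see where the \( \sigma/n \) scaling comes from, here is a short calculation under the same i.i.d. Gaussian assumption, writing \( i \) for the index of the differing sample (notation introduced only for this calculation). Since \( x' \) and \( x'' \) agree everywhere except at position \( i \),
\[
q(x') - q(x'') = \frac{x'_i - x''_i}{n}, \qquad x'_i - x''_i \sim N(0, 2\sigma^2),
\]
and the expected absolute value of a \( N(0, s^2) \) variable is \( s\sqrt{2/\pi} \), so
\[
\mathbb{E}\,|q(x') - q(x'')| = \frac{\sqrt{2}\,\sigma}{n} \sqrt{\frac{2}{\pi}} = \frac{2\sigma}{\sqrt{\pi}\, n} \approx 1.13 \cdot \frac{\sigma}{n}.
\]</p>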
<p>But what happens to privacy if this assumption fails, perhaps because of outliers? Imagine computing the average wealth of a subset of one hundred Amazon employees who test positive for COVID-19, and discovering that it’s over one billion dollars. Maybe Jeff Bezos isn’t feeling well?<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup></p>
<p>Yes, this example is a little contrived, since you probably shouldn’t have computed the empirical mean of such skewed data anyway. But, if this fact leaks out, you can’t just go back in time and truncate the data or compute the median instead. Privacy tends to be high-stakes both because of the potential consequences of a breach and the inability to retract or correct a privacy violation after it is discovered.</p>
<p>In the next section we’ll see a slightly more complex example where average-case privacy / average-case sensitivity fails to protect privacy even when the distributional assumptions hold.</p>
<h3 id="example-pairwise-correlations-and-linear-regression">Example: pairwise correlations and linear regression</h3>
<p>Suppose our dataset \( X \) is a matrix \( \{-1,+1\}^{n \times (d+1)} \) where each row \( X_i \) corresponds to one person’s data and each column corresponds to one feature. For simplicity, let’s suppose our distributional assumption is that the dataset is completely uniform—each bit is sampled independently and uniformly from \( \{-1,+1\} \). We’ll think of the first \( d \) columns as “features” and the last column as a “secret label.”</p>
<p>First, consider the set of pairwise correlations between each feature and the secret label:
\[
q_j(X) = \sum_{i = 1}^{n} X_{i,j} X_{i,d+1}
\]
for \( j = 1,\dots,d \). Note that \( q_j(X) \) has mean 0 and variance \( n \) under our distributional model of the data.</p>
<p>Now, suppose we have a weight vector \( w \in \mathbb{R}^{d} \) and want to estimate the weighted sum of correlations
\[
q(X) = \sum_{j = 1}^{d} w_j q_j(X) = \sum_{i=1}^{n} \sum_{j=1}^{d} w_{j} X_{i,j} X_{i,d+1}.
\]
This statistic may look a little odd, but it’s pretty close to computing the average squared error of the linear predictor \( \hat X_{i,d+1} = \sum_{j=1}^{d} w_j X_{i,j} \) given by the weight vector \( w \), which is a natural thing to estimate.</p>
<p>The worst-case sensitivity of \(q\) is proportional to \( \|w\|_1 \).<br />
However, it’s not too hard to show that, under our distributional model, the average-case sensitivity is much lower; it is proportional to \( \| w \|_2 \). Thus, using average-case privacy may allow us to add significantly less noise.</p>
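<p>Both claims can be checked with a short calculation; this is a sketch in which \( X_i \) and \( X'_i \) denote the old and new versions of the single row that changes. The change in the query is
\[
q(X) - q(X') = \sum_{j=1}^{d} w_j \left( X_{i,j} X_{i,d+1} - X'_{i,j} X'_{i,d+1} \right).
\]
Each term in parentheses lies in \( \{-2, 0, +2\} \), so in the worst case
\[
|q(X) - q(X')| \le 2 \sum_{j=1}^{d} |w_j| = 2 \|w\|_1 .
\]
Under the uniform model, the products \( X_{i,j} X_{i,d+1} \) for \( j = 1,\dots,d \) are independent uniform signs, so \( \sum_j w_j X_{i,j} X_{i,d+1} \) has mean 0 and variance \( \|w\|_2^2 \). The difference of two independent such rows therefore has
\[
\mathbb{E}\left[ (q(X) - q(X'))^2 \right] = 2 \|w\|_2^2 ,
\]
that is, a typical magnitude of about \( \sqrt{2}\, \|w\|_2 \).</p>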
<p>What could go wrong here? Well, we’ve implicitly assumed that the weights \( w \) are independent of the data \( X \). That is, the person specifying the weights has no knowledge of the data itself, only its distribution. Suppose instead that the weights are specified by an adversary who has learned the \( d \) features of the first individual (although there is nothing special about considering the first individual) and who sets the weights to \(w = (X_{1,1},\dots,X_{1,d}) \). Another calculation shows that, in this case, <em>even when our model of the data is exactly correct</em>, the query \( q(X) \) has mean \( d \cdot X_{1,d+1} \) and standard deviation approximately \( \sqrt{nd} \). Thus, if \( d \gg n \) we can confidently determine the secret label \( X_{1,d+1} \) of the first individual from the value \( q(X) \). Moreover, adding noise of standard deviation \( \ll d \) will not significantly affect the adversary’s ability to learn the secret label. But, earlier, we argued that average-case sensitivity is proportional to \( \|w \|_2 = \sqrt{d} \), so this form of average-case privacy fails to protect a user’s data in this scenario! Note that adding noise proportional to \( \|w\|_1 = d \) would satisfy (worst-case) differential privacy and would thwart this adversary.</p>
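<p>The attack is easy to simulate. The following is a rough sketch using only the standard library; the function name, the choice of Gaussian noise, and the specific sizes in the usage note are assumptions made for illustration. The data is sampled exactly according to the uniform model, the adversary uses the first individual's features as weights, and the released value carries noise of a caller-chosen standard deviation.</p>

```python
import random


def attack_once(n, d, noise_sd, rng):
    """Return True if the adversary recovers the first individual's secret label."""
    # Data drawn exactly from the uniform model: n rows, d features plus a label.
    X = [[rng.choice((-1, 1)) for _ in range(d + 1)] for _ in range(n)]
    # The adversary knows the d features of the first individual and uses them as weights.
    w = X[0][:d]
    # q(X) = sum_i sum_j w_j * X[i][j] * X[i][d]; zip stops after the d features.
    q = sum(row[d] * sum(wj * xj for wj, xj in zip(w, row)) for row in X)
    # Released answer: the query plus Gaussian noise of the given standard deviation.
    released = q + rng.gauss(0.0, noise_sd)
    guess = 1 if released > 0 else -1
    return guess == X[0][d]
```

<p>With, say, \( n = 20 \) and \( d = 2000 \), calling <code>attack_once(20, 2000, 2000 ** 0.5, random.Random())</code> uses noise on the scale of \( \|w\|_2 = \sqrt{d} \approx 45 \), while the signal \( d \cdot X_{1,d+1} \) has magnitude 2000 and the contribution of the other rows has standard deviation about \( \sqrt{nd} = 200 \), so the guess is essentially always right. Passing a noise standard deviation on the order of \( d \) instead makes the guess far less reliable.</p>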
<blockquote>
<p>What went wrong is that the data satisfied our assumptions, but the adversary’s beliefs about the data did not!</p>
</blockquote>
<p>The set of reasonable distributions to consider for the adversary’s beliefs may look very different from the set of reasonable distributions to consider for your analysis of the data. You may think that it’s not reasonable for the attacker to choose this weight vector \( w \) containing a lot of prior information about an individual, but assuming that the attacker cannot obtain or specify such a vector is very different from assuming that the data is uniform, and requires its own justification.</p>
<p>Before wrapping up, let’s just make a couple more observations about this example:</p>
<ul>
<li>This attack is pretty robust. The assumption that the data is uniform with independent features can be relaxed significantly. It’s also not necessary for the adversary to know all the features of the first user exactly; all we need is for the weights to have correlation \( \gg \sqrt{nd} \) with the features. For example, if the dataset is genomic data, having the data of a relative might suffice.</li>
<li>This problem isn’t specific to high-dimensional data with \( d \gg n \). If we allow more general types of “queries”, then a similar attack is possible when there are only \( d \approx \log n \) features.</li>
<li>To make this example as crisp as possible, we allowed an adversarial data analyst to specify the weight vector \( w \). You might think examples like this can’t arise if the algorithm designer specifies all of the queries internally, but ensuring that requires great care (as we’ll see in our upcoming post about composition).</li>
</ul>
<h3 id="conclusion">Conclusion</h3>
<p>As we have discussed, the main issue that arises in average-case or Bayesian versions of differential privacy is that we must make strong assumptions about the adversary. A simple distributional assumption about the data, which may be entirely reasonable for statistical analysis, entails assuming a naïve adversary with essentially no side information, which is not reasonable from a privacy perspective.</p>
<p>In a future post, we will discuss <em>composition</em>, which is a key robustness property and really the secret to differential privacy’s success. As we’ll see, average-case versions of differential privacy do not enjoy strong composition properties the way worst-case differential privacy does, which makes them much harder to deploy.</p>
<p>Incorporating assumptions about the adversary into the privacy guarantee requires great care; and it is safest to make fewer assumptions, which quickly pushes us towards the worst-case definition of differential privacy. Nevertheless, assumptions about the adversary are often made implicitly and it is worth studying how to make these explicit.</p>
<p>So, is there a role for distributional assumptions in differential privacy? Yes! Although we’ve discussed the pitfalls of making the <em>privacy guarantee</em> contingent on distributional assumptions, none of these pitfalls apply to making the <em>utility guarantee</em> contingent on distributional assumptions, as is normally done in statistical analysis. In recent years, this combination (worst-case privacy, average-case utility) has been fruitful, and seems to allow many of the benefits that average-case privacy definitions seek to capture. For example, recent work has shown that worst-case differential privacy permits accurate mean and covariance estimation of unbounded data under natural modeling assumptions <a href="https://arxiv.org/abs/1711.03908" title="Vishesh Karwa, Salil Vadhan. Finite Sample Differentially Private Confidence Intervals. ITCS 2018."><strong>[KV18]</strong></a>, <a href="https://arxiv.org/abs/1805.00216" title="Gautam Kamath, Jerry Li, Vikrant Singhal, Jonathan Ullman. Privately Learning High-Dimensional Distributions. COLT 2019."><strong>[KLSU19]</strong></a>, <a href="https://arxiv.org/abs/1906.02830" title="Mark Bun, Thomas Steinke. Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation. NeurIPS 2019."><strong>[BS19]</strong></a>, <a href="https://arxiv.org/abs/2001.02285" title="Wenxin Du, Canyon Foot, Monica Moniot, Andrew Bray, Adam Groce. Differentially Private Confidence Intervals. 2020."><strong>[DFMBG20]</strong></a>, <a href="https://arxiv.org/abs/2002.09464" title="Gautam Kamath, Vikrant Singhal, Jonathan Ullman. Private Mean Estimation of Heavy-Tailed Distributions. COLT 2020."><strong>[KSU20]</strong></a>, <a href="https://arxiv.org/abs/2006.06618" title="Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman. CoinPress: Practical Private Mean and Covariance Estimation. 2020."><strong>[BDKU20]</strong></a>, but this remains an active area of research.</p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p>For example, there are approaches based on various paradigms like Smooth Sensitivity <a href="http://www.cse.psu.edu/~ads22/pubs/NRS07/NRS07-full-draft-v1.pdf" title="Kobbi Nissim, Sofya Raskhodnikova, Adam Smith. Smooth Sensitivity and Sampling in Private Data Analysis. STOC 2007."><strong>[NRS07]</strong></a> <a href="https://arxiv.org/abs/1906.02830" title="Mark Bun, Thomas Steinke. Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation. NeurIPS 2019."><strong>[BS19]</strong></a>, Propose-Test-Release <a href="http://www.stat.cmu.edu/~jinglei/dl09.pdf" title="Cynthia Dwork, Jing Lei. Differential Privacy and Robust Statistics. STOC 2009."><strong>[DL09]</strong></a>, or Truncation/Winsorization <a href="http://www.cse.psu.edu/~ads22/pubs/2011/stoc194-smith.pdf" title="Adam Smith. Privacy-preserving Statistical Estimation with Optimal Convergence Rates. STOC 2011."><strong>[S11]</strong></a> <a href="https://arxiv.org/abs/1711.03908" title="Vishesh Karwa, Salil Vadhan. Finite Sample Differentially Private Confidence Intervals. ITCS 2018."><strong>[KV18]</strong></a> <a href="https://arxiv.org/abs/1805.00216" title="Gautam Kamath, Jerry Li, Vikrant Singhal, Jonathan Ullman. Privately Learning High-Dimensional Distributions. COLT 2019."><strong>[KLSU19]</strong></a> <a href="https://arxiv.org/abs/2002.09464" title="Gautam Kamath, Vikrant Singhal, Jonathan Ullman. Private Mean Estimation of Heavy-Tailed Distributions. COLT 2020."><strong>[KSU20]</strong></a> to name a few. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>If you are confident that Jeff Bezos or other extremely high-wealth individuals are not in the sample, then you could <em>truncate</em> each sample and compute the mean of the truncated samples. This would give worst-case privacy, and, if you are correct in your assumption, would not affect the mean. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Thomas Steinke, Jonathan Ullman
Wed, 22 Jul 2020 10:00:00 -0400
https://differentialprivacy.org/average-case-dp/
Conference Digest - STOC 2020
<p><a href="http://acm-stoc.org/stoc2020/">STOC 2020</a> was recently held online, as one of the first major theory conferences during the COVID-19 era.
It featured four papers on differential privacy, which we list and link below.
Each one is accompanied by a video from the conference, as well as a longer video if available.
Please let us know if we missed any papers on differential privacy, either in the comments below or by email.</p>
<ul>
<li>
<p><a href="https://arxiv.org/abs/1911.08339">The Power of Factorization Mechanisms in Local and Central Differential Privacy</a> (<a href="https://www.youtube.com/watch?v=hSenRTxhZhM">video</a>)<br />
<a href="https://dblp.uni-trier.de/pers/hd/e/Edmonds:Alexander">Alexander Edmonds</a>, <a href="http://www.cs.toronto.edu/~anikolov/">Aleksandar Nikolov</a>, <a href="https://www.ccs.neu.edu/home/jullman/">Jonathan Ullman</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/2005.04763">Private Stochastic Convex Optimization: Optimal Rates in Linear Time</a> (<a href="https://www.youtube.com/watch?v=Tlc-z-MFAmM">video</a>)<br />
<a href="http://vtaly.net/">Vitaly Feldman</a>, <a href="https://tomerkoren.github.io/">Tomer Koren</a>, <a href="http://kunaltalwar.org/">Kunal Talwar</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1911.04014">Interaction is necessary for distributed learning with privacy or communication constraints</a> (<a href="https://www.youtube.com/watch?v=AWgzaFOU_HM">video</a>)<br />
<a href="https://yuvaldagan.wordpress.com/">Yuval Dagan</a>, <a href="http://vtaly.net/">Vitaly Feldman</a></p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1906.05271">Does Learning Require Memorization? A Short Tale about a Long Tail</a> (<a href="https://www.youtube.com/watch?v=sV59uoWJRnk">video</a>, <a href="https://www.youtube.com/watch?v=Fp7cgHRl8Yc">longer video</a>)<br />
<a href="http://vtaly.net/">Vitaly Feldman</a></p>
</li>
</ul>
Gautam Kamath
Mon, 20 Jul 2020 10:00:00 -0400
https://differentialprivacy.org/stoc2020/
Trust models, and notions of privacy
<p>There exist various notions of differential privacy which, while sharing a common core, differ in some key specific aspects. Broadly speaking, they vary along a few main axes, such as the type of guarantee they provide, the specific similarity between data they consider, and the trust model they aim to address. This last point will be the focus of this post: <em>which notion of privacy is best suited to the specific scenario at hand?</em></p>
<p>We will cover 4 of these notions.</p>
<ul>
<li>(central) differential privacy (DP)</li>
<li>local differential privacy (LDP)</li>
<li>pan-privacy</li>
<li>shuffle privacy</li>
</ul>
<p>Typically, the world can be divided into a few categories: (i) the users, who hold the data; (ii) the “server,” who runs the algorithm; and (iii) the rest of the world, which does what the rest of the world does. As the name indicates, the <em>trust model</em> boils down to the following simple question: as a user, <strong>who do you trust</strong> with your sensitive data?</p>
<p>In the <em>DP model</em> <strong>[DMNS06]</strong>, the answer is essentially “the server, and nobody else.” Users are happy to provide their data to the server, which runs the algorithm on the resulting dataset; however, the <em>output</em> of that algorithm, which is released to the (untrusted) world, needs to be private, and not reveal sensitive information about any single user.</p>
<p>In the <em>LDP model</em> <strong>[EGS03,KLNRS08]</strong>, the server itself is untrusted, and the answer is “nobody.” Any data communicated by the users must already be private, and even a prying server cannot learn much about any single user. Of course, this is a strictly more stringent privacy model than the central DP one, and this comes at a price: the utility one can obtain from the same amount of data is typically smaller than in the DP model.</p>
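<p>To make the local model concrete, here is a minimal sketch of randomized response, one standard local randomizer (it is used here purely as an illustration and is not discussed in this post; the function names are hypothetical). Each user perturbs their own bit before sending it, and the server debiases the aggregate.</p>

```python
import math
import random


def randomize_bit(bit, eps, rng):
    """Run on the user's device: report the true bit with probability e^eps / (1 + e^eps)."""
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_fraction(reports, eps):
    """Run on the untrusted server: debias the mean of the noisy reports.

    With p = e^eps / (1 + e^eps) and true fraction f of ones,
    E[report] = (1 - p) + (2p - 1) * f, which we invert for f.
    """
    p = math.exp(eps) / (1.0 + math.exp(eps))
    mean = sum(reports) / len(reports)
    return (mean - (1.0 - p)) / (2.0 * p - 1.0)
```

<p>The server only ever sees the output of <code>randomize_bit</code>, so it never holds raw data. The price is visible in the estimator: dividing by \( 2p - 1 \) inflates the sampling error, which is one way to see the utility loss relative to the central model.</p>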
<p>The <em>pan-privacy model</em> <strong>[DNPRY10]</strong> introduces the notion of time. Each user contributes their data to the server sequentially, one after the other; once the server is done receiving and processing this data, the output is revealed to the world. The answer to the question then is that users trust the server <em>at the time they send it their data</em>, but maybe not in the future (and they <em>definitely</em> don’t trust the outside world). Put differently, this captures settings where a server can be compromised: at the time a user sends their data, they trust the server; if the server is compromised at any point in the future, then the data already in the server <em>stays</em> private (but, of course, sending any more data after the server has already been attacked is a bad idea).</p>
<p>Finally, the recent <em>shuffle model</em> of privacy <strong>[CSUZZ19,EFMRTT19]</strong> is in some sense intermediate between the central and local models of DP: users do not trust the server (and, god forbid, they still don’t trust the outside world!); however, they do trust some small blackbox in the middle, whose role is to randomly, well, <em>shuffle</em> the data. That is, when all users send their data to the untrusted server, this box-in-the-middle randomly permutes all the data points, so that the server has no idea who sent which part of the data. This simple-yet-helpful trusted blackbox, in turn, can be implemented using, e.g., cryptographic primitives; and the goal is to try and provide stronger privacy than in the DP model, while suffering a smaller utility loss than in the stringent LDP model.</p>
<p>It is important to note that <em>there is no right or wrong model</em> of privacy here, and one cannot say that any of the above notions is “better” than the others with regard to both privacy and accuracy. They all aim at modeling different scenarios, and provide incomparable guarantees: depending on your situation, pick the one that fits best.</p>
<hr />
<p><strong>[<a href="https://arxiv.org/abs/1808.01394">CSUZZ19</a>]</strong> Albert Cheu, Adam D. Smith, Jonathan Ullman, David Zeber, Maxim Zhilyaev:
<em>Distributed Differential Privacy via Shuffling.</em> EUROCRYPT (1) 2019: 375-403</p>
<p><strong>[<a href="https://journalprivacyconfidentiality.org/index.php/jpc/article/view/405">DMNS06</a>]</strong> Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam D. Smith:
<em>Calibrating Noise to Sensitivity in Private Data Analysis.</em> TCC 2006: 265-284</p>
<p><strong>[<a href="https://conference.iiis.tsinghua.edu.cn/ICS2010/content/papers/6.html">DNPRY10</a>]</strong> Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N. Rothblum, Sergey Yekhanin:
<em>Pan-Private Streaming Algorithms.</em> ICS 2010: 66-80</p>
<p><strong>[<a href="https://arxiv.org/abs/1811.12469">EFMRTT19</a>]</strong> Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Abhradeep Thakurta:
<em>Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity.</em> SODA 2019: 2468-2479</p>
<p><strong>[<a href="https://dl.acm.org/doi/10.1145/773153.773174">EGS03</a>]</strong> Alexandre V. Evfimievski, Johannes Gehrke, Ramakrishnan Srikant:
<em>Limiting privacy breaches in privacy preserving data mining.</em> PODS 2003: 211-222</p>
<p><strong>[<a href="https://arxiv.org/abs/0803.0924">KLNRS08</a>]</strong> Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, Adam D. Smith:
<em>What Can We Learn Privately?</em> FOCS 2008: 531-540</p>
Clément Canonne
Fri, 17 Jul 2020 11:45:00 -0400
https://differentialprivacy.org/trustmodels/
Welcome to DifferentialPrivacy.org!
<p>Hello, welcome to this new website! Our goal is to serve as a hub for the differential privacy research community and to promote the work in this area. Please read on to learn more!</p>
<p>We anticipate posting a variety of content, from announcements to mini-surveys of topics in the differential privacy literature. These are archived on our <a href="https://differentialprivacy.org/categories/">Posts</a> page.
We have also assembled a collection of <a href="https://differentialprivacy.org/resources/">Resources</a>, which we hope will help newcomers learn and enter the field.</p>
<p>We have created a <a href="https://groups.google.com/forum/#!forum/differential-privacy-org">mailing list</a> for the differential privacy community.
The goal is to create a channel which could reach the entire differential privacy community at once.
We envision this list being used only to send out announcements of the most broad interest, and as such, it is anticipated to be very low-traffic (≈ 1 post per month).
Click <a href="https://groups.google.com/forum/#!forum/differential-privacy-org/join">here</a> to join.</p>
<p>To follow the latest updates on DifferentialPrivacy.org, you can:</p>
<ol>
<li>Follow us on <a href="https://twitter.com/DiffPriv">Twitter</a></li>
<li>Subscribe to our <a href="https://differentialprivacy.org/feed.xml">RSS feed</a></li>
<li>Sign up for <a href="https://feedburner.google.com/fb/a/mailverify?uri=DifferentialPrivacy">email updates</a> (note: distinct from the Google Groups mailing list)</li>
<li>Set this website to be your homepage ;)</li>
</ol>
<p>This is a community-driven effort and we welcome participation.
If you are interested in contributing, please reach out to us (by email or in the comments below).
Further details are on <a href="https://differentialprivacy.org/about/">About</a> and <a href="https://github.com/differentialprivacy/differentialprivacy">GitHub</a>.</p>
<p>To get things started, here is a definition:</p>
<blockquote>
<p><strong>Definition 1.</strong> [<a href="https://journalprivacyconfidentiality.org/index.php/jpc/article/view/405">DMNS06</a>, <a href="https://www.iacr.org/archive/eurocrypt2006/40040493/40040493.pdf">DKMMN06</a>]</p>
<p>A randomized algorithm \(M : \mathcal{X}^n \to \mathcal{Y}\) is \((\varepsilon,\delta)\)-differentially private if, for all \(x,x' \in \mathcal{X}^n\) differing on a single entry and all measurable \(E \subseteq \mathcal{Y}\), we have \[\mathbb{P}[M(x) \in E] \le e^\varepsilon \cdot \mathbb{P}[M(x') \in E] + \delta.\]</p>
</blockquote>
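<p>As a small sanity check of the definition, the sketch below looks at the Laplace mechanism applied to a query of sensitivity 1 (for example, a count) and numerically compares the output densities on two neighboring datasets. The function names and the grid of test points are assumptions of this sketch. If the density ratio is at most \( e^\varepsilon \) at every point, then integrating over any measurable \( E \) gives the inequality above with \( \delta = 0 \).</p>

```python
import math


def laplace_pdf(y, mu, scale):
    """Density at y of the Laplace distribution centered at mu with the given scale."""
    return math.exp(-abs(y - mu) / scale) / (2.0 * scale)


def max_density_ratio(q_x, q_x2, eps, sensitivity, grid):
    """Largest ratio of output densities of M(x) = q(x) + Laplace(sensitivity / eps)
    on two datasets with query values q_x and q_x2, over the points in grid."""
    scale = sensitivity / eps
    return max(laplace_pdf(y, q_x, scale) / laplace_pdf(y, q_x2, scale) for y in grid)
```

<p>For neighboring query values 10 and 11 with \( \varepsilon = 0.5 \), the maximum ratio over a grid such as <code>[i / 10.0 for i in range(-500, 501)]</code> should match \( e^{0.5} \) up to floating-point error, attained at every point to the left of 10.</p>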
Gautam Kamath, Thomas Steinke, Jonathan Ullman, Zhiwei Steven Wu
Thu, 16 Jul 2020 21:00:00 -0400
https://differentialprivacy.org/welcome/