# One-shot DP Top-k mechanisms

In the last blog post, we showed that the exponential mechanism enjoys improved composition bounds over general pure DP mechanisms due to a property called bounded range. In this post, we will present another useful, and somewhat surprising, property of the exponential mechanism in its application to top-$$k$$ selection.

# A Better Privacy Analysis of the Exponential Mechanism

A basic and frequent task in data analysis is selection – given a set of options $$\mathcal{Y}$$, output the (approximately) best one, where “best” is defined by some loss function $$\ell : \mathcal{Y} \times \mathcal{X}^n \to \mathbb{R}$$ and a dataset $$x \in \mathcal{X}^n$$. That is, we want to output some $$y \in \mathcal{Y}$$ that approximately minimizes $$\ell(y,x)$$. Naturally, we are interested in private selection – i.e., the output should be differentially private in terms of the dataset $$x$$. This post discusses algorithms for private selection – in particular, we give an improved privacy analysis of the popular exponential mechanism.
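To make the selection primitive concrete, here is a minimal sketch of the exponential mechanism in Python (my own illustrative code, not taken from the post), assuming the loss $$\ell(\cdot, x)$$ changes by at most `sensitivity` between neighboring datasets:

```python
import numpy as np

def exponential_mechanism(losses, epsilon, sensitivity=1.0, rng=None):
    """Sample option y with probability proportional to
    exp(-epsilon * losses[y] / (2 * sensitivity)).

    losses[y] plays the role of ell(y, x); lower loss = better option,
    so low-loss options are exponentially more likely to be selected.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = -epsilon * np.asarray(losses, dtype=float) / (2.0 * sensitivity)
    scores -= scores.max()          # shift for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Example: select an approximately-best option among four candidates.
y = exponential_mechanism([3.0, 0.5, 2.0, 0.7], epsilon=1.0)
```

As `epsilon` grows, the distribution concentrates on the true minimizer; as it shrinks, the output approaches uniform over $$\mathcal{Y}$$.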

# Open Problem - Optimal Query Release for Pure Differential Privacy

Releasing large sets of statistical queries is a centerpiece of the theory of differential privacy. Here, we are given a dataset $$x = (x_1,\dots,x_n) \in [T]^n$$, and a set of statistical queries $$f_1,\dots,f_k$$, where each query is defined by some bounded function $$f_j : [T] \to [-1,1]$$, and (abusing notation) is defined as $$f_j(x) = \frac{1}{n} \sum_{i=1}^{n} f_j(x_i).$$ We use $$f(x) = (f_1(x),\dots,f_k(x))$$ to denote the vector consisting of the true answers to all these queries. Our goal is to design an $$(\varepsilon, \delta)$$-differentially private algorithm $$M$$ that takes a dataset $$x\in [T]^n$$ and outputs a random vector $$M(x)\in \mathbb{R}^k$$ such that $$\| M(x) - f(x) \|$$ is small in expectation for some norm $$\|\cdot\|$$. Usually algorithms for this problem also give high probability bounds on the error, but we focus on expected error for simplicity.
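As a point of reference (a hedged sketch of the standard Laplace baseline, with my own function name, and not the optimal algorithm the open problem asks for), all $$k$$ queries can be answered under pure $$\varepsilon$$-DP as follows:

```python
import numpy as np

def laplace_query_release(x, queries, epsilon, rng=None):
    """Baseline sketch: answer k statistical queries under pure epsilon-DP
    (delta = 0) by adding i.i.d. Laplace noise to the true answers.

    `queries` is a list of functions mapping one record to [-1, 1], so
    changing one of the n records moves each f_j(x) by at most 2/n and
    the answer vector f(x) has L1 sensitivity 2k/n.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, k = len(x), len(queries)
    true_answers = np.array([np.mean([f(xi) for xi in x]) for f in queries])
    return true_answers + rng.laplace(0.0, (2.0 * k) / (n * epsilon), size=k)
```

This gives per-query expected error on the order of $$k/(\varepsilon n)$$; the open problem concerns how much better one can do.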

# Conference Digest - ICML 2021

ICML 2021, one of the biggest conferences in machine learning, naturally has a ton of interesting-sounding papers on the topic of differential privacy. We went through this year’s accepted papers and aggregated all the relevant papers we could find. In addition, this year features three workshops on the topic of privacy, as well as a tutorial. As always, please inform us if we overlooked any papers on differential privacy.

# Statistical Inference is Not a Privacy Violation

On April 28, 2021, the US Census Bureau released a new demonstration of its differentially private Disclosure Avoidance System (DAS) for the 2020 US Census. The public was given a month to submit feedback before the system is finalized. The demonstration data and the feedback have generated a lot of discussion, including media coverage on National Public Radio, in the Washington Post, and via the Associated Press. The DAS is also the subject of an ongoing lawsuit.

# Call for Papers - Workshop on the Theory and Practice of Differential Privacy (TPDP 2021)

Work on differential privacy spans a number of different research communities, including theoretical computer science, machine learning, statistics, security, law, databases, cryptography, programming languages, social sciences, and more. Each of these communities may choose to publish their work in their own community’s venues, which could result in small groups of differential privacy researchers becoming isolated. To alleviate these issues, we have the Workshop on the Theory and Practice of Differential Privacy (TPDP), which is intended to bring these subcommunities together under one roof (well, a virtual one at least for 2020 and 2021).

# ALT Highlights - An Equivalence between Private Learning and Online Learning (ALT '21 Tutorial)

Welcome to ALT Highlights, a series of blog posts spotlighting various happenings at the recent conference ALT 2021, including plenary talks, tutorials, trends in learning theory, and more! To reach a broad audience, the series will be disseminated as guest posts on different blogs in machine learning and theoretical computer science. Given the topic of this post, we felt DifferentialPrivacy.org was a great fit. This initiative is organized by the Learning Theory Alliance, and overseen by Gautam Kamath. All posts in ALT Highlights are indexed on the official Learning Theory Alliance blog.

# What is δ, and what δifference does it make?

There are many variants, or flavours, of differential privacy (DP), some weaker than others. Often, a given variant comes with its own guarantees and “conversion theorems” to the others. As an example, “pure” DP has a single parameter $$\varepsilon$$, and corresponds to a very stringent notion of DP:
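For reference, the standard statement of pure DP that the colon above introduces is:

```latex
% Pure epsilon-DP: a randomized algorithm M is epsilon-DP if, for all
% neighboring datasets x, x' (differing in one individual's record)
% and all sets S of outputs,
\Pr[M(x) \in S] \le e^{\varepsilon} \cdot \Pr[M(x') \in S].
```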

# Conference Digest - TPDP 2020

TPDP 2020 is a workshop focused on differential privacy. As such, it’s a great place to learn about recent developments in the DP research community. It will be held on 13 November and is co-located with CCS, but, of course, it’s virtual this year. Registration is only US\$35 if you register by Friday, 30 October. Check out the 8 excellent talks and 71 posters below – wow, the workshop has grown!

# Reconstruction Attacks in Practice

This is the second of two posts describing the theory and practice of reconstruction attacks. To read the first post, which covers the theoretical basis of such attacks, [click here].

# The Theory of Reconstruction Attacks

We often see people asking whether or not differential privacy might be overkill. Why do we need strong privacy protections like differential privacy when we’re only releasing approximate, aggregate statistical information about a dataset? Is it really possible to extract information about specific users from releasing these statistics? The answer turns out to be a resounding yes! The textbook by Dwork and Roth [DR14] calls this phenomenon the Fundamental Law of Information Recovery:

Giving overly accurate answers to too many questions will inevitably destroy privacy.
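To see this law in action, here is a toy, Dinur-Nissim-style reconstruction attack (an illustrative sketch with hypothetical names, not code from the post): answer many random subset-sum queries about a secret bit vector with modest noise, then recover the secret by least squares and rounding.

```python
import numpy as np

def reconstruction_attack(noisy_answers, query_matrix):
    """Given noisy answers to subset-sum queries A @ s (s a secret 0/1
    vector), estimate s by least squares, then round each coordinate.
    Recovery succeeds when the per-answer noise is o(sqrt(n)).
    """
    estimate, *_ = np.linalg.lstsq(query_matrix, noisy_answers, rcond=None)
    return (estimate > 0.5).astype(int)

# Demo: a 50-bit secret, 200 random 0/1 queries, noise of magnitude ~0.5.
rng = np.random.default_rng(0)
n = 50
secret = rng.integers(0, 2, size=n)
A = rng.integers(0, 2, size=(4 * n, n)).astype(float)
answers = A @ secret + rng.normal(0.0, 0.5, size=4 * n)
recovered = reconstruction_attack(answers, A)
```

Even though every individual answer is perturbed, the aggregate of many accurate answers pins down nearly every secret bit.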

# Conference Digest - NeurIPS 2020

NeurIPS 2020 is the biggest conference on machine learning, with tons of content on differential privacy in many different forms. We were able to find two workshops, a competition, and 31 papers. This was just going off the preliminary accepted papers list, so it’s possible that we missed some papers on differential privacy – please let us know! We will update this post later, once all the conference materials (papers and videos) are publicly available.

# Open Problem - Avoiding the Union Bound for Multiple Queries

Background: Perhaps the best-studied problem in differential privacy is answering multiple counting queries. The standard approach is to add independent, appropriately-calibrated (Laplace or Gaussian) noise to each query result and apply a composition theorem. To bound the maximum error over the query answers, one takes a union bound over the independent noise samples. However, this is not optimal. The problem is to identify the optimal method (up to constant factors).
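The standard approach described above can be sketched as follows (my own illustrative code, with the per-query sensitivity normalized to 1 and the pure-DP budget split by basic composition):

```python
import numpy as np

def answer_queries_laplace(true_answers, epsilon, rng=None):
    """Split the budget epsilon evenly across k queries (basic
    composition) and add independent Laplace(k / epsilon) noise to each
    answer, assuming each query has sensitivity 1.

    A Laplace(b) draw exceeds t in magnitude with probability exp(-t/b),
    so a union bound over the k draws bounds the maximum error by
    O((k / epsilon) * log k) with high probability -- that log k factor
    is exactly what the open problem asks whether one can avoid.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = len(true_answers)
    return np.asarray(true_answers, float) + rng.laplace(0.0, k / epsilon, size=k)
```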

# Differentially Private PAC Learning

The study of differentially private PAC learning runs all the way from its introduction in 2008 [KLNRS08] to a best paper award at the Symposium on Foundations of Computer Science (FOCS) this year [BLM20]. In this post, we’ll recap the history of this line of work, aiming for enough detail for a rough understanding of the results and methods.

# Conference Digest - ICML 2020

ICML 2020 is one of the premier venues in machine learning, and generally features a lot of great work in differentially private machine learning. This year is no exception: the relevant papers are listed below to the best of our ability, including links to the full versions of papers, as well as the conference pages (which contain slides and 15 minute videos for each paper). As always, please inform us if we overlooked any papers on differential privacy.

# Conference Digest - COLT 2020

COLT 2020 was held online in July, and featured nine papers on differential privacy, as well as a keynote talk by Salil Vadhan. While differential privacy has always had a home in the COLT community, it seems like this year was truly exceptional in terms of the number of results. We link all the content below, including pointers to the papers, videos on YouTube, and the page on the conference website. Please let us know if we missed any papers on differential privacy, either in the comments below or by email.

# Why Privacy Needs Composition

We’re back! In our last post we discussed some of the subtle pitfalls of formulating the assumptions underlying average-case relaxations of differential privacy. This time we’re going to look at the composition property of differential privacy—that is, the fact that running two independent differentially private algorithms on your data and combining their outputs is still differentially private. This is a key property of differential privacy and is actually closely related to the worst-case nature of differential privacy.
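A hypothetical two-mechanism sketch (my own names; records assumed to lie in $$[0,1]$$, so both statistics have sensitivity 1) makes the composition property concrete:

```python
import numpy as np

def laplace_mech(value, sensitivity, epsilon, rng):
    """Release `value` with epsilon-DP via Laplace noise."""
    return value + rng.laplace(0.0, sensitivity / epsilon)

def composed_release(x, rng=None):
    """Run two independent 0.5-DP mechanisms on the same dataset x
    (records assumed in [0, 1], so sum and count each have sensitivity
    1). By basic composition, releasing both outputs together is
    (0.5 + 0.5) = 1.0-DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy_sum = laplace_mech(sum(x), sensitivity=1.0, epsilon=0.5, rng=rng)
    noisy_count = laplace_mech(len(x), sensitivity=1.0, epsilon=0.5, rng=rng)
    return noisy_sum, noisy_count
```

Because the guarantee holds for worst-case datasets, the budgets simply add, with no assumptions needed about how the two outputs interact.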

# Open Problem - Private All-Pairs Distances

Background: Suppose we are interested in computing the distance between two vertices in a graph. Under edge or node differential privacy, this problem is not promising because the removal of a single edge can make distances change from 1 to $$n − 1$$ or can even disconnect the graph. However, a different setting that makes sense to consider is that of a weighted graph $$(G, w)$$ whose topology $$G = (V, E)$$ is publicly known but edge weight function $$w : E \to \mathbb{R}^+$$ must be kept private. (For instance, consider transit times on a road network. The topology of the road network may be publicly available as a map, but the edge weights corresponding to transit times may be based on private GPS locations of individual cars.)
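One natural baseline in this setting (an illustrative sketch under my own naming, not necessarily the optimal algorithm the open problem seeks) is to privatize the weight function itself and then answer every distance query from the noisy graph:

```python
import numpy as np

def private_all_pairs(n, edges, epsilon, rng=None):
    """Add Laplace(1 / epsilon) noise to every edge weight -- epsilon-DP
    when neighboring weight functions w, w' satisfy ||w - w'||_1 <= 1 --
    then run Floyd-Warshall on the noisy weights. A shortest path through
    L edges then carries roughly L / epsilon error.

    `edges` maps undirected pairs (u, v) to nonnegative weights w(u, v).
    """
    rng = np.random.default_rng() if rng is None else rng
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    for (u, v), w in edges.items():
        noisy = max(0.0, w + rng.laplace(0.0, 1.0 / epsilon))  # clamp at 0
        dist[u, v] = dist[v, u] = min(dist[u, v], noisy)
    for k in range(n):  # Floyd-Warshall relaxation over the noisy graph
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist
```

Since post-processing the released noisy weights is free, all $$\binom{n}{2}$$ distances come out of a single $$\varepsilon$$-DP release; the question is whether the per-path error can be reduced.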

# The Pitfalls of Average-Case Differential Privacy

Differential privacy protects against extremely strong adversaries—even ones who know the entire dataset except for one bit of information about one individual. Since its inception, people have considered ways to relax the definition to assume a more realistic adversary. A natural way to do so is to incorporate some distributional assumptions. That is, rather than considering a worst-case dataset, assume the dataset is drawn from some distribution and provide some form of “average-case” or “Bayesian” privacy guarantee with respect to this distribution. This is especially tempting as it is common for statistical analysis to work under distributional assumptions.

# Conference Digest - STOC 2020

STOC 2020 was recently held online, as one of the first major theory conferences during the COVID-19 era. It featured four papers on differential privacy, which we list and link below. Each one is accompanied by a video from the conference, as well as a longer video if available. Please let us know if we missed any papers on differential privacy, either in the comments below or by email.