Open Problem: Selection via Low-Sensitivity Queries
Two of the basic tools for building differentially private algorithms are noise addition for answering low-sensitivity queries and the exponential mechanism for selection. Could we do away with the exponential mechanism and simply use low-sensitivity queries to perform selection?
Formal Problem Statement
Recall that the exponential mechanism is a differentially private algorithm that takes a private dataset \(x \in \mathcal{X}^n\) and a public loss function \(\ell : \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}\) and returns \(Y \in \mathcal{Y}\) such that \(\mathbb{E}_Y[\ell(x, Y)] \le \min_{y\in\mathcal{Y}} \ell(x, y) + O(\frac{1}{\varepsilon} \log |\mathcal{Y}|)\), where \(\varepsilon\) is the differential privacy parameter. (We will suppress the privacy parameter for simplicity.) The question is whether we can replace the exponential mechanism with an algorithm that is based only on adding noise to low-sensitivity queries.
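For concreteness, here is a minimal sketch of the exponential mechanism in Python. The function name and the representation of \(\ell(x,\cdot)\) as a precomputed vector `losses` are illustrative choices, not part of the problem statement; the \(\varepsilon/2\) scaling is the standard one for a sensitivity-\(1\) loss.

```python
import numpy as np

def exponential_mechanism(losses, epsilon, rng=None):
    """Sample index y with probability proportional to exp(-epsilon * losses[y] / 2).

    `losses[y]` plays the role of ell(x, y); for a sensitivity-1 loss this
    sampler is epsilon-differentially private (the standard analysis).
    """
    rng = rng or np.random.default_rng()
    scores = -0.5 * epsilon * np.asarray(losses, dtype=float)
    scores -= scores.max()  # shift for numerical stability; doesn't change the distribution
    probs = np.exp(scores)
    return rng.choice(len(probs), p=probs / probs.sum())
```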
Problem 1: There is a (private) dataset \(x \in \mathcal{X}^n\) and a (public) loss function \(\ell : \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}\) that has sensitivity-\(1\) in its first argument. That is, for all \(x,x' \in \mathcal{X}^n\) differing in a single entry and all \(y \in \mathcal{Y}\), we have \(|\ell(x,y) - \ell(x',y)| \le 1\).
The goal is to construct an algorithm that outputs \(Y \in \mathcal{Y}\) such that \[\mathbb{E}_Y[\ell(x, Y)] \le \min_{y\in\mathcal{Y}} \ell(x, y) + O(\log |\mathcal{Y}|).\tag{1}\] However, the algorithm cannot access \(x\) directly. Instead there is an oracle which provides noisy answers to \(k\) sensitivity-\(1\) queries. Specifically, each query is specified by a sensitivity-\(1\) function \(q : \mathcal{X}^n \to \mathbb{R}\), which is submitted to the oracle, and the oracle returns a sample from \(\mathcal{N}(q(x),k)\). The number of queries \(k\) may be chosen arbitrarily and the queries may be specified adaptively.
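To make the access model concrete, here is a minimal sketch of the oracle. The names (`GaussianOracle`, `query`) are hypothetical, introduced only for illustration.

```python
import numpy as np

class GaussianOracle:
    """Answers adaptively chosen sensitivity-1 queries about a private dataset x,
    adding N(0, k) noise to each answer (k fixed in advance, as in Problem 1)."""

    def __init__(self, x, k, rng=None):
        self._x = x  # the private dataset; the algorithm never touches this directly
        self.k = k
        self._rng = rng or np.random.default_rng()

    def query(self, q):
        # q must be a sensitivity-1 function of the dataset.
        return q(self._x) + self._rng.normal(0.0, np.sqrt(self.k))
```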
Obviously, if the algorithm had direct access to \(x\) or if the oracle didn't add any noise, this problem would be trivial (just query \(q(x)=\ell(x,y)\) for each \(y\in\mathcal{Y}\) – i.e., \(k=|\mathcal{Y}|\) queries – and output the minimizer).
The noise added by the oracle ensures that the algorithm is differentially private: each query has sensitivity \(1\) and receives Gaussian noise of variance \(k\), so composing all \(k\) queries yields differential privacy with constant parameters (e.g., \(\frac{1}{2}\)-concentrated DP). Thus the goal of this algorithm is directly comparable with the guarantee of the exponential mechanism.
Partial Solution
As a starting point, the following binary-tree-based algorithm attains expected excess loss \(O((\log|\mathcal{Y}|)^{3/2})\) instead of the desired \(O(\log|\mathcal{Y}|)\).
Construct a complete binary tree with the leaves corresponding to the elements of \(\mathcal{Y}\). Walk down the tree from the root to a leaf as follows and output the leaf's element. At each node, query the oracle with \[q(x) = \frac{1}{2} \big(\min_{y\in\text{ left subtree}} \ell(x, y)\big) - \frac{1}{2} \big(\min_{y\in\text{ right subtree}} \ell(x, y)\big).\tag{2}\] If the oracle's answer is positive, move to the right child; otherwise, move to the left child.
The number of queries for this algorithm is \(k=\lceil \log_2|\mathcal{Y}| \rceil\). And Equation 2 has sensitivity-\(1\): each of the two minima has sensitivity \(1\), so their half-difference changes by at most \(\frac{1}{2}+\frac{1}{2}=1\) when one entry of \(x\) changes.
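Here is a sketch of this algorithm, assuming the hypothetical `GaussianOracle` interface above. Splitting the candidate list in half at each step is equivalent to walking down the complete binary tree.

```python
def tree_selection(oracle, Y, loss):
    """Walk down the binary tree over candidates Y using noisy comparisons.

    `loss(x, y)` is only ever evaluated inside queries sent to the oracle,
    which should be constructed with k = ceil(log2(len(Y))).
    """
    candidates = list(Y)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left, right = candidates[:mid], candidates[mid:]
        # Equation 2: half the difference of the two subtrees' minimum losses.
        def q(x, left=left, right=right):
            return 0.5 * min(loss(x, y) for y in left) \
                 - 0.5 * min(loss(x, y) for y in right)
        candidates = right if oracle.query(q) > 0 else left
    return candidates[0]
```

For instance, `tree_selection(GaussianOracle(x, k=math.ceil(math.log2(len(Y)))), Y, loss)` issues exactly the \(k\) queries counted above.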
For the utility analysis, let \(A_1,A_2,\cdots,A_k\) denote the nodes on the path from the root \(A_1\) to the leaf \(A_k\) that we output. We track the minimum loss on the subtree rooted at the current node – i.e., \(B_i := \min_{y \text{ in subtree rooted at } A_i} \ell(x,y)\).
Initially, we have \(B_1 = \min_{y\in\mathcal{Y}} \ell(x, y) \), which is the desired quantity. And \(B_k\) is the loss of the final output. We also have \(B_1 \le B_2 \le \cdots \le B_k\), since each successive subtree is contained in the previous one. To complete the analysis we need only show that \(\mathbb{E}[B_{i+1}] \le B_i + O(\sqrt{\log|\mathcal{Y}|})\) for all \(i\).
If the (noiseless) value of the query in Equation 2 is positive, then the minimizer is in the right subtree and vice versa. If at step \(i\) the algorithm chooses the “correct” child, then \(B_{i+1}=B_i\). But, if the algorithm chooses the “incorrect” child, we have \(B_{i+1} = B_i + 2|q_i(x)|\), where \(q_i\) is the query (given in Equation 2) that was asked to the oracle in step \(i\).
What is the probability of choosing the wrong child? Well, it's the probability that the noise added to \(q_i(x)\) flips the sign – i.e., \(\mathbb{P}[\mathsf{sign}(\mathcal{N}(q_i(x),k)) \ne \mathsf{sign}(q_i(x))]\). Putting these together and doing a bit of algebraic manipulation, we have \[ \mathbb{E}[B_{i+1}] = B_i + 2|q_i(x)| \cdot \mathbb{P}[\mathcal{N}(0,k)\ge|q_i(x)|] \le B_i + 2\sqrt{k} \cdot \max_{v \ge 0} v \cdot \mathbb{P}[\mathcal{N}(0,1) \ge v].\tag{3}\] The quantity \(\max_{v \ge 0} v \cdot \mathbb{P}[\mathcal{N}(0,1) \ge v] \in [0.169,0.17]\) is a constant (attained at \(v \approx 0.75\)). Thus we have \(\mathbb{E}[B_k] \le B_1 + 0.34 k^{3/2} = \min_{y\in\mathcal{Y}} \ell(x, y) + O((\log|\mathcal{Y}|)^{3/2})\).
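If you want to check that constant, here is a quick numerical sanity check using only the standard library, via the Gaussian tail \(\mathbb{P}[\mathcal{N}(0,1) \ge v] = \frac{1}{2}\mathrm{erfc}(v/\sqrt{2})\):

```python
import math

def gaussian_tail(v):
    # P[N(0,1) >= v] = erfc(v / sqrt(2)) / 2
    return 0.5 * math.erfc(v / math.sqrt(2.0))

# Maximize v * P[N(0,1) >= v] over a fine grid of v >= 0.
best = max(((v / 10000) * gaussian_tail(v / 10000), v / 10000) for v in range(20000))
print(best)  # roughly (0.16997, 0.7518), consistent with the bracket [0.169, 0.17] above
```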
Who Cares?
Oh man, tough crowd. I care. But it’s a fair question – why is this open problem interesting?
A positive solution to this open problem would demonstrate the power of low-sensitivity queries and illustrate how almost all differentially private tasks can be boiled down to noise addition. Note that the reverse reduction is trivial: We can use the exponential mechanism to answer low-sensitivity queries. Namely, to answer a sensitivity-\(1\) query \(q\), we can set \(\ell(x,y)=|q(x)-y|\) (with \(\mathcal{Y}\) a suitable discretization of the range of \(q\)). Thus a positive solution to this problem would show an equivalence between selection and low-sensitivity queries.
In practice, the exponential mechanism works fine, so we don't really need this algorithm. Nevertheless, I think this could lead to something insightful, and maybe even useful: There are situations where we can do better than the exponential mechanism, or at least better than the standard analysis of the exponential mechanism. An alternative algorithm might open up more avenues for improving on the exponential mechanism.
To give some examples where we know how to beat the standard analysis of the exponential mechanism: First, suppose the loss function can be decomposed as \[\ell(x,y) = \ell(x,(y_1,y_2,\cdots,y_d)) = \ell_1(x,y_1) + \ell_2(x,y_2) + \cdots + \ell_d(x,y_d). \tag{4}\] Then the analysis of the exponential mechanism can also be decomposed into the composition of \(d\) independent exponential mechanisms, which yields better asymptotic results via the advanced composition theorem. A second example is when there is one option \(y_* \in \mathcal{Y}\) that stands out from the other options – i.e., \(\ell(x,y_*) \le \min_{y \in \mathcal{Y} \setminus \{y_*\}} \ell(x,y) - c\), where \(c\) is sufficiently large. In this case we can privately output \(y_*\) with an improved dependence on the number of options \(|\mathcal{Y}|\) [CHS14,BDRS18,BKSW19].
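For the first example, here is a sketch of the decomposition, reusing the hypothetical `exponential_mechanism` from above: run one exponential mechanism per coordinate of Equation 4 and let composition account for the total privacy cost.

```python
def decomposed_selection(loss_vectors, epsilon_each, rng=None):
    """One exponential mechanism per coordinate of Equation 4.

    `loss_vectors[j][y]` plays the role of ell_j(x, y). Each coordinate
    spends epsilon_each; advanced composition over the d coordinates costs
    roughly epsilon_each * sqrt(d) rather than epsilon_each * d.
    """
    return tuple(exponential_mechanism(losses, epsilon_each, rng)
                 for losses in loss_vectors)
```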
A negative solution – that is, an impossibility result – would show that selection is a fundamental and indivisible primitive of differentially private algorithms. This would be surprising and thus interesting. The proof technique would presumably also be novel.
Remarks
This open problem was first published in 2019. (And I asked a related question back in 2017.) I’m reposting it because, well, it’s still open. (There are a few other open problems in the 2019 list, although some have been solved by now.)
Problem 1 is stated in terms of Gaussian noise addition (and implicitly performs optimal/advanced composition). The problem also makes sense with Laplace noise addition (and basic composition). Let’s state that formally:
Problem 2: There is a (private) dataset \(x \in \mathcal{X}^n\) and a (public) loss function \(\ell : \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}\) that has sensitivity-\(1\) in its first argument. That is, for all \(x,x' \in \mathcal{X}^n\) differing in a single entry and all \(y \in \mathcal{Y}\), we have \(|\ell(x,y) - \ell(x',y)| \le 1\).
The goal is to construct an algorithm that outputs \(Y \in \mathcal{Y}\) such that \[\mathbb{E}_Y[\ell(x, Y)] \le \min_{y\in\mathcal{Y}} \ell(x, y) + O(\log |\mathcal{Y}|).\tag{5}\] However, the algorithm cannot access \(x\) directly. Instead there is an oracle which provides noisy answers to \(k\) sensitivity-\(1\) queries. Specifically, each query is specified by a sensitivity-\(1\) function \(q : \mathcal{X}^n \to \mathbb{R}\), which is submitted to the oracle, and the oracle returns a sample from \(q(x)+\mathsf{Lap}(k)\). The number of queries \(k\) may be chosen arbitrarily and the queries may be specified adaptively.
For pure DP, the binary tree algorithm achieves excess loss \(O((\log |\mathcal{Y}|)^2)\) instead of the desired \(O(\log |\mathcal{Y}|)\); a quick sketch of the per-step bound is below.
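To sketch why: repeating the argument behind Equation 3 with \(\mathsf{Lap}(k)\) noise, and using \(\mathbb{P}[\mathsf{Lap}(k) \ge t] = \frac{1}{2}e^{-t/k}\) for \(t \ge 0\), gives \[\mathbb{E}[B_{i+1}] = B_i + 2|q_i(x)| \cdot \mathbb{P}[\mathsf{Lap}(k)\ge|q_i(x)|] = B_i + |q_i(x)| e^{-|q_i(x)|/k} \le B_i + \frac{k}{e},\] since \(\max_{u \ge 0} u e^{-u} = 1/e\). Summing over the \(k = \lceil \log_2 |\mathcal{Y}| \rceil\) steps gives \(\mathbb{E}[B_k] \le B_1 + k^2/e = \min_{y\in\mathcal{Y}} \ell(x,y) + O((\log|\mathcal{Y}|)^2)\).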
The contrast between pure DP and Gaussian DP is interesting because the exponential mechanism satisfies pure DP and relaxing to approximate DP doesn’t allow us to do any better. But, comparing Problem 1 and Problem 2, it seems like the Gaussian case should be easier. I can’t quite put my finger on it, but I feel like there’s something interesting to say about this distinction and I hope resolving this open problem would shed light on it.