One-Sided Differential Privacy
Differential privacy is defined in terms of pairs of neighboring datasets. That is, \(M\) is \((\varepsilon,\delta)\)-differentially private if, for all measurable events \(T\) and all neighboring pairs of datasets \(x,x'\), we have \[\mathbb{P}[M(x) \in T] \le e^\varepsilon \mathbb{P}[M(x') \in T] + \delta.\] There are three common ways to define neighbors: (i) \(x\) and \(x'\) differ only by the replacement of one person's data, (ii) \(x\) and \(x'\) differ only by the addition or removal of one person's data, or (iii) \(x\) and \(x'\) differ only by the addition, removal, or replacement of one person's data. All three of these are symmetric in the sense that we can swap \(x\) and \(x'\). This symmetry is nice mathematically, but maybe you've wondered what happens if we make it asymmetric; if so, read on.
Since replacement is inherently symmetric, we will focus on the asymmetry of addition or removal. Let’s write out the two definitions of one-sided differential privacy that we will work with:
Definition 1 (Addition-only Differential Privacy). A randomized algorithm \(M\) is \((\varepsilon,\delta)\)-differentially private for addition if, for all measurable events \(T\) and all pairs of datasets \(x,x'\) such that \(x\) is formed by adding one element to \(x'\), we have \[\mathbb{P}[M(x) \in T] \le e^\varepsilon \mathbb{P}[M(x') \in T] + \delta.\]
Definition 2 (Removal-only Differential Privacy). A randomized algorithm \(M\) is \((\varepsilon,\delta)\)-differentially private for removal if, for all measurable events \(T\) and all pairs of datasets \(x,x'\) such that \(x'\) is formed by removing one element from \(x\), we have \[\mathbb{P}[M(x') \in T] \le e^\varepsilon \mathbb{P}[M(x) \in T] + \delta.\]
To make this post less confusing, we will stick to the notational convention that \(x' \subseteq x\).
Interpreting the Asymmetric Guarantees
Before getting technical, let's try to get some intuition for how these two definitions differ. I generally think of the promise of differential privacy as the following \[\mathbb{P}\left[\text{bad event happens} \atop \text{in the real world}\right] \le e^\varepsilon \mathbb{P}\left[\text{bad event happens in a} \atop \text{hypothetical ideal world}\right] + \delta. \tag{1}\] Thinking of it like this hopefully gives some sense for how the two datasets in the definition have differing roles – the real world is the real world and the hypothetical ideal world is a counterfactual that we devise. From a privacy standpoint, your ideal world is usually one where your data has been deleted. In other words, we typically want something like \[\mathbb{P}\left[\text{bad event happens} \atop \text{in the real world}\right] \le e^\varepsilon \mathbb{P}\left[\text{bad event happens} \atop \text{without access to your data}\right] + \delta. \tag{2}\] This corresponds to addition-only differential privacy (Definition 1).
OK, hopefully, I’ve convinced you that Definition 1 is what we really want. Is there any reason to work with Definition 2? I can think of two reasons:
- Suppose the dataset is a collection of people who are deemed to be wise and pure of heart. Perhaps you have been omitted from this dataset (for whatever reason) and you want to be part of it. In this case, your ideal world is the one where your data is inserted, and Definition 2 is what you want.
- The other reason is that I've implicitly assumed that bad events are unlikely. Equation 1 can become vacuous, say, if \(\varepsilon=1\) and \(\mathbb{P}[\text{bad event happens in a hypothetical ideal world}]\ge 0.5\). In such a case, we can apply Definition 2 to the good event (\(\notin T\)) instead of the bad event (\(\in T\)): \[\mathbb{P}[M(x') \notin T] \le e^\varepsilon \mathbb{P}[M(x) \notin T] + \delta,\] which rearranges to \[\mathbb{P}[M(x) \in T] \le e^{-\varepsilon} \mathbb{P}[M(x') \in T] + 1 - e^{-\varepsilon} + e^{-\varepsilon} \delta.\] This guarantee is of the form of Definition 1. It's weird, but it's useful when the bad event in the hypothetical ideal world (\(M(x') \in T\)) is actually quite likely.
This calculation shows that \((\varepsilon,\delta)\)-differential privacy for removal implies \((-\varepsilon,1-e^{-\varepsilon}+e^{-\varepsilon}\delta)\)-differential privacy for addition (and vice versa). In other words, the difference between the two definitions is, in some sense, just a matter of differing parameter regimes.
These two reasons are a little contrived. I’d still argue that Definition 1 is more natural, but Definition 2 has its place.
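The parameter conversion above is easy to sanity-check numerically. Here is a quick Python sketch (the helper name is ours, purely illustrative):

```python
import math

def removal_to_addition(eps, delta):
    """Convert (eps, delta)-removal-only DP parameters into the implied
    addition-only DP parameters (-eps, 1 - e^{-eps} + e^{-eps} * delta),
    per the rearrangement of the complement-event inequality above."""
    return -eps, 1.0 - math.exp(-eps) + math.exp(-eps) * delta

# With eps = 1 and delta = 0, the implied addition-only guarantee has
# eps' = -1 and delta' = 1 - 1/e, roughly 0.632.
eps_add, delta_add = removal_to_addition(1.0, 0.0)
```

Note how large the implied \(\delta\) is: the converted guarantee is only meaningful when the bad event is likely in the ideal world, as discussed above.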
One-Sided Laplace Noise
OK, so far we’ve stated the definitions and tried to give them some intuitive meaning. What can we actually do with these definitions?
Let’s suppose we have a counting query \(q\) (or, more generally, a monotone low-sensitivity query). Under standard two-sided differential privacy, we can answer the query with Laplace noise addition. What can we do under one-sided differential privacy? Of course, we can still use Laplace noise, but can we do something with better accuracy under this weaker privacy notion?
It turns out we can use one-sided Laplace noise – more commonly known as Exponential noise – to get one-sided differential privacy:
Proposition 3 (One-sided Noise for Addition-only Differential Privacy). Let \(q : \mathcal{X}^* \to \mathbb{R}\) be monotone (i.e., \(x' \subseteq x \implies q(x') \le q(x)\)) and have sensitivity 1 (i.e., \(q(x)-q(x') \le |x \setminus x'|\)). Let \(\xi\) denote a standard exponential random variable (i.e., \(\forall y \in \mathbb{R} ~ ~ \mathbb{P}[\xi \ge y]=\min\{1,\exp(-y)\}\)).
Define \(M : \mathcal{X}^* \to \mathbb{R}\) by \(M(x) = q(x) + \frac{1}{\varepsilon}\xi\). Then \(M\) satisfies \((\varepsilon,0)\)-differential privacy for addition (Definition 1).
Proof. Let \(x' \subseteq x\) be inputs differing by the addition of one element – i.e., \(|x \setminus x'|=1\). And let \(T \subseteq \mathbb{R}\) be measurable. By assumption, \(q(x)-1 \le q(x') \le q(x)\). Now \[\begin{aligned} \mathbb{P}[M(x) \in T] &= \mathbb{P}\left[q(x) + \tfrac{1}{\varepsilon} \xi \in T\right] \\ &= \int_T \varepsilon e^{-\varepsilon (y-q(x))} \mathbb{I}[y \ge q(x)]\,\mathrm{d}y \\ &\le \int_T \varepsilon e^{-\varepsilon (y-q(x')-1)} \mathbb{I}[y \ge q(x')]\,\mathrm{d}y \\ &= e^\varepsilon \int_T \varepsilon e^{-\varepsilon (y-q(x'))} \mathbb{I}[y \ge q(x')]\,\mathrm{d}y \\ &= e^\varepsilon \mathbb{P}[M(x') \in T]. \end{aligned}\] Here we used the fact that the shifted and scaled exponential \(\mu + \lambda \xi\) (with \(\lambda>0\)) has probability density \(\frac{1}{\lambda} e^{-(y-\mu)/\lambda}\mathbb{I}[y \ge \mu]\) at \(y\) to write the probabilities as integrals. And we used \(q(x)-1 \le q(x') \le q(x)\) to obtain the bounds \(e^{-\varepsilon (y-q(x))} \le e^{-\varepsilon (y-q(x')-1)}\) and \(\mathbb{I}[y \ge q(x)] \le \mathbb{I}[y \ge q(x')]\). ∎
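For concreteness, here is a minimal Python sketch of the mechanism in Proposition 3 (the function name and the use of NumPy are our own choices, not part of the proposition):

```python
import numpy as np

def addition_only_count(true_count, eps, rng=None):
    """Release a monotone sensitivity-1 query (e.g., a count) with
    one-sided exponential noise of scale 1/eps, as in Proposition 3.
    The noise is nonnegative, so the release never undershoots the
    true value."""
    rng = np.random.default_rng() if rng is None else rng
    return true_count + rng.exponential(scale=1.0 / eps)

release = addition_only_count(42, eps=1.0, rng=np.random.default_rng(0))
```

The release overshoots by \(1/\varepsilon\) in expectation; subtracting that mean gives the unbiased variant discussed in the remarks below Proposition 4.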
We can also do the other side:
Proposition 4 (One-sided Noise for Removal-only Differential Privacy). Let \(q : \mathcal{X}^* \to \mathbb{R}\) be monotone and have sensitivity 1. Let \(\xi\) denote a standard exponential random variable.
Define \(M : \mathcal{X}^* \to \mathbb{R}\) by \(M(x) = q(x) - \frac{1}{\varepsilon}\xi\). Then \(M\) satisfies \((\varepsilon,0)\)-differential privacy for removal (Definition 2).
The proof is symmetric. A few remarks are in order:
- If \(M\) satisfies both addition-only and removal-only differential privacy, then \(M\) satisfies regular two-sided differential privacy. In particular, if we define \(M(x) = q(x) + \frac{1}{\varepsilon}\xi - \frac{1}{\varepsilon}\xi'\) where \(\xi\) and \(\xi'\) are independent standard exponential random variables, then \(M\) satisfies regular two-sided differential privacy. And – surprise, surprise – this is identical to the regular Laplace mechanism; \(\xi-\xi'\) follows a standard Laplace distribution. Thus decomposing the differential privacy guarantee into addition- and removal-only parts corresponds exactly to decomposing the Laplace noise into positive and negative parts!
- Exponential noise has half the variance of Laplace noise. Thus relaxing to one-sided differential privacy saves a factor of two in variance, relative to two-sided differential privacy. That's a significant win.
- The exponential mechani—er, the mechanism in Propositions 3 and 4 is biased. This can be fixed: \(M(x) = q(x) \pm \frac{1}{\varepsilon}(\xi-1)\) is unbiased and satisfies the same privacy guarantee.
- We can extend Propositions 3 and 4 to histograms, i.e., a collection of counts where each person contributes to only one count. Histograms demonstrate that positive and negative noise are fundamentally different. Suppose the true count is \(q(x)=0\); then negative noise results in a negative number, but since counts can't be negative, we'll just round this back to 0. So negative noise will not change zero counts. On the other hand, positive noise will take zero counts and increase them. Histograms are often sparse (i.e., most counts are zero), so this is a significant difference.
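Both the noise decomposition and the factor-of-two variance claim from the remarks above are easy to check empirically. A quick Monte Carlo sketch in Python (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
eps, n = 1.0, 200_000

xi = rng.exponential(scale=1.0 / eps, size=n)   # one-sided noise
xi2 = rng.exponential(scale=1.0 / eps, size=n)

# The difference of two independent exponentials is Laplace noise of
# scale 1/eps, with variance ~ 2/eps^2, while the one-sided exponential
# noise alone has variance ~ 1/eps^2 -- half as much.
print(np.var(xi), np.var(xi - xi2))
```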
Subsampling
We’ve posted about privacy amplification by subsampling before. Let’s examine it under one-sided differential privacy.
For addition-only differential privacy, nothing much changes compared to regular two-sided differential privacy: if we start with \((\varepsilon,\delta)\)-differential privacy for addition and combine it with Poisson subsampling with probability \(p\), then we get \((\varepsilon',p\delta)\)-differential privacy for addition with \(\varepsilon' = \log(1+p(\exp(\varepsilon)-1))\). And the proof is the same as for two-sided differential privacy.
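In code form, the amplified parameter is a one-liner (a trivial Python helper; the name is ours):

```python
import math

def amplified_eps(eps, p):
    """Privacy amplification by Poisson subsampling with rate p:
    eps -> log(1 + p * (exp(eps) - 1))."""
    return math.log1p(p * math.expm1(eps))
```

Note that \(p=1\) recovers the original \(\varepsilon\), \(p=0\) gives \(\varepsilon'=0\), and for small \(\varepsilon\) we have \(\varepsilon' \approx p\varepsilon\).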
However, for removal-only differential privacy, strange things happen:
Theorem 5 (Subsampling is Removal-only Differentially Private). For \(p \in [0,1)\), define \(S_p : \mathcal{X}^* \to \mathcal{X}^*\) to be the Poisson random subsampling procedure that includes each element independently with probability \(p\) – i.e., \(\forall x \in \mathcal{X}^* ~ S_p(x) \subseteq x\) and, independently for all \(x_i \in x\), we have \(\mathbb{P}[x_i \in S_p(x)]=p\). Then \(S_p\) is \((\varepsilon,0)\)-differentially private for removal (Definition 2) for \(\varepsilon=-\log(1-p)\).
Proof.
Let \(x' \subseteq x\) be inputs differing by the removal of one element – i.e., \(x' = x \setminus \{x_i\}\).
And let \(T \subseteq \mathcal{X}^*\) be measurable.
We need to show that \[\mathbb{P}[S_p(x') \in T] \le e^\varepsilon \mathbb{P}[S_p(x) \in T].\]
By independence, the distribution of \(S_p(x)\) conditioned on \(x_i \notin S_p(x)\) is equivalent to the distribution of \(S_p(x \setminus \{x_i\})\), which is simply \(S_p(x')\).
Similarly, the distribution of \(S_p(x)\) conditioned on \(x_i \in S_p(x)\) is equivalent to the distribution of \(S_p(x') \cup \{x_i\}\).
Now we can decompose:
\[\begin{aligned} \mathbb{P}[S_p(x) \in T] &= \mathbb{P}[S_p(x) \in T \mid x_i \in S_p(x)]\, \mathbb{P}[x_i \in S_p(x)] + \mathbb{P}[S_p(x) \in T \mid x_i \notin S_p(x)]\, \mathbb{P}[x_i \notin S_p(x)] \\ &= \mathbb{P}[S_p(x') \cup \{x_i\} \in T]\, p + \mathbb{P}[S_p(x') \in T] (1-p) \\ &\ge \mathbb{P}[S_p(x') \in T]\, e^{-\varepsilon}, \end{aligned}\]
which rearranges to give the desired result.
∎
Note that \(\varepsilon=-\log(1-p)\) is equivalent to \(p = 1 - e^{-\varepsilon}\). And for small values we have \(\varepsilon \approx p\).
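Poisson subsampling itself is just an independent coin flip per record. A Python sketch of \(S_p\) (function name is ours):

```python
import math
import numpy as np

def poisson_subsample(dataset, p, rng=None):
    """Keep each record independently with probability p. By Theorem 5,
    releasing this subsample is (-log(1-p), 0)-DP for removal."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(len(dataset)) < p
    return [record for record, kept in zip(dataset, keep) if kept]

p = 0.3
eps_removal = -math.log(1 - p)  # roughly 0.357
```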
OK, what do we make of this? Combined with a differentially private algorithm, subsampling does amplify privacy, but Theorem 5 says that subsampling on its own provides removal-only differential privacy. Yet this algorithm outputs a \(p\)-fraction of the dataset in the clear; intuitively, that should not be private. I interpret this result as telling us that something is fishy about removal-only differential privacy, which connects back to the interpretation discussion earlier.
From Addition-only DP to Two-sided DP
Regular two-sided differential privacy is the conjunction of addition- and removal-only differential privacy. And we saw with the one-sided Laplace noise that we can combine two one-sided differentially private algorithms to get a two-sided differentially private algorithm. Now we’re going to turn this into a generic recipe, where Theorem 5 provides the generic removal-only half of the recipe.
Theorem 6 (Addition-only DP + Subsampling = Two-sided DP). [GKKMS24, Lemma 3.1]
Let \(M : \mathcal{X}^* \to \mathcal{Y}\) satisfy \((\varepsilon,\delta)\)-differential privacy for addition (Definition 1). For \(p \in [0,1)\), define \(S_p : \mathcal{X}^* \to \mathcal{X}^*\) to be the Poisson random subsampling procedure that includes each element independently with probability \(p\).
Define \(\widetilde{M} : \mathcal{X}^* \to \mathcal{Y}\) by \(\widetilde{M}(x) = M(S_p(x))\). Then \(\widetilde{M}\) satisfies \((\widetilde{\varepsilon},p\delta)\)-differential privacy (for addition and removal) for \[\widetilde{\varepsilon} = \max\{-\log(1-p), \log(1+p(\exp(\varepsilon)-1))\}.\]
Proof Sketch. Let \(x' \subseteq x\) be inputs differing by the addition or removal of one element – i.e., \(|x \setminus x'|=1\). And let \(T \subseteq \mathcal{Y}\) be measurable. We have to prove two directions.
First, the removal direction: \[\mathbb{P}[\widetilde{M}(x') \in T] \le \frac{1}{1-p} \mathbb{P}[\widetilde{M}(x) \in T] \le e^{\widetilde{\varepsilon}} \mathbb{P}[\widetilde{M}(x) \in T]\] follows from Theorem 5 and the fact that \(\widetilde{M}\) is a postprocessing of \(S_p\).
Second, the addition direction \[\mathbb{P}[\widetilde{M}(x) \in T] \le (1-p + pe^\varepsilon) \mathbb{P}[\widetilde{M}(x') \in T] + p\delta\] follows from the addition-only differential privacy of the base algorithm \(M\) and privacy amplification by subsampling. ∎
This is an interesting result – it provides a new way to construct differentially private algorithms. In particular, combining Theorem 6 with Proposition 3 shows that subsampling and adding exponential noise satisfies differential privacy. To be precise, if \(q : \mathcal{X}^* \to \mathbb{R}\) is monotone and has sensitivity 1, then, for \(p=1-e^{-\varepsilon}\), \(\lambda = \frac{1}{\log\left(1 + e^\varepsilon\right)},\) and \(\xi\) a standard exponential random variable, \[\widetilde{M}(x) = q(S_p(x)) + \lambda \xi \tag{3}\] defines an \((\varepsilon,0)\)-differentially private algorithm (for addition and removal).1
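Here is Equation 3 specialized to the counting query \(q(x)=|x|\), as a Python sketch (the function name is ours; note the raw output concentrates around \(p \cdot q(x)\), and rescaling by \(1/p\) afterwards is postprocessing, so it preserves the guarantee):

```python
import math
import numpy as np

def subsample_then_exponential_count(n_records, eps, rng=None):
    """Equation 3 for q(x) = |x|: Poisson-subsample with p = 1 - e^{-eps},
    count the survivors (a Binomial(n, p) draw), and add exponential
    noise of scale 1 / log(1 + e^{eps}). Two-sided (eps, 0)-DP by
    Theorem 6 combined with Proposition 3."""
    rng = np.random.default_rng() if rng is None else rng
    p = 1.0 - math.exp(-eps)
    lam = 1.0 / math.log(1.0 + math.exp(eps))
    subsampled_count = rng.binomial(n_records, p)  # q(S_p(x)) for a count
    return subsampled_count + rng.exponential(scale=lam)
```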
We can also view this as a negative result. Propositions 3 and 4 showed that we can reduce the variance of count queries by a factor of two by relaxing from standard two-sided differential privacy to one-sided differential privacy. How much more could we gain? Theorem 6 shows that we cannot improve accuracy much further via the relaxed one-sided privacy definition, since any algorithm satisfying the relaxed addition-only definition can be converted into one that satisfies the two-sided definition.
Conclusion
In this post we explored asymmetric one-sided differential privacy. We proved a few results, but the real takeaways are the insights we gained along the way! By separating the two sides of the standard two-sided differential privacy guarantee, we gain a deeper understanding of what is going on in the definition.
We saw both at the intuitive and formal level that addition-only differential privacy (Definition 1) is a more meaningful guarantee than removal-only differential privacy (Definition 2). Indeed, the removal side can be tacked on relatively easily via subsampling (Theorem 6).
Should we be using one-sided differential privacy guarantees in practice? We definitely shouldn't rely on removal-only differential privacy. But addition-only differential privacy seems meaningful, so perhaps it's worthwhile if it lets us improve constants. However, in my humble opinion, the added complexity of one-sided differential privacy is probably not worthwhile. It's more of a curiosity, or perhaps an intermediate analytical tool.2
Finally, we note that there are a couple of papers that have explored one-sided differential privacy:
- Ghazi, Kamath, Kumar, Manurangsi, and Sealfon [GKKMS24] used addition-only differential privacy as an intermediate analytical tool; in particular, they proved Theorem 6 (and used it).
- Kotsogiannis, Doudalis, Haney, Machanavajjhala, and Mehrotra [KDHMM20] define one-sided differential privacy differently than we do. Roughly, they define one-sided differential privacy in terms of replacement (rather than addition/removal); to make this asymmetric they classify records as either sensitive or non-sensitive and they allow replacement of a sensitive record with a non-sensitive record. They analyzed one-sided Laplace noise in this setting (cf. Proposition 3).
- Takagi, Kato, Cao, and Yoshikawa [TCY22] define asymmetric differential privacy in terms of replacement similarly to Kotsogiannis et al. and also give an analysis of one-sided Laplace noise.
- Chen, Hu, Zhuang, Zhao, and Yu [CHZZY25] define one-sided personalized differential privacy using the same replacement formalism as Kotsogiannis et al., but with an added twist where each person has their own privacy parameter \(\varepsilon\).
1. There exist settings where the algorithm in Equation 3 achieves lower variance than the Laplace mechanism; working out this parameter regime is left as an exercise for the reader. ↩
2. Some speculative examples where addition-only differential privacy might be useful: there are situations – e.g., the shuffle model – where it is easier to add noise than to subtract it, and in such cases addition-only differential privacy may help in designing and analyzing algorithms. Another example is using differential privacy to prove generalization guarantees. ↩