Our data is subject to many different uses. Many entities will have access to our data and those entities will perform many different analyses that involve our data. The greatest risk to privacy is that an attacker will combine multiple pieces of information from the same or different sources and that the combination of these will reveal sensitive details about us.
Thus we cannot study privacy leakage in a vacuum; it is important that we can reason about the accumulated privacy leakage over multiple independent analyses, which is known as composition. We have previously discussed why composition is so important for differential privacy.
This is the first in a series of posts on composition in which we will explain in more detail how compositoin analyses work.
Composition is quantitative. The differential privacy guarantee of the overall system will depend on the number of analyses and the privacy parameters that they each satisfy. The exact relationship between these quantities can be complex. There are various composition theorems that give bounds on the overall parameters in terms of the parameters of the parts of the system.
The simplest composition theorem is what is known as basic composition, which applies to pure \(\varepsilon\)-DP (although it can be extended to approximate \((\varepsilon,\delta)\)-DP):
Theorem (Basic Composition)
Let \(M_1, M_2, \cdots, M_k : \mathcal{X}^n \to \mathcal{Y}\) be randomized algorithms. Suppose \(M_j\) is \(\varepsilon_j\)-DP for each \(j \in [k]\).
Define \(M : \mathcal{X}^n \to \mathcal{Y}^k\) by \(M(x)=(M_1(x),M_2(x),\cdots,M_k(x))\), where each algorithm is run independently. Then \(M\) is \(\varepsilon\)-DP for \(\varepsilon = \sum_{j=1}^k \varepsilon_j\).