Covariance-Aware Private Mean Estimation, Efficiently
Last week, the Mark Fulk award for best student paper at COLT 2023 went to the following two papers on private mean estimation:
- A Fast Algorithm for Adaptive Private Mean Estimation, by John Duchi, Saminul Haque, and Rohith Kuditipudi [DHK23];
- Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance Estimation for Subgaussian Distributions, by Gavin Brown, Samuel B. Hopkins, and Adam Smith [BHS23].
The main result of both papers is the same: the first computationally-efficient \(O(d)\)-sample algorithm for differentially-private Gaussian mean estimation in Mahalanobis distance. In this post, we’re going to unpack the result and explain what this means.
Gaussian mean estimation is a classic statistical task: given \(X_1, \dots, X_n \in \mathbb{R}^d\) sampled i.i.d. from a \(d\)-dimensional Gaussian \(N(\mu, \Sigma)\), output a vector \(\hat \mu \in \mathbb{R}^d\) that approximates the true mean \(\mu \in \mathbb{R}^d\). But what do we mean by approximates? What distance measure should we use? A reasonable first guess is the \(\ell_2\)-norm: output an estimate \(\hat \mu\) that minimizes \(\|\hat \mu - \mu\|_2\).
However, we would ideally measure the quality of an estimate in an affine-invariant manner: if the problem instance (i.e., the estimate, the dataset, and the underlying distribution) is shifted and rescaled, then the error should remain unchanged. Affine invariance allows us to perform such transformations of our data and not artificially make the problem easier or harder. This property clearly isn’t satisfied by the \(\ell_2\)-norm: simply scaling the problem down would allow us to report an estimate with arbitrarily low error. In other words, the distance metric needs to be calibrated to the covariance \(\Sigma \in \mathbb{R}^{d \times d}\).
Instead, we consider error measured according to the Mahalanobis distance: output an estimate \(\hat \mu\) that minimizes \(\|\Sigma^{-1/2}(\hat \mu - \mu)\|_2\), where \(\Sigma\) is the (unknown) covariance of the underlying distribution. Note that, if the covariance matrix \(\Sigma = I\), then this reduces to the \(\ell_2\)-distance. Indeed, a valid interpretation of the Mahalanobis distance is to imagine rescaling the problem so that the covariance \(\Sigma\) is mapped to the identity matrix, and measuring \(\ell_2\)-distance after this transformation. A common way to think about Mahalanobis distance operationally is that it necessitates a more accurate estimate in directions with small variance, while permitting more error in directions with large variance.
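To make the affine-invariance point concrete, here is a minimal numpy sketch (the variable names, constants, and the particular rescaling are ours, purely for illustration): applying an invertible linear map to the whole problem instance changes the \(\ell_2\) error, but leaves the Mahalanobis error untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Illustrative true mean, estimate, and positive-definite covariance.
mu = rng.normal(size=d)
mu_hat = mu + 0.1 * rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)

def mahalanobis_error(v, Sigma):
    # ||Sigma^{-1/2} v||_2, computed via a linear solve instead of an explicit inverse.
    return np.sqrt(v @ np.linalg.solve(Sigma, v))

# Apply an invertible map T to the whole problem instance.
T = 3.0 * np.eye(d)  # any invertible matrix works
v = mu_hat - mu
print(np.linalg.norm(v), np.linalg.norm(T @ v))          # l2 error changes under rescaling
print(mahalanobis_error(v, Sigma),
      mahalanobis_error(T @ v, T @ Sigma @ T.T))          # Mahalanobis error does not
```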
OK, so how do we learn the mean of a Gaussian in Mahalanobis distance? In the non-private setting, the answer is simple: just take the empirical mean \(\hat \mu = \frac{1}{n} \sum_{i=1}^n X_i\)! It turns out that with \(O(d)\) samples, the empirical mean provides an accurate estimate (in Mahalanobis distance) of the true mean \(\mu\). Note that these guarantees hold regardless of the true covariance matrix \(\Sigma\).
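As a quick sanity check (a simulation sketch with arbitrary constants, not a formal statement), one can verify this numerically: with \(n\) a constant multiple of \(d\), the Mahalanobis error of the empirical mean concentrates around \(\sqrt{d/n}\), regardless of the covariance \(\Sigma\).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50
n = 20 * d  # n = O(d) samples; the constant 20 is arbitrary

# An arbitrary mean and a random positive-definite covariance.
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)

X = rng.multivariate_normal(mu, Sigma, size=n)
mu_hat = X.mean(axis=0)  # the empirical mean

# Mahalanobis error ||Sigma^{-1/2}(mu_hat - mu)||_2 is roughly sqrt(d / n).
v = mu_hat - mu
print(np.sqrt(v @ np.linalg.solve(Sigma, v)), np.sqrt(d / n))
```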
It isn’t quite so easy when we want to do things privately. The most natural approach would be to add noise to the empirical mean. However, we first have to “clip” the datapoints (i.e., rescale any points that are “too large”) in order to limit the sensitivity of this statistic. This is where the challenges arise: we would ideally like to clip the data based on the shape of the (unknown) covariance matrix \(\Sigma\) [KLSU19]. Deviating significantly from \(\Sigma\) would either introduce bias due to clipping too many points, or add excessive amounts of noise. Unfortunately, the covariance matrix \(\Sigma\) is unknown, and privately estimating it (in an appropriate metric) requires \(\Omega(d^{3/2})\) samples [KMS22]. This is substantially larger than the \(O(d)\) sample complexity of non-private Gaussian mean estimation. Furthermore, this covariance estimation step really is the bottleneck. Given a coarse estimate of \(\Sigma\), only \(O(d)\) additional samples are required to estimate the mean privately in Mahalanobis distance. This leads to the intriguing question: is it possible to privately estimate the mean of a Gaussian without explicitly estimating the covariance matrix?
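To see where the naive approach struggles, here is a schematic clip-and-noise estimator (the function name, the spherical clipping, and the noise_scale parameter are ours for illustration; the noise is not calibrated to any actual privacy budget).

```python
import numpy as np

def clipped_noisy_mean(X, clip_radius, noise_scale, rng):
    """Schematic clip-and-noise mean (illustrative only, not calibrated for privacy).

    Each point is rescaled into an l2 ball of radius clip_radius, which bounds
    how much the average can change when a single point is replaced by
    2 * clip_radius / n.  Spherical Gaussian noise proportional to that
    sensitivity is then added; noise_scale stands in for the usual
    (epsilon, delta)-dependent factor.
    """
    n, d = X.shape
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_clipped = X * np.minimum(1.0, clip_radius / np.maximum(norms, 1e-12))
    sensitivity = 2.0 * clip_radius / n
    return X_clipped.mean(axis=0) + noise_scale * sensitivity * rng.normal(size=d)
```

If \(\Sigma\) is far from spherical, no single choice of clip_radius works well: a radius large enough to avoid clipping most points forces spherical noise on the scale of the largest standard deviation, which swamps the low-variance directions (and hence the Mahalanobis error), while a smaller radius clips many points and biases the estimate. Clipping to an ellipsoid shaped like \(\Sigma\) would fix this, but \(\Sigma\) is exactly what we don't know.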
The answer is yes! A couple of years back, Brown, Gaboardi, Smith, Ullman, and Zakynthinou [BGSUZ21] gave two different algorithms for private Gaussian mean estimation in Mahalanobis distance, both requiring only \(O(d)\) samples. Interestingly, the two algorithms are quite different from each other. One simply adds noise to the empirical mean based on the empirical covariance matrix. The other turns to a technique from robust statistics, sampling a point with large Tukey depth using the exponential mechanism. As described here, neither of these methods is differentially private on its own: both additionally require a pre-processing step that checks whether the dataset is sufficiently well-behaved, a condition that holds with high probability when the data is generated according to a Gaussian distribution. The major drawback of both algorithms is that they require exponential time to compute.
The two awarded papers [DHK23] and [BHS23] resolve this issue, giving the first computationally-efficient \(O(d)\)-sample algorithms for private mean estimation in Mahalanobis distance. Interestingly, the algorithms in both papers follow the same recipe as the first algorithm mentioned above: add noise to the empirical mean based on the empirical covariance matrix. The catch is that the empirical mean and covariance are replaced with stable estimates of these quantities, where stability bounds how much the estimators can change when individual datapoints are modified. Importantly, these stable estimators are efficient to compute. Further details of these subroutines are beyond the scope of this post, but the final algorithm simply adds noise to the stably-estimated mean based on the stably-estimated covariance. The two papers also explore different extensions of these results, including covariance estimation, and mean estimation in settings where the distribution may be heavy-tailed or rank-deficient.
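Schematically, the final noise-addition step looks like the sketch below (the function name and noise_scale are placeholders of ours; in the actual algorithms, mu_hat and Sigma_hat are the stable estimates, and the noise magnitude is calibrated from their stability bounds and the privacy parameters).

```python
import numpy as np

def covariance_shaped_noisy_mean(mu_hat, Sigma_hat, noise_scale, rng):
    """Add Gaussian noise shaped like Sigma_hat to a mean estimate.

    Returns mu_hat + z with z ~ N(0, noise_scale^2 * Sigma_hat), so the noise
    is large in high-variance directions and small in low-variance ones,
    which is what Mahalanobis-distance accuracy asks for.
    """
    d = mu_hat.shape[0]
    L = np.linalg.cholesky(Sigma_hat)  # Sigma_hat = L @ L.T
    return mu_hat + noise_scale * (L @ rng.normal(size=d))
```

Roughly speaking, because the added noise has covariance proportional to Sigma_hat (which tracks \(\Sigma\)), its Mahalanobis norm does not depend on the shape of \(\Sigma\), giving the affine-invariant guarantee we wanted.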
Most of the algorithms described above are based on some notion of robustness, suggesting connections to the mature literature on robust statistics. These connections have been explored as far back as 2009, in foundational work by Dwork and Lei [DL09]. Over the last couple of years, there has been a flurry of renewed interest in links between robustness and privacy, including, e.g., [BKSW19, KSU20, KMV22, LKO22, HKM22, GH22, HKMN23, AKTVZ23, AUZ23], beyond those mentioned above. For example, some works [GH22, HKMN23, AUZ23] show that, under certain conditions, a robust estimator implies a private one, and vice versa. The two awarded papers expand this literature in a somewhat different direction: the type of stability property they consider leads to algorithms which qualitatively differ from those considered previously. It will be interesting to see how private and robust estimation evolve together over the next several years.
Congratulations once more to the authors of both awarded papers on their excellent results!