Testing the Lipschitz Property over Product Distributions with Applications to Data Privacy

In the past few years, research in statistical data privacy has focused on designing algorithms, for a variety of problems, that satisfy rigorous notions of privacy. Much less effort has gone into techniques for computationally verifying whether a given algorithm satisfies a predefined notion of privacy. In this work, we address the following question: can we design algorithms that test whether a given algorithm satisfies a specific rigorous notion of privacy (e.g., differential privacy)?

We design algorithms to test the privacy guarantees of a given algorithm \(\mathcal{A}\) when it is run on a dataset \(x\) containing potentially sensitive information about individuals. More formally, we design a computationally efficient algorithm \({\cal T}_{priv}\) that verifies, in time sublinear in the size of the domain of the datasets, whether \(\mathcal{A}\) satisfies the differential privacy on typical datasets (DPTD) guarantee. DPTD, a notion similar to the generalized differential privacy first proposed by [3], is a distributional relaxation of the popular notion of differential privacy [14].

To design \({\cal T}_{priv}\), we show a formal connection between testing the privacy guarantee of an algorithm and testing the Lipschitz property of a related function. More specifically, we show that an efficient algorithm for testing the Lipschitz property can be used as a subroutine in \({\cal T}_{priv}\) to test whether an algorithm satisfies differential privacy on typical datasets.
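As a rough illustration of why such a connection is plausible, the standard definition of \(\epsilon\)-differential privacy can be restated as a Lipschitz condition. The sketch below assumes a discrete output range and uses the symbols \(\epsilon\), \(o\), and \(f_o\), which are introduced here for illustration only; the reduction used for DPTD in this work may differ in its details.

```latex
% Standard epsilon-differential privacy: for all datasets x, x' that differ
% in one entry, and for every output o of the algorithm A,
\[
  \Pr[\mathcal{A}(x) = o] \;\le\; e^{\epsilon} \, \Pr[\mathcal{A}(x') = o].
\]
% Define, for each fixed output o, the function
\[
  f_o(x) \;=\; \ln \Pr[\mathcal{A}(x) = o].
\]
% Applying the inequality in both directions (swap x and x') gives
\[
  \lvert f_o(x) - f_o(x') \rvert \;\le\; \epsilon
  \quad \text{for all neighboring } x, x',
\]
% i.e. f_o is epsilon-Lipschitz with respect to Hamming distance on datasets.
```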

Apart from formalizing the connection between testing privacy guarantees and testing the Lipschitz property, we generalize the work of [21] to the setting of property testing under product distributions. More precisely, we design an efficient Lipschitz tester for the case where the domain points are drawn from the hypercube according to some fixed but unknown product distribution, rather than the uniform distribution.
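To make the testing model concrete, here is a minimal Python sketch of a naive sample-and-check edge test on the hypercube \(\{0,1\}^d\). It is not the tester designed in this work and carries no soundness guarantee; it only illustrates the setting, in which the tester sees sample points from a product distribution it does not know and queries the function at those points and their neighbors. All names (`make_product_sampler`, `flip`, `naive_lipschitz_edge_test`) and the Lipschitz constant `c` are assumptions made for illustration.

```python
import random
from typing import Callable, Sequence, Tuple

Point = Tuple[int, ...]


def make_product_sampler(biases: Sequence[float], rng: random.Random) -> Callable[[], Point]:
    """Return a sampler for a product distribution on {0,1}^d.

    Coordinate i is 1 with probability biases[i], independently of the others.
    The tester below only calls the sampler and never sees the biases.
    """
    def sample() -> Point:
        return tuple(1 if rng.random() < p else 0 for p in biases)
    return sample


def flip(x: Point, i: int) -> Point:
    """Return the hypercube neighbor of x obtained by flipping coordinate i."""
    return x[:i] + (1 - x[i],) + x[i + 1:]


def naive_lipschitz_edge_test(
    f: Callable[[Point], float],
    sample: Callable[[], Point],
    d: int,
    num_samples: int,
    rng: random.Random,
    c: float = 1.0,
) -> bool:
    """Accept (True) unless a sampled edge violates |f(x) - f(y)| <= c.

    Illustrative only: draws x from the unknown distribution, picks a
    uniformly random coordinate, and checks the edge between x and its
    neighbor. A c-Lipschitz function is always accepted.
    """
    for _ in range(num_samples):
        x = sample()
        y = flip(x, rng.randrange(d))
        if abs(f(x) - f(y)) > c:
            return False  # found a violated edge
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    sampler = make_product_sampler([0.2, 0.5, 0.9], rng)
    # Hamming weight changes by exactly 1 along every edge, so it is 1-Lipschitz.
    print(naive_lipschitz_edge_test(lambda x: sum(x), sampler, 3, 50, rng))
    # Three times the Hamming weight changes by 3 along every edge.
    print(naive_lipschitz_edge_test(lambda x: 3 * sum(x), sampler, 3, 50, rng))
```

Passing the sampler as a callable, rather than passing the biases, mirrors the assumption that the product distribution is fixed but unknown to the tester.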