Abstract
Automated decision systems are increasingly used to make consequential decisions in people’s lives. Due to the sensitivity of the data involved and of the resulting decisions, several ethical concerns need to be addressed for the appropriate use of such technologies, particularly fairness and privacy. Unlike previous work, which focused on centralized differential privacy (DP) or on local DP (LDP) for a single sensitive attribute, in this paper we examine the impact on fairness of LDP in the presence of several sensitive attributes (i.e., multi-dimensional data). A detailed empirical analysis on synthetic and benchmark datasets reveals several notable observations. In particular, (1) multi-dimensional LDP is an efficient approach to reduce disparity; (2) the choice of multi-dimensional LDP variant (we employ two variants) matters only at low privacy guarantees (high \(\epsilon\)); and (3) the true decision distribution has an important effect on which group is more sensitive to the obfuscation. Finally, we summarize our findings in the form of recommendations to guide practitioners in adopting effective privacy-preserving practices while maintaining fairness and utility in machine learning applications.
Notes
In this paper, we use the term protected to designate sensitive attributes from a fairness perspective and the term sensitive to designate sensitive attributes from a privacy perspective.
In the rest of the paper we use attribute and variable interchangeably.
True positive rate = \(\frac{TP}{TP+FN}\)
False positive rate = \(\frac{FP}{FP+TN}\)
Positive predictive values = \(\frac{TP}{TP+FP}\)
C and A follow Binomial distributions, while M follows a Multinomial distribution.
The 50K threshold is the one used in the well-known Adult dataset, widely used in the literature (Dua and Graff 2017).
As this observation concerns accuracy, only the last two fairness metrics are affected, namely OAD and PRD, corresponding to the two lower rows of Fig. 5.
Note that this observation is also confirmed in the Compas dataset (Fig. 8), but inverted, since the privileged group in this dataset is the group \(A=0\). We generated a second synthetic dataset where the group \(A=0\) is privileged to confirm the inverted behavior. The plots can be found in Appendix A.1.
Again, the behavior is reversed for the Compas dataset (Fig. 8) for the same reason as the previous observation.
A back-door path is a path between A and Y with an edge into A.
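For concreteness, the three rates defined in the notes above can be computed directly from confusion-matrix counts. The following minimal Python sketch (the function name and the example counts are illustrative, not taken from the paper) shows the correspondence:

```python
def classification_rates(tp, fp, tn, fn):
    """Compute the rates defined in the notes from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # true positive rate: TP / (TP + FN)
    fpr = fp / (fp + tn)  # false positive rate: FP / (FP + TN)
    ppv = tp / (tp + fp)  # positive predictive value: TP / (TP + FP)
    return tpr, fpr, ppv

# Illustrative counts: 40 TP, 10 FP, 80 TN, 20 FN
tpr, fpr, ppv = classification_rates(40, 10, 80, 20)
```

Fairness notions such as equal opportunity and predictive parity compare these rates across the groups defined by the protected attribute.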
References
Alves G, Bernier F, Couceiro M, Makhlouf K, Palamidessi C, Zhioua S (2022) Survey on fairness notions and related tensions. arXiv preprint arXiv:2209.13012
Angwin J, Larson J, Mattu S, Kirchner L (2016) Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Arcolezi HH, Couchot JF, Al Bouna B, Xiao X (2021) Random sampling plus fake data: multidimensional frequency estimates with local differential privacy. In: Proceedings of the 30th ACM international conference on information & knowledge management, CIKM ’21, New York, NY, USA . Association for Computing Machinery, pp 47–57
Arcolezi HH, Couchot JF, Al Bouna B, Xiao X (2022) Improving the utility of locally differentially private protocols for longitudinal and multidimensional frequency estimates. Digit Commun Netw
Arcolezi HH, Couchot JF, Gambs S, Palamidessi C, Zolfaghari M (2022) Multi-Freq-LDPy: multiple frequency estimation under local differential privacy in python. In: Atluri V, Di Pietro R, Jensen CD, Meng W (eds) Computer Security—ESORICS 2022. Springer Nature Switzerland, Cham, pp 770–775
Arcolezi HH, Makhlouf K, Palamidessi C (2023) (Local) differential privacy has NO disparate impact on fairness. In: Data and applications security and privacy XXXVII. Springer Nature Switzerland, pp 3–21
Bagdasaryan E, Poursaeed O, Shmatikov V (2019) Differential privacy has disparate impact on model accuracy. Adv Neural Inf Process Syst 32
Barocas S, Hardt M, Narayanan A (2019) Fairness and machine learning. http://www.fairmlbook.org
Berk R, Heidari H, Jabbari S, Kearns M, Roth A (2021) Fairness in criminal justice risk assessments: the state of the art. Sociol. Methods Res. 50(1):3–44
Breiman L (2001) Random forests. Mach Learn 45:5–32
Chang H, Shokri R (2021) On the privacy risks of algorithmic fairness. In: 2021 IEEE European symposium on security and privacy (EuroS&P). IEEE, pp 292–303
Chen C, Liang Y, Xu X, Xie S, Kundu A, Payani A, Hong Y, Shu K (2022) When fairness meets privacy: Fair classification with semi-private sensitive attributes. In: Workshop on trustworthy and socially responsible machine learning, NeurIPS 2022
Chouldechova A (2017) Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big data 5(2):153–163
Corbett-Davies S, Pierson E, Feller A, Goel S, Huq A (2017) Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 797–806
da Costa Filho JS, Machado JC (2023) FELIP: a local differentially private approach to frequency estimation on multidimensional datasets. In: Proceedings of the 26th international conference on extending database technology, EDBT 2023, Ioannina, Greece, March 28–31, 2023, pp 671–683. OpenProceedings.org
de Oliveira AS, Kaplan C, Mallat K, Chakraborty T (2023) An empirical analysis of fairness notions under differential privacy. arXiv preprint arXiv:2302.02910
Apple Differential Privacy Team (2017) Learning with privacy at scale
Ding F, Hardt M, Miller J, Schmidt L (2021) Retiring adult: new datasets for fair machine learning. Adv Neural Inf Process Syst 34:6478–6490
Domingo-Ferrer J, Soria-Comas J (2022) Multi-dimensional randomized response. IEEE Trans Knowl Data Eng 34(10):4933–4946
Dua D, Graff C (2017) UCI machine learning repository
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Reingold O (ed) Theory of cryptography. Springer, Berlin, pp 265–284
Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2012) Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference, pp 214–226
Dwork C, Roth A (2014) The algorithmic foundations of differential privacy. Found Trends® Theor Comput Sci 9(3–4):211–407
Erlingsson Ú, Pihur V, Korolova A (2014) Rappor: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp 1054–1067
Farrand T, Mireshghallah F, Singh S, Trask A (2020) Neither private nor fair: impact of data imbalance on utility and fairness in differential privacy. In: Proceedings of the 2020 workshop on privacy-preserving machine learning in practice, pp 15–19
Ficiu B, Lawrence ND, Paleyes A (2023) Automated discovery of trade-off between utility, privacy and fairness in machine learning models. arXiv preprint arXiv:2311.15691
Fioretto F, Tran C, Van Hentenryck P, Zhu K (2022) Differential privacy and fairness in decisions and learning tasks: a survey. arXiv preprint arXiv:2202.08187
Ganev G, Oprisanu B, De Cristofaro E (2022) Robin hood and Matthew effects: differential privacy has disparate impact on synthetic data. In: International conference on machine learning. PMLR, pp 6944–6959
Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. Adv Neural Inf Process Syst 29:3315–3323
Impact of LDP on fairness repository (2023). https://github.com/KarimaMakhlouf/Impact_of_LDP_on_Fairness
Jagielski M, Kearns M, Mao J, Oprea A, Roth A, Sharifi-Malvajerdi S, Ullman J (2019) Differentially private fair learning. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, volume 97 of proceedings of machine learning research. PMLR, 09–15 Jun, pp 3000–3008
Kairouz P, Bonawitz K, Ramage D (2016) Discrete distribution estimation under local privacy. In: International conference on machine learning. PMLR, pp 2436–2444
Kasiviswanathan SP, Lee HK, Nissim K (2011) What can we learn privately? SIAM J. Comput. 40(3):793–826
Kikuchi H (2022) Castell: scalable joint probability estimation of multi-dimensional data randomized with local differential privacy. arXiv preprint arXiv:2212.01627
Liu G, Tang P, Hu C, Jin C, Guo S (2023) Multi-dimensional data publishing with local differential privacy. In: Proceedings of the 26th international conference on extending database technology, EDBT 2023, Ioannina, Greece, March 28–31, 2023, pp 183–194. OpenProceedings.org
Makhlouf K, Zhioua S, Palamidessi C (2021) Machine learning fairness notions: bridging the gap with real-world applications. Inf. Process. Manag. 58(5):102642
Makhlouf K, Zhioua S, Palamidessi C (2021) On the applicability of machine learning fairness notions. 23(1):14–23
Makhlouf K, Zhioua S, Palamidessi C (2022) Identifiability of causal-based ml fairness notions. In: 2022 14th international conference on computational intelligence and communication networks (CICN), pp 1–8
Mangold P, Perrot M, Bellet A, Tommasi M (2023) Differential privacy has bounded impact on fairness in classification. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (eds) Proceedings of the 40th international conference on machine learning, volume 202 of proceedings of machine learning research. PMLR, 23–29, pp 23681–23705
Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A (2021) A survey on bias and fairness in machine learning. ACM Comput Surv 54(6):1–35
Mitchell S, Potash E, Barocas S, D’Amour A, Lum K (2021) Algorithmic fairness: choices, assumptions, and definitions. Ann Rev Stat Appl 8:141–163
Mozannar H, Ohannessian M, Srebro N (2020) Fair learning with private demographic data. In: International conference on machine learning. PMLR, pp 7066–7075
Pearl J (2009) Causality. Cambridge University Press, Cambridge
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J (2011) Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12:2825–2830
Ren X, Yu CM, Yu W, Yang S, Yang X, McCann JA, Philip SY (2018) LoPub: high-dimensional crowdsourced data publication with local differential privacy. IEEE Trans. Inf. Forens. Secur. 13(9):2151–2166
Tran C, Fioretto F, Van Hentenryck P (2021) Differentially private and fair deep learning: a Lagrangian dual approach. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, no 11, pp 9932–9939
Verma S, Rubin J (2018) Fairness definitions explained. In: 2018 IEEE/ACM international workshop on software fairness (FairWare). IEEE, pp 1–7
Wang T, Blocki J, Li N, Jha S (2017) Locally differentially private protocols for frequency estimation. In: 26th USENIX security symposium (USENIX Security 17). USENIX Association, Vancouver, BC, pp 729–745
Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 60(309):63–69
Xu D, Yuan S, Wu X (2019) Achieving differential privacy and fairness in logistic regression. In: Companion proceedings of The 2019 world wide web conference, pp 594–599
Acknowledgements
This work is supported by the Cybersecurity Center research grant number PCC-Grant-202229 at Prince Mohammad Bin Fahd University (PMU). The work of Catuscia Palamidessi, Karima Makhlouf and Sami Zhioua is also partially supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement 835294). The work of Héber H. Arcolezi is supported by the “ANR 22-PECY-0002” IPOP (Interdisciplinary Project on Privacy) project of the Cybersecurity PEPR and by MIAI @ Grenoble Alpes (“ANR-19-P3IA-0003”).
Additional information
Responsible editor: Panagiotis Papapetrou.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
A.1 Results of the synthetic dataset 2
Synthetic dataset 2 follows the same causal model depicted in Fig. 2. The only difference between synthetic datasets 1 and 2 is the data distribution; more specifically, synthetic dataset 2 differs from synthetic dataset 1 solely in the distribution of Y.
See Fig. 6.
A.2 Results of the synthetic dataset 1 and the Compas datasets for Sect. 4.3
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Makhlouf, K., Arcolezi, H.H., Zhioua, S. et al. On the impact of multi-dimensional local differential privacy on fairness. Data Min Knowl Disc (2024). https://doi.org/10.1007/s10618-024-01031-0