Abstract
Automated decision systems are increasingly used to make consequential decisions in people’s lives. Due to the sensitivity of the data involved and of the resulting decisions, several ethical concerns need to be addressed for the appropriate use of such technologies, particularly fairness and privacy. Unlike previous work, which focused on centralized differential privacy (DP) or on local DP (LDP) for a single sensitive attribute, in this paper we examine the impact on fairness of LDP in the presence of several sensitive attributes (i.e., multi-dimensional data). A detailed empirical analysis on synthetic and benchmark datasets reveals several notable observations. In particular, (1) multi-dimensional LDP is an efficient approach to reduce disparity; (2) the choice of multi-dimensional LDP variant (we employ two variants) matters only at low privacy guarantees (high \(\epsilon\)); and (3) the true decision distribution has an important effect on which group is more sensitive to the obfuscation. Finally, we summarize our findings in the form of recommendations to guide practitioners in adopting effective privacy-preserving practices while maintaining fairness and utility in machine learning applications.
Notes
In this paper, we use the term protected to designate sensitive attributes from a fairness perspective and the term sensitive to designate sensitive attributes from a privacy perspective.
In the rest of the paper we use attribute and variable interchangeably.
True positive rate = \(\frac{TP}{TP+FN}\)
False positive rate = \(\frac{FP}{FP+TN}\)
Positive predictive values = \(\frac{TP}{TP+FP}\)
C and A follow Binomial distributions, while M follows a Multinomial distribution.
The 50K threshold is the one used in the well-known Adult dataset, widely used in the literature (Dua and Graff 2017).
As this observation concerns accuracy, only the last two fairness metrics are affected, namely OAD and PRD, corresponding to the two lower rows of Fig. 5.
Note that this observation is also confirmed in the Compas dataset (Fig. 8), but inverted, since the privileged group in this dataset is the group \(A=0\). We generated a second synthetic dataset where the group \(A=0\) is privileged to confirm the inverted behavior. The plots can be found in Appendix A.1.
Again, the behavior is reversed for the Compas dataset (Fig. 8) for the same reason as the previous observation.
A back-door path is a path between A and Y with an edge into A.
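For concreteness, the three rates defined in the notes above can be computed directly from confusion-matrix counts. The following minimal Python sketch (the function name and the example counts are illustrative, not taken from the paper) shows the correspondence:

```python
def classification_rates(tp, fp, tn, fn):
    """Compute the rates defined in the notes from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # true positive rate: TP / (TP + FN)
    fpr = fp / (fp + tn)  # false positive rate: FP / (FP + TN)
    ppv = tp / (tp + fp)  # positive predictive value: TP / (TP + FP)
    return tpr, fpr, ppv

# Illustrative counts: 40 TP, 10 FP, 80 TN, 20 FN
tpr, fpr, ppv = classification_rates(40, 10, 80, 20)
```

Fairness notions such as equal opportunity and predictive parity compare these rates across the groups defined by the protected attribute.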
References
Alves G, Bernier F, Couceiro M, Makhlouf K, Palamidessi C, Zhioua S (2022) Survey on fairness notions and related tensions. arXiv preprint arXiv:2209.13012
Angwin J, Larson J, Mattu S, Kirchner L (2016) Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Arcolezi HH, Couchot JF, Al Bouna B, Xiao X (2021) Random sampling plus fake data: multidimensional frequency estimates with local differential privacy. In: Proceedings of the 30th ACM international conference on information & knowledge management, CIKM ’21, New York, NY, USA . Association for Computing Machinery, pp 47–57
Arcolezi HH, Couchot JF, Al Bouna B, Xiao X (2022) Improving the utility of locally differentially private protocols for longitudinal and multidimensional frequency estimates. Digit Commun Netw
Arcolezi HH, Couchot JF, Gambs S, Palamidessi C, Zolfaghari M (2022) Multi-Freq-LDPy: multiple frequency estimation under local differential privacy in python. In: Atluri V, Di Pietro R, Jensen CD, Meng W (eds) Computer Security—ESORICS 2022. Springer Nature Switzerland, Cham, pp 770–775
Arcolezi HH, Makhlouf K, Palamidessi C (2023) (Local) differential privacy has NO disparate impact on fairness. In: Data and applications security and privacy XXXVII. Springer Nature Switzerland, pp 3–21
Bagdasaryan E, Poursaeed O, Shmatikov V (2019) Differential privacy has disparate impact on model accuracy. Adv Neural Inf Process Syst 32
Barocas S, Hardt M, Narayanan A (2019) Fairness and machine learning. http://www.fairmlbook.org
Berk R, Heidari H, Jabbari S, Kearns M, Roth A (2021) Fairness in criminal justice risk assessments: the state of the art. Sociol. Methods Res. 50(1):3–44
Breiman L (2001) Random forests. Mach Learn 45:5–32
Chang H, Shokri R (2021) On the privacy risks of algorithmic fairness. In: 2021 IEEE European symposium on security and privacy (EuroS&P). IEEE, pp 292–303
Chen C, Liang Y, Xu X, Xie S, Kundu A, Payani A, Hong Y, Shu K (2022) When fairness meets privacy: Fair classification with semi-private sensitive attributes. In: Workshop on trustworthy and socially responsible machine learning, NeurIPS 2022
Chouldechova A (2017) Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big data 5(2):153–163
Corbett-Davies S, Pierson E, Feller A, Goel S, Huq A (2017) Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 797–806
da Costa Filho JS, Machado JC (2023) FELIP: a local differentially private approach to frequency estimation on multidimensional datasets. In: Proceedings of the 26th international conference on extending database technology, EDBT 2023, Ioannina, Greece, March 28–31, 2023, pp 671–683. OpenProceedings.org
de Oliveira AS, Kaplan C, Mallat K, Chakraborty T (2023) An empirical analysis of fairness notions under differential privacy. arXiv preprint arXiv:2302.02910
Apple Differential Privacy Team (2017) Learning with privacy at scale
Ding F, Hardt M, Miller J, Schmidt L (2021) Retiring adult: new datasets for fair machine learning. Adv Neural Inf Process Syst 34:6478–6490
Domingo-Ferrer J, Soria-Comas J (2022) Multi-dimensional randomized response. IEEE Trans Knowl Data Eng 34(10):4933–4946
Dua D, Graff C (2017) UCI machine learning repository
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Reingold O (ed) Theory of cryptography. Springer, Berlin, pp 265–284
Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2012) Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference, pp 214–226
Dwork C, Roth A (2014) The algorithmic foundations of differential privacy. Found Trends® Theor Comput Sci 9(3–4):211–407
Erlingsson Ú, Pihur V, Korolova A (2014) Rappor: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp 1054–1067
Farrand T, Mireshghallah F, Singh S, Trask A (2020) Neither private nor fair: impact of data imbalance on utility and fairness in differential privacy. In: Proceedings of the 2020 workshop on privacy-preserving machine learning in practice, pp 15–19
Ficiu B, Lawrence ND, Paleyes A (2023) Automated discovery of trade-off between utility, privacy and fairness in machine learning models. arXiv preprint arXiv:2311.15691
Fioretto F, Tran C, Van Hentenryck P, Zhu K (2022) Differential privacy and fairness in decisions and learning tasks: a survey. arXiv preprint arXiv:2202.08187
Ganev G, Oprisanu B, De Cristofaro E (2022) Robin hood and Matthew effects: differential privacy has disparate impact on synthetic data. In: International conference on machine learning. PMLR, pp 6944–6959
Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. Adv Neural Inf Process Syst 29:3315–3323
Impact of LDP on fairness repository (2023). https://github.com/KarimaMakhlouf/Impact_of_LDP_on_Fairness
Jagielski M, Kearns M, Mao J, Oprea A, Roth A, Sharifi-Malvajerdi S, Ullman J (2019) Differentially private fair learning. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, volume 97 of proceedings of machine learning research. PMLR, 09–15 Jun, pp 3000–3008
Kairouz P, Bonawitz K, Ramage D (2016) Discrete distribution estimation under local privacy. In: International conference on machine learning. PMLR, pp 2436–2444
Kasiviswanathan SP, Lee HK, Nissim K (2011) What can we learn privately? SIAM J. Comput. 40(3):793–826
Kikuchi H (2022) Castell: scalable joint probability estimation of multi-dimensional data randomized with local differential privacy. arXiv preprint arXiv:2212.01627
Liu G, Tang P, Hu C, Jin C, Guo S (2023) Multi-dimensional data publishing with local differential privacy. In: Proceedings of the 26th international conference on extending database technology, EDBT 2023, Ioannina, Greece, March 28–31, 2023, pp 183–194. OpenProceedings.org
Makhlouf K, Zhioua S, Palamidessi C (2021) Machine learning fairness notions: bridging the gap with real-world applications. Inf. Process. Manag. 58(5):102642
Makhlouf K, Zhioua S, Palamidessi C (2021) On the applicability of machine learning fairness notions. 23(1):14–23
Makhlouf K, Zhioua S, Palamidessi C (2022) Identifiability of causal-based ml fairness notions. In: 2022 14th international conference on computational intelligence and communication networks (CICN), pp 1–8
Mangold P, Perrot M, Bellet A, Tommasi M (2023) Differential privacy has bounded impact on fairness in classification. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (eds) Proceedings of the 40th international conference on machine learning, volume 202 of proceedings of machine learning research. PMLR, 23–29, pp 23681–23705
Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A (2021) A survey on bias and fairness in machine learning. ACM Comput Surv 54(6):1–35
Mitchell S, Potash E, Barocas S, D’Amour A, Lum K (2021) Algorithmic fairness: choices, assumptions, and definitions. Ann Rev Stat Appl 8:141–163
Mozannar H, Ohannessian M, Srebro N (2020) Fair learning with private demographic data. In: International conference on machine learning. PMLR, pp 7066–7075
Pearl J (2009) Causality. Cambridge University Press, Cambridge
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J (2011) Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12:2825–2830
Ren X, Yu CM, Yu W, Yang S, Yang X, McCann JA, Philip SY (2018) LoPub: high-dimensional crowdsourced data publication with local differential privacy. IEEE Trans. Inf. Forens. Secur. 13(9):2151–2166
Tran C, Fioretto F, Van Hentenryck P (2021) Differentially private and fair deep learning: a Lagrangian dual approach. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, no 11, pp 9932–9939
Verma S, Rubin J (2018) Fairness definitions explained. In: 2018 IEEE/ACM international workshop on software fairness (FairWare). IEEE, pp 1–7
Wang T, Blocki J, Li N, Jha S (2017) Locally differentially private protocols for frequency estimation. In: 26th USENIX security symposium (USENIX Security 17). USENIX Association, Vancouver, BC, pp 729–745
Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 60(309):63–69
Xu D, Yuan S, Wu X (2019) Achieving differential privacy and fairness in logistic regression. In: Companion proceedings of The 2019 world wide web conference, pp 594–599
Acknowledgements
This work is supported by the Cybersecurity Center research grant number PCC-Grant-202229 at Prince Mohammad Bin Fahd University (PMU). The work of Catuscia Palamidessi, Karima Makhlouf and Sami Zhioua is also partially supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement 835294). The work of Héber H. Arcolezi is supported by the “ANR 22-PECY-0002” IPOP (Interdisciplinary Project on Privacy) project of the Cybersecurity PEPR and by MIAI @ Grenoble Alpes (“ANR-19-P3IA-0003”).
Additional information
Responsible editor: Panagiotis Papapetrou.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
A.1 Results of the synthetic dataset 2
Synthetic dataset 2 follows the same causal model depicted in Fig. 2. The only difference between synthetic datasets 1 and 2 is the data distribution; more specifically, synthetic dataset 2 differs from synthetic dataset 1 solely in the distribution of Y.
See Fig. 6.
A.2 Results of the synthetic dataset 1 and the Compas datasets for Sect. 4.3
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Makhlouf, K., Arcolezi, H.H., Zhioua, S. et al. On the impact of multi-dimensional local differential privacy on fairness. Data Min Knowl Disc (2024). https://doi.org/10.1007/s10618-024-01031-0