
On the impact of multi-dimensional local differential privacy on fairness

Published in Data Mining and Knowledge Discovery

Abstract

Automated decision systems are increasingly used to make consequential decisions in people’s lives. Given the sensitivity of the data being processed and of the resulting decisions, several ethical concerns must be addressed for the appropriate use of such technologies, particularly fairness and privacy. Unlike previous work, which focused on centralized differential privacy (DP) or on local DP (LDP) for a single sensitive attribute, in this paper we examine the impact on fairness of LDP in the presence of several sensitive attributes (i.e., multi-dimensional data). Detailed empirical analysis on synthetic and benchmark datasets revealed several relevant observations. In particular, (1) multi-dimensional LDP is an efficient approach to reduce disparity, (2) the choice of multi-dimensional LDP variant (we evaluate two variants) matters only at low privacy guarantees (high \(\epsilon\)), and (3) the true decision distribution has an important effect on which group is more sensitive to the obfuscation. Finally, we summarize our findings in the form of recommendations to guide practitioners in adopting effective privacy-preserving practices while maintaining fairness and utility in machine learning applications.


Notes

  1. In this paper, we use the term protected to designate sensitive attributes from a fairness perspective and the term sensitive to designate sensitive attributes from a privacy perspective.

  2. In the rest of the paper we use attribute and variable interchangeably.

  3. True positive rate = \(\frac{TP}{TP+FN}\)

  4. False positive rate = \(\frac{FP}{FP+TN}\)

  5. Positive predictive value = \(\frac{TP}{TP+FP}\) (a minimal computation sketch of these rates is given after these notes)

  6. C and A follow Binomial distributions, while M follows a Multinomial distribution.

  7. The 50K threshold is the one used in the well-known Adult dataset, which is widely used in the literature (Dua 2017).

  8. As this observation concerns accuracy, only the last two fairness metrics are affected, that is, OAD and PRD, corresponding to the two bottom rows of Fig. 5.

  9. Note that this observation is also confirmed in the Compas dataset (Fig. 8), but inverted, since the privileged group in this dataset is the group \(A=0\). We generated a second synthetic dataset where the group \(A=0\) is privileged to confirm the inverted behavior. The plots can be found in Appendix A.1.

  10. Again, the behavior is reversed for the Compas dataset (Fig. 8) for the same reason as the previous observation.

  11. A back-door path is a path between A and Y with an edge into A.

References

  • Alves G, Bernier F, Couceiro M, Makhlouf K, Palamidessi C, Zhioua S (2022) Survey on fairness notions and related tensions. arXiv preprint arXiv:2209.13012

  • Angwin J, Larson J, Mattu S, Kirchner L (2016) Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

  • Arcolezi HH, Couchot JF, Al Bouna B, Xiao X (2021) Random sampling plus fake data: multidimensional frequency estimates with local differential privacy. In: Proceedings of the 30th ACM international conference on information & knowledge management, CIKM ’21, New York, NY, USA . Association for Computing Machinery, pp 47–57

  • Arcolezi HH, Couchot JF, Al Bouna B, Xiao X (2022) Improving the utility of locally differentially private protocols for longitudinal and multidimensional frequency estimates. Digit Commun Netw

  • Arcolezi HH, Couchot JF, Gambs S, Palamidessi C, Zolfaghari M (2022) Multi-Freq-LDPy: multiple frequency estimation under local differential privacy in python. In: Atluri V, Di Pietro R, Jensen CD, Meng W (eds) Computer Security—ESORICS 2022. Springer Nature Switzerland, Cham, pp 770–775

  • Arcolezi HH, Makhlouf K, Palamidessi C (2023) (Local) differential privacy has NO disparate impact on fairness. In: Data and applications security and privacy XXXVII. Springer Nature Switzerland, pp 3–21

  • Bagdasaryan E, Poursaeed O, Shmatikov V (2019) Differential privacy has disparate impact on model accuracy. Adv Neural Inf Process Syst 32

  • Barocas S, Hardt M, Narayanan A (2019) Fairness and machine learning. http://www.fairmlbook.org

  • Berk R, Heidari H, Jabbari S, Kearns M, Roth A (2021) Fairness in criminal justice risk assessments: the state of the art. Sociol. Methods Res. 50(1):3–44


  • Breiman L (2001) Random forests. Mach Learn 45:5–32


  • Chang H, Shokri R (2021) On the privacy risks of algorithmic fairness. In: 2021 IEEE European symposium on security and privacy (EuroS&P). IEEE, pp 292–303

  • Chen C, Liang Y, Xu X, Xie S, Kundu A, Payani A, Hong Y, Shu K (2022) When fairness meets privacy: Fair classification with semi-private sensitive attributes. In: Workshop on trustworthy and socially responsible machine learning, NeurIPS 2022

  • Chouldechova A (2017) Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big data 5(2):153–163


  • Corbett-Davies S, Pierson E, Feller A, Goel S, Huq A (2017) Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 797–806

  • da Costa Filho JS, Machado JC (2023) FELIP: a local differentially private approach to frequency estimation on multidimensional datasets. In: Proceedings of the 26th international conference on extending database technology, EDBT 2023, Ioannina, Greece, March 28–31, 2023, pp 671–683. OpenProceedings.org

  • de Oliveira AS, Kaplan C, Mallat K, Chakraborty T (2023) An empirical analysis of fairness notions under differential privacy. arXiv preprint arXiv:2302.02910

  • Apple Differential Privacy Team (2017) Learning with privacy at scale

  • Ding F, Hardt M, Miller J, Schmidt L (2021) Retiring adult: new datasets for fair machine learning. Adv Neural Inf Process Syst 34:6478–6490


  • Domingo-Ferrer J, Soria-Comas J (2022) Multi-dimensional randomized response. IEEE Trans Knowl Data Eng 34(10):4933–4946


  • Dua D, Graff C (2017) UCI machine learning repository

  • Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Reingold O (ed) Theory of cryptography. Springer, Berlin, pp 265–284


  • Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2012) Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference, pp 214–226

  • Dwork C, Roth A (2014) The algorithmic foundations of differential privacy. Found Trends Theor Comput Sci 9(3–4):211–407

  • Erlingsson Ú, Pihur V, Korolova A (2014) Rappor: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp 1054–1067

  • Farrand T, Mireshghallah F, Singh S, Trask A (2020) Neither private nor fair: impact of data imbalance on utility and fairness in differential privacy. In: Proceedings of the 2020 workshop on privacy-preserving machine learning in practice, pp 15–19

  • Ficiu B, Lawrence ND, Paleyes A (2023) Automated discovery of trade-off between utility, privacy and fairness in machine learning models. arXiv preprint arXiv:2311.15691

  • Fioretto F, Tran C, Van Hentenryck P, Zhu K (2022) Differential privacy and fairness in decisions and learning tasks: a survey. arXiv preprint arXiv:2202.08187

  • Ganev G, Oprisanu B, De Cristofaro E (2022) Robin hood and Matthew effects: differential privacy has disparate impact on synthetic data. In: International conference on machine learning. PMLR, pp 6944–6959

  • Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. Adv Neural Inf Process Syst 29:3315–3323


  • Impact of LDP on Fairness repository (2023). https://github.com/KarimaMakhlouf/Impact_of_LDP_on_Fairness

  • Jagielski M, Kearns M, Mao J, Oprea A, Roth A, Sharifi-Malvajerdi S, Ullman J (2019) Differentially private fair learning. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, volume 97 of proceedings of machine learning research. PMLR, 09–15 Jun, pp 3000–3008

  • Kairouz P, Bonawitz K, Ramage D (2016) Discrete distribution estimation under local privacy. In: International conference on machine learning. PMLR, pp 2436–2444

  • Kasiviswanathan SP, Lee HK, Nissim K (2011) What can we learn privately? SIAM J. Comput. 40(3):793–826


  • Kikuchi H (2022) Castell: scalable joint probability estimation of multi-dimensional data randomized with local differential privacy. arXiv preprint arXiv:2212.01627

  • Liu G, Tang P, Hu C, Jin C, Guo S (2023) Multi-dimensional data publishing with local differential privacy. In: Proceedings of the 26th international conference on extending database technology, EDBT 2023, Ioannina, Greece, March 28–31, 2023, pp 183–194. OpenProceedings.org

  • Makhlouf K, Zhioua S, Palamidessi C (2021) Machine learning fairness notions: bridging the gap with real-world applications. Inf. Process. Manag. 58(5):102642


  • Makhlouf K, Zhioua S, Palamidessi C (2021) On the applicability of machine learning fairness notions. ACM SIGKDD Explor Newsl 23(1):14–23

  • Makhlouf K, Zhioua S, Palamidessi C (2022) Identifiability of causal-based ml fairness notions. In: 2022 14th international conference on computational intelligence and communication networks (CICN), pp 1–8

  • Mangold P, Perrot M, Bellet A, Tommasi M (2023) Differential privacy has bounded impact on fairness in classification. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (eds) Proceedings of the 40th international conference on machine learning, volume 202 of proceedings of machine learning research. PMLR, 23–29, pp 23681–23705

  • Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A (2021) A survey on bias and fairness in machine learning. ACM Comput Surv 54(6):1–35


  • Mitchell S, Potash E, Barocas S, D’Amour A, Lum K (2021) Algorithmic fairness: choices, assumptions, and definitions. Ann Rev Stat Appl 8:141–163


  • Mozannar H, Ohannessian M, Srebro N (2020) Fair learning with private demographic data. In: International conference on machine learning. PMLR, pp 7066–7075

  • Pearl J (2009) Causality. Cambridge University Press, Cambridge


  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J (2011) Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12:2825–2830


  • Ren X, Yu CM, Yu W, Yang S, Yang X, McCann JA, Philip SY (2018) LoPub: high-dimensional crowdsourced data publication with local differential privacy. IEEE Trans. Inf. Forens. Secur. 13(9):2151–2166


  • Tran C, Fioretto F, Van Hentenryck P (2021) Differentially private and fair deep learning: a Lagrangian dual approach. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, no 11, pp 9932–9939

  • Verma S, Rubin J (2018) Fairness definitions explained. In: 2018 IEEE/ACM international workshop on software fairness (FairWare). IEEE, pp 1–7

  • Wang T, Blocki J, Li N, Jha S (2017) Locally differentially private protocols for frequency estimation. In: 26th USENIX security symposium (USENIX Security 17). USENIX Association, Vancouver, BC, pp 729–745

  • Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 60(309):63–69


  • Xu D, Yuan S, Wu X (2019) Achieving differential privacy and fairness in logistic regression. In: Companion proceedings of The 2019 world wide web conference, pp 594–599


Acknowledgements

This work is supported by the Cybersecurity Center research grant number PCC-Grant-202229 at Prince Mohammad Bin Fahd University (PMU). The work of Catuscia Palamidessi, Karima Makhlouf and Sami Zhioua is also partially supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement 835294). The work of Héber H. Arcolezi is supported by the “ANR 22-PECY-0002” IPOP (Interdisciplinary Project on Privacy) project of the Cybersecurity PEPR and by MIAI @ Grenoble Alpes (“ANR-19-P3IA-0003”).

Author information

Corresponding author

Correspondence to Karima Makhlouf.

Additional information

Responsible editor: Panagiotis Papapetrou.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

A.1 Results for synthetic dataset 2

Synthetic dataset 2 follows exactly the same causal model as depicted in Fig. 2; it differs from synthetic dataset 1 solely in the distribution of Y.

See Fig. 6.

Fig. 6

Impact of \(k\)-RR on fairness for Synthetic dataset 2, generated with three different thresholds leading to different Y distributions. The gray shaded area represents the disparity results using the baseline model (noLDP)

A.2 Results for synthetic dataset 1 and the Compas dataset for Sect. 4.3

See Figs. 7 and 8.

Fig. 7

Impact of Y distribution on the privacy-fairness trade-off. Columns 1, 2, and 3 illustrate the results for the synthetic dataset 1 when the Y distribution is skewed to 1, balanced, and skewed to 0, respectively. The gray shaded area represents the disparity results using the baseline model (noLDP)

Fig. 8

Impact of Y distribution on the privacy-fairness trade-off. Columns 1, 2, and 3 illustrate the results for the Compas dataset when the Y distribution is skewed to 1, balanced, and skewed to 0, respectively. The gray shaded area represents the disparity results using the baseline model (noLDP)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Makhlouf, K., Arcolezi, H.H., Zhioua, S. et al. On the impact of multi-dimensional local differential privacy on fairness. Data Min Knowl Disc (2024). https://doi.org/10.1007/s10618-024-01031-0



  • DOI: https://doi.org/10.1007/s10618-024-01031-0
