Robustness of Fairness: An Experimental Analysis

Part of the Communications in Computer and Information Science book series (CCIS, volume 1524)

Abstract

Machine learning algorithms are increasingly used to make decisions with significant social impact. However, the predictions made by these algorithms can be demonstrably biased, often reflecting and even amplifying societal prejudice. Fairness metrics can be used to evaluate the models learned by these algorithms. But how robust are these metrics to reasonable variations in the test data? In this work, we measure the robustness of fairness metrics by training multiple models in three distinct application domains using publicly available real-world datasets (including the COMPAS dataset). We test each of these models for both performance and fairness on multiple test datasets generated by resampling from a set of held-out datapoints. We find that fairness metrics exhibit far greater variance across these test datasets than performance metrics when the model has not been trained to be fair. Further, socially disadvantaged groups appear to be most affected by this lack of robustness. Even when the model objective includes fairness constraints, the mean fairness of the model necessarily increases, but its robustness is not consistently or significantly improved. Our work thus highlights the need to consider variations in the test data when evaluating model fairness, and provides a framework for doing so.
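
To make the protocol concrete, here is a minimal sketch of the bootstrap evaluation loop the abstract describes, using synthetic data and a logistic-regression stand-in; apart from the count of 800 resamples (see Note 2 below), every dataset, model, and metric choice here is an illustrative assumption rather than the paper's actual setup.

```python
# Hedged sketch (not the authors' code): train once, then measure performance
# and fairness on many bootstrap resamples of a held-out pool to compare how
# stable each metric is across test-set variations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: features X, binary group g, group-correlated label y.
n = 2000
X = rng.normal(size=(n, 5))
g = rng.integers(0, 2, size=n)
y = (X[:, 0] + 0.5 * g + rng.normal(size=n) > 0).astype(int)

split = 1000
model = LogisticRegression().fit(X[:split], y[:split])
X_ho, y_ho, g_ho = X[split:], y[split:], g[split:]  # held-out pool

def parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between the two groups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

acc, fair = [], []
for _ in range(800):  # one bootstrap test set per iteration
    idx = rng.integers(0, len(y_ho), size=len(y_ho))  # sample with replacement
    y_pred = model.predict(X_ho[idx])
    acc.append(accuracy_score(y_ho[idx], y_pred))
    fair.append(parity_gap(y_pred, g_ho[idx]))

print(f"accuracy:   mean={np.mean(acc):.3f}  std={np.std(acc):.4f}")
print(f"parity gap: mean={np.mean(fair):.3f}  std={np.std(fair):.4f}")
```

In runs of this sketch the parity gap varies noticeably more across resamples than accuracy does, which is the kind of contrast the paper quantifies on real datasets.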

Keywords

  • Classification
  • Fairness
  • Bootstrap sampling
  • Robustness

Notes

  1. We use [[]] to denote the Iverson bracket, which returns a value of 1 if the predicate contained within is true and 0 otherwise (see the formula sketch following these notes).

  2. Datasets with unit fairness were withheld from the F-test analysis to prevent degenerate cases; these accounted for less than 1.5% of the 800 sample datasets.

  3. While the independence assumption does not strictly hold, the F-test gives us one more means of comparison (a code sketch of such a test follows these notes).

  4. While we do not report results for all models due to space constraints, the omitted results are similar to the reported values.
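
For illustration of Note 1, the Iverson bracket lets a group's positive-prediction rate and a demographic-parity-style gap be written compactly; this particular formula is an example of the notation, not one quoted from the paper.

```latex
% Illustrative only: positive-prediction rate of group a, written with
% Iverson brackets, and the demographic-parity gap between two groups.
\[
  \hat{p}_a = \frac{\sum_{i=1}^{n} [\![\hat{y}_i = 1]\!]\,[\![a_i = a]\!]}
                   {\sum_{i=1}^{n} [\![a_i = a]\!]},
  \qquad
  \Delta_{\mathrm{DP}} = \left|\hat{p}_0 - \hat{p}_1\right|.
\]
```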
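Regarding Notes 2 and 3, below is a minimal sketch of a standard two-sample variance-ratio F-test as it might be applied to per-resample metric scores; the function, the two-sided p-value convention, and the synthetic scores are assumptions, not the authors' procedure.

```python
# Hedged sketch: F-test comparing the variance of fairness scores against
# the variance of accuracy scores across resampled test sets.
import numpy as np
from scipy import stats

def variance_ratio_f_test(a, b):
    """Return F = var(a)/var(b) and a two-sided p-value."""
    a, b = np.asarray(a), np.asarray(b)
    f = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfn, dfd = len(a) - 1, len(b) - 1
    p = 2 * min(stats.f.cdf(f, dfn, dfd), stats.f.sf(f, dfn, dfd))
    return f, p

rng = np.random.default_rng(0)
fairness = rng.normal(0.80, 0.05, size=800)  # placeholder per-resample scores
accuracy = rng.normal(0.85, 0.01, size=800)
# Per Note 2, resamples with unit fairness would be dropped before testing.
fairness = fairness[fairness < 1.0]
f, p = variance_ratio_f_test(fairness, accuracy)
print(f"F = {f:.2f}, p = {p:.3g}")
```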


Author information

Corresponding author

Correspondence to Andong Luis Li Zhao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kamp, S., Zhao, A.L.L., Kutty, S. (2021). Robustness of Fairness: An Experimental Analysis. In: Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_43

  • DOI: https://doi.org/10.1007/978-3-030-93736-2_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93735-5

  • Online ISBN: 978-3-030-93736-2

  • eBook Packages: Computer Science, Computer Science (R0)