
Group and Individual Fairness in Clustering Algorithms

Ethics in Artificial Intelligence: Bias, Fairness and Beyond

Part of the book series: Studies in Computational Intelligence ((SCI,volume 1123))

Abstract

Clustering is a classical unsupervised machine learning technique with applications in criminal justice, automated resume processing, bank loan approvals, recommender systems, and many other domains. Despite their popularity, traditional clustering algorithms may discriminate against groups of people or against individuals, with corresponding societal impact. This has led to the study of fair clustering algorithms, which aim to minimize the clustering cost while ensuring fairness. This chapter outlines existing group and individual fairness notions, discusses their relationships, and comprehensively categorizes the current algorithms. It further discusses the advantages and disadvantages of existing algorithms in terms of theoretical guarantees, time complexity, and reproducibility. Finally, the chapter concludes with a discussion of new directions and open problems in fair clustering.

Notes

  1. The \(\boldsymbol{\tau}\) vector is written in the form (red, blue) in the \(\boldsymbol{\tau}\)-mp, \(\boldsymbol{\tau}\)-rd, and \(\boldsymbol{\tau}\)-fair notions.

  2. A \((p, q)\)-approximate bicriteria guarantee denotes a cost approximation of \(p\) and a fairness approximation of \(q\).

  3. The ratio of the clustering objective value under the fairness constraint to the standard objective value.

  4. The mean center of all points belonging to a single color (say, red points) in the dataset.
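The quantities described in notes 3 and 4 can be illustrated with a minimal Python sketch. The function names, the toy points, and the two cost values are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
from collections import defaultdict


def color_mean_centers(points, colors):
    """Note 4: mean of all points that share a color (hypothetical helper)."""
    groups = defaultdict(list)
    for point, color in zip(points, colors):
        groups[color].append(point)
    # Average each coordinate over the points of one color.
    return {
        color: [sum(coord) / len(members) for coord in zip(*members)]
        for color, members in groups.items()
    }


def fair_to_standard_cost_ratio(fair_cost, standard_cost):
    """Note 3: objective under the fairness constraint divided by the
    standard (unconstrained) objective (hypothetical helper)."""
    if standard_cost <= 0:
        raise ValueError("standard_cost must be positive")
    return fair_cost / standard_cost


# Toy data: two red points and two blue points in the plane.
points = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [4.0, 4.0]]
colors = ["red", "red", "blue", "blue"]
centers = color_mean_centers(points, colors)

# Assumed objective values, e.g. from a fair and a standard clustering run.
ratio = fair_to_standard_cost_ratio(fair_cost=12.0, standard_cost=10.0)
```

Under the usual assumption that the fair solution is also feasible for the unconstrained problem and both objectives are computed optimally, this ratio is at least 1; values close to 1 indicate that the fairness constraint costs little in clustering quality.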

Author information

Corresponding author

Correspondence to Narayanan C. Krishnan.

Copyright information

© 2023 The Institution of Engineers (India)

About this chapter

Cite this chapter

Gupta, S., Jain, S., Ghalme, G., Krishnan, N.C., Hemachandra, N. (2023). Group and Individual Fairness in Clustering Algorithms. In: Mukherjee, A., Kulshrestha, J., Chakraborty, A., Kumar, S. (eds) Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Studies in Computational Intelligence, vol 1123. Springer, Singapore. https://doi.org/10.1007/978-981-99-7184-8_2
