Abstract
Clustering is a classical unsupervised machine learning technique with applications in criminal justice, automated resume processing, bank loan approvals, recommender systems, and many other areas. Despite their popularity, traditional clustering algorithms may discriminate against a group of people (or against individuals) and thereby have adverse societal impacts. This has led to the study of fair clustering algorithms, which aim to minimize the clustering cost while ensuring fairness. This chapter outlines existing group and individual fairness notions, discusses their relationships, and comprehensively categorizes current algorithms. It further discusses the advantages and disadvantages of existing algorithms in terms of theoretical guarantees, time complexity, and reproducibility. Finally, the chapter concludes with a discussion of new directions and open problems in fair clustering.
Notes
1. The \(\boldsymbol{\tau}\) vector is written in the form (red, blue) in the \(\boldsymbol{\tau}\)-mp, \(\boldsymbol{\tau}\)-rd, and \(\boldsymbol{\tau}\)-fair notions.
2. A (p, q)-approximate bicriteria solution has a cost approximation factor of p and a fairness approximation factor of q.
3. Ratio of the clustering objective value under the fairness constraint to the standard (unconstrained) objective value.
4. Mean center of all points of a single color (say, the red points) in the dataset.
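Notes 3 and 4 define two quantities that are simple to compute once a clustering is fixed. The Python sketch below is a minimal illustration, assuming a k-means (squared Euclidean) objective and points stored as coordinate tuples; the function names and the toy data are hypothetical and are not taken from the chapter.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(p: Point, c: Point) -> float:
    """Squared Euclidean distance between a point and a center."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def clustering_cost(points: List[Point], centers: List[Point], assignment: List[int]) -> float:
    """k-means style cost of an explicit assignment (point i -> centers[assignment[i]]).

    The assignment is passed in because, under a fairness constraint, a point
    need not be served by its nearest center.
    """
    return sum(sq_dist(p, centers[a]) for p, a in zip(points, assignment))


def cost_ratio(fair_cost: float, standard_cost: float) -> float:
    """Note 3: fair objective value divided by the standard (unconstrained) one."""
    if standard_cost <= 0:
        raise ValueError("standard cost must be positive")
    return fair_cost / standard_cost


def color_mean_center(points: List[Point], colors: List[str], color: str) -> Tuple[float, ...]:
    """Note 4: coordinate-wise mean of all points that carry the given color."""
    members = [p for p, col in zip(points, colors) if col == color]
    if not members:
        raise ValueError(f"no points with color {color!r}")
    dim = len(members[0])
    return tuple(sum(p[d] for p in members) / len(members) for d in range(dim))


# Hypothetical toy data: two red and two blue points on a line.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
colors = ["red", "red", "blue", "blue"]

# Unconstrained 2-means: both clusters are monochromatic.
standard = clustering_cost(points, [(1.0, 0.0), (11.0, 0.0)], [0, 0, 1, 1])
# A color-balanced clustering: each cluster holds one red and one blue point.
fair = clustering_cost(points, [(5.0, 0.0), (7.0, 0.0)], [0, 1, 0, 1])

print(standard, fair, cost_ratio(fair, standard))  # 4.0 100.0 25.0
print(color_mean_center(points, colors, "red"))    # (1.0, 0.0)
```

The balanced clustering here is only one feasible fair solution chosen for illustration, not necessarily the optimal fair clustering, so the printed ratio should not be read as the exact cost of fairness for this instance.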
Copyright information
© 2023 The Institution of Engineers (India)
About this chapter
Cite this chapter
Gupta, S., Jain, S., Ghalme, G., Krishnan, N.C., Hemachandra, N. (2023). Group and Individual Fairness in Clustering Algorithms. In: Mukherjee, A., Kulshrestha, J., Chakraborty, A., Kumar, S. (eds) Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Studies in Computational Intelligence, vol 1123. Springer, Singapore. https://doi.org/10.1007/978-981-99-7184-8_2
DOI: https://doi.org/10.1007/978-981-99-7184-8_2
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7183-1
Online ISBN: 978-981-99-7184-8
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)