Efficient algorithms for fair clustering with a new notion of fairness

Gupta, Shivam; Ghalme, Ganesh; Krishnan, Narayanan C.; Jain, Shweta

doi:10.1007/s10618-023-00928-6

Efficient algorithms for fair clustering with a new notion of fairness

Published: 20 March 2023

Volume 37, pages 1959–1997, (2023)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

891 Accesses
1 Citation
3 Altmetric
Explore all metrics

Abstract

We revisit the problem of fair clustering, first introduced by Chierichetti et al. (Fair clustering through fairlets, 2017), which requires each protected attribute to have approximately equal representation in every cluster, i.e., a Balance property. Existing solutions to fair clustering are either not scalable or do not achieve an optimal trade-off between clustering objectives and fairness. In this paper, we propose a new notion of fairness which we call \(\varvec{\tau }\)-ratio fairness, that strictly generalizes the Balance property and enables a fine-grained efficiency vs. fairness trade-off. Furthermore, we show that a simple greedy round-robin-based algorithm achieves this trade-off efficiently. Under a more general setting of multi-valued protected attributes, we rigorously analyze the theoretical properties of the proposed algorithm, the Fair Round-Robin Algorithm for Clustering Over-End (\({\textsc {FRAC}}_{OE}\)). We also propose a heuristic algorithm, Fair Round-Robin Algorithm for Clustering (FRAC), that applies round-robin allocation at each iteration of a vanilla clustering algorithm. Our experimental results suggest that both FRAC and \({\textsc {FRAC}}_{OE}\) outperform all the state-of-the-art algorithms and work exceptionally well even for a large number of clusters.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Making clusterings fairer by post-processing: algorithms, complexity results and experiments

Article 17 December 2022

Group and Individual Fairness in Clustering Algorithms

A Stochastic Alternating Balance k-Means Algorithm for Fair Clustering

Availability of data and material

All datasets used in the experiments are publicly available on UCI repository.

Notes

Throughout the paper, for simplicity, we call a k-clustering algorithm as a clustering algorithm.
Otherwise, the problem is uninteresting as the balanced clustering may not be feasible.
It is unclear why a certain assignment to a specific cluster helps maintain a fairness guarantee?
For the sake of simplicity, we assume \(\tau _{\ell } n_{\ell } \in {\mathbb {N}}\) and ignore the floor notation.
Note that an optimal fair allocation need not be unique. Our result holds for any optimal fair allocation.
This cycle would have resulted in further reduction of cost with respect to the points in the cycle.
https://archive.ics.uci.edu/ml/datasets/Adult
https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008
https://archive.ics.uci.edu/ml/datasets/US+Census+Data+%281990%29
https://github.com/shivi98g/Fair-k-means-Clustering-via-Algorithmic-Fairness

References

Abbasi M, Bhaskara A, Venkatasubramanian S (2021) Fair clustering via equitable group representations. In: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp 504–514
Abraham SS, Padmanabhan D, Sundaram SS (2020) Fairness in clustering with multiple sensitive attributes. In: EDBT/ICDT 2020 joint conference, pp 287–298
Ahmadian S, Epasto A, Kumar R, Mahdian M (2019) Clustering without over-representation. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, pp 267–275
Ahmadian S, Epasto A, Kumar R, Mahdian M (2020) Fair correlation clustering. In: Chiappa S, Calandra R (eds) Proceedings of the twenty third international conference on artificial intelligence and statistics, volume 108 of proceedings of machine learning research, pp 4195–4205
Anderson N, Bera SK, Das S, Liu Y (2020) Distributional individual fairness in clustering. arXiv:2006.12589
Anegg G, Angelidakis H, Kurpisz A, Zenklusen R (2020) A technique for obtaining true approximations for k-center with covering constraints. In: International conference on integer programming and combinatorial optimization, pp 52–65. Springer
Backurs A, Indyk P, Onak K, Schieber B, Vakilian A, Wagner T (2019) Scalable fair clustering. In: International conference on machine learning, pp 405–413
Bandyapadhyay S, Fomin FV, Simonov K (2020) On coresets for fair clustering in metric and euclidean spaces and their applications. arXiv:2007.10137
Bandyapadhyay S, Inamdar T, Pai S, Varadarajan K (2019) A constant approximation for colorful k-center. arXiv:1907.08906
Banerjee A, Ghosh J (2006) Scalable clustering algorithms with balancing constraints. Data Min Knowl Discov 13(3):365–395
Article MathSciNet Google Scholar
Barocas S, Selbst AD (2016) Big data’s disparate impact. California Law Rev, 671–732
Baumann E, Rumberger JL (2018) State of the art in fair ML: from moral philosophy and legislation to fair classifiers. CoRR arXiv:1811.09539
Bera S, Chakrabarty D, Flores N, Negahbani M (2019) Fair algorithms for clustering. Adv Neural Inf Process Syst 32:4954–4965
Google Scholar
Bercea IO, Groß M, Khuller S, Kumar A, Rösner C, Schmidt DR, Schmidt M (2018) On the cost of essentially fair clusterings. arXiv:1811.10319
Böhm M, Fazzone A, Leonardi S, Schwiegelshohn C (2020) Fair clustering with multiple colors. arXiv:2002.07892
Bose A, Hamilton W (2019) Compositional fairness constraints for graph embeddings. In: International conference on machine learning, pp 715–724
Bottou L, Bengio Y (1994) Convergence properties of the k-means algorithms. Advances in neural information processing systems 7
Brubach B, Chakrabarti D, Dickerson J, Khuller S, Srinivasan A, Tsepenekas L (2020) A pairwise fair and community-preserving approach to k-center clustering. In: International conference on machine learning, pp 1178–1189
Brubach B, Chakrabarti D, Dickerson JP, Srinivasan A, Tsepenekas L (2021) Fairness, semi-supervised learning, and more: A general framework for clustering with stochastic pairwise constraints. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 6822–6830
Byrka J, Pensyl T, Rybicki B, Srinivasan A, Trinh K (2014) An improved approximation for k-median, and positive correlation in budgeted optimization. In: Proceedings of the twenty-sixth annual ACM-SIAM symposium on discrete algorithms, pp 737–756. SIAM
Carey AN, Wu X (2022) The fairness field guide: perspectives from social and formal sciences. arXiv:2201.05216
Chakrabarti D, Dickerson JP, Esmaeili SA, Srinivasan A, Tsepenekas L (2022) A new notion of individually fair clustering: \(\alpha \)-equitable \(k\)-center. In: International conference on artificial intelligence and statistics, pp 6387–6408
Chan TH, Guerqin A, Sozio M (2018) Fully dynamic k-center clustering. In: Proceedings of the 2018 World Wide Web conference, pp 579–587
Chen X, Fain B, Lyu L, Munagala K (2019) Proportionally fair clustering. In: International conference on machine learning, pp 1032–1041
Chhabra A, Masalkovaitė K, Mohapatra P (2021) An overview of fairness in clustering. IEEE Access 9:130698–130720. https://doi.org/10.1109/ACCESS.2021.3114099
Article Google Scholar
Chhabra A, Singla A, Mohapatra P (2021) Fair clustering using antidote data. arXiv:2106.00600
Chierichetti F, Kumar R, Lattanzi S, Vassilvitskii S (2017) Fair clustering through fairlets. In: Proceedings of the 31st international conference on neural information processing systems, pp 5036–5044
Chikahara Y, Sakaue S, Fujino A, Kashima H (2021) Learning individually fair classifier with path-specific causal-effect constraint. In: International conference on artificial intelligence and statistics, pp 145–153
Chlamtáč E, Makarychev Y, Vakilian A (2022) Approximating fair clustering with cascaded norm objectives. In: Proceedings of the 2022 annual ACM-SIAM symposium on discrete algorithms (SODA), pp 2664–2683. SIAM
Cho J, Hwang G, Suh C (2020) A fair classifier using mutual information. In: 2020 IEEE international symposium on information theory (ISIT), pp 2521–2526
Correa J, Cristi A, Duetting P, Norouzi-Fard A (2021) Fairness and bias in online selection. In: International conference on machine learning, pp 2112–2121
Dastin J (2018) Amazon scraps secret ai recruiting tool that showed bias against women. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G. [Online; accessed 15-August-2021]
Davidson I, Ravi S (2020) Making existing clusterings fairer: algorithms, complexity results and insights. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 3733–3740
Deepak, Abraham SS (2020). Representativity fairness in clustering. In: 12th ACM conference on web science
Deepak JM, Jose SV (2020) Fairness in unsupervised learning. In: Proceedings of the 29th ACM international conference on information & knowledge management, CIKM ’20, New York, NY, USA, pp 3511–3512. Association for Computing Machinery
Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2012) Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference, pp 214–226
Elzayn H, Jabbari S, Jung C, Kearns M, Neel S, Roth A, Schutzman Z (2019) Fair algorithms for learning in allocation problems. In: Proceedings of the conference on fairness, accountability, and transparency, pp 170–179
Esmaeili S, Brubach B, Srinivasan A, Dickerson J (2021) Fair clustering under a bounded cost. Adv Neural Inf Process Syst 34:14345–14357
Google Scholar
Esmaeili S, Brubach B, Tsepenekas L, Dickerson J (2020) Probabilistic fair clustering. Adv Neural Inf Process Syst 33:12743–12755
Google Scholar
Feng Z, Kacham P, Woodruff D (2021) Dimensionality reduction for the sum-of-distances metric. In: International conference on machine learning, pp 3220–3229
Ghadiri M, Samadi S, Vempala S (2021) Socially fair k-means clustering. In: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp 438–448
Ghadiri M, Singh M, Vempala SS (2022) Constant-factor approximation algorithms for socially fair \( k \)-clustering. arXiv preprint arXiv:2206.11210
Gong S, Liu X, Jain AK (2021) Mitigating face recognition bias via group adaptive classifier. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3414–3424
Goyal D, Jaiswal R (2021) Tight fpt approximation for socially fair clustering. arXiv:2106.06755
Han L, Xu D, Xu Y, Yang P (2022) Approximation algorithms for the individually fair k-center with outliers. J Global Optim, 1–16
Harb E, Lam HS (2020) KFC: a scalable approximation algorithm for \( k \)- center fair clustering. Adv Neural Inf Process Syst 33:14509–14519
Google Scholar
Harris DG, Pensyl T, Srinivasan A, Trinh K (2019) A lottery model for center-type problems with outliers. ACM Trans Algorithms 15(3):1–25
MathSciNet MATH Google Scholar
Huang L, Jiang S, Vishnoi N (2019) Coresets for clustering with fairness constraints. Adv Neural Inf Process Syst 32:7589–7600
Google Scholar
Jia X, Sheth K, Svensson O (2020) Fair colorful k-center clustering. In: International conference on integer programming and combinatorial optimization, pp 209–222. Springer
Jones M, Nguyen H, Nguyen T (2020) Fair k-centers via maximum matching. In: International conference on machine learning, pp 4940–4949
Julia A, Larson J, Mattu S, Kirchner L (2016) Propublica–machine bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. [Online; accessed 13-August-2021]
Jung C, Kannan S, Lutz N (2020) Service in your neighborhood: fairness in center location. Foundations of Responsible Computing (FORC)
Kalyanakrishnan S (2016) \(k\)-means clustering. https://www.cse.iitb.ac.in/~shivaram/teaching/old/cs344+386-s2017/resources/classnote-2.pdf. [Online; accessed 29-May-2022]
Kar D, Medya S, Mandal D, Silva A, Dey P, Sanyal S (2021) Feature-based individual fairness in k-clustering. arXiv:2109.04554
Kleindessner M, Awasthi P, Morgenstern J (2019) Fair k-center clustering for data summarization. In: International conference on machine learning, pp 3448–3457
Kleindessne M, Awasthi P, Morgenstern J (2020) A notion of individual fairness for clustering. arXiv:2006.04960
Kleindessner M, Samadi S, Awasthi P, Morgenstern J (2019) Guarantees for spectral clustering with fairness constraints. In: International conference on machine learning, pp 3458–3467
Krause A (2016) Clustering and \(k\)-means. https://las.inf.ethz.ch/courses/lis-s16/hw/hw4_sol.pdf. [Online; accessed 29-May-2022]
Kriegel HP, Schubert E, Zimek A (2017) The (black) art of runtime evaluation: are we comparing algorithms or implementations? Knowl Inf Syst 52(2):341–378. https://doi.org/10.1007/s10115-016-1004-2
Article Google Scholar
Le Quy T, Ntoutsi E (2021) Towards fair, explainable and actionable clustering for learning analytics. In: EDM
Le Quy T, Roy A, Iosifidis V, Zhang W, Ntoutsi E (2022) A survey on datasets for fairness-aware machine learning. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1452
Lee JK, Bu Y, Rajan D, Sattigeri P, Panda R, Das S, Wornell GW (2021) Fair selective classification via sufficiency. In: International conference on machine learning, pp 6076–6086
Li B, Li L, Sun A, Wang C, Wang Y (2021) Approximate group fairness for clustering. In: International conference on machine learning, pp 6381–6391
Li P, Zhao H, Liu H (2020) Deep fair clustering for visual learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Liu S, Vicente LN (2021) A stochastic alternating balance \( k \)-means algorithm for fair clustering. arXiv:2105.14172
Lohaus M, Perrot M, Luxburg UV (2020) Too relaxed to be fair. In: III HD, Singh A (eds) Proceedings of the 37th international conference on machine learning, vol. 119 of proceedings of machine learning research, pp 6360–6369
Mahabadi S, Vakilian A (2020) Individual fairness for k-clustering. In: International conference on machine learning, pp 6586–6596
Makarychev Y, Vakilian A (2021) Approximation algorithms for socially fair clustering. arXiv:2103.02512
Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A (2021) A survey on bias and fairness in machine learning. ACM Comput Surv 54(6). https://doi.org/10.1145/3457607
Micha E, Shah N (2020) Proportionally fair clustering revisited. In: 47th international colloquium on automata, languages, and pogramming (ICALP 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik
Negahbani M, Chakrabarty D (2021) Better algorithms for individually fair \( k \)-clustering. Adv Neural Inf Process Syst 34:13340–13351
Google Scholar
Ntoutsi E, Fafalios P, Gadiraju U, Iosifidis V, Nejdl W, Vidal ME, Ruggieri S, Turini F, Papadopoulos S, Krasanakis E et al (2020) Bias in data-driven artificial intelligence systems-an introductory survey. Wiley Interdiscip Rev Data Min Knowl Discov 10(3):e1356
Article Google Scholar
Padmanabhan D (2020) Whither fair clustering? In: AI for social good: Harvard CRCS Workshop
Quy TL, Roy A, Friege G, Ntoutsi E (2021) Fair-capacitated clustering. arXiv:2104.12116
Ranzato F, Urban C, Zanella M (2021) Fair training of decision tree classifiers. arXiv:2101.00909
Rösner C, Schmidt M (2018) Privacy preserving clustering with constraints. arXiv:1802.02497
Schmidt M, Schwiegelshohn C, Sohler C (2019) Fair coresets and streaming algorithms for fair k-means. In: International Workshop on Approximation and Online Algorithms, pp 232–251. Springer
Schmidt M, Wargalla J (2021) Coresets for constrained k-median and k-means clustering in low dimensional euclidean space. arXiv:2106.07319
Song H, Li P, Liu H (2021) Deep clustering based fair outlier detection. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &; Data Mining, KDD ’21, New York, NY, USA, pp 1481-1489. Association for Computing Machinery
Thejaswi S, Ordozgoiti B, Gionis A (2021) Diversity-aware k-median: Clustering with fair center representation. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 765–780. Springer
Vakilian A, Yalciner M (2022) Improved approximation algorithms for individually fair clustering. In: International Conference on Artificial Intelligence and Statistics, pp 8758–8779. PMLR
Wang B, Davidson I (2019) Towards fair deep clustering with multi-state protected variables. arXiv preprint arXiv:1901.10053
Zhang H, Davidson I (2021) Deep fair discriminative clustering. arXiv preprint arXiv:2105.14146
Zhang W, Bifet A, Zhang X, Weiss JC, Nejdl W (2021) Farf: A fair and adaptive random forests classifier, Advances in Knowledge Discovery and Data Mining, 245–256. Springer International Publishing
Ziko IM, Yuan J, Granger E, Ayed IB (2021) Variational fair clustering. In: Proceedings of the AAAI Conference on Artificial Intelligence, Volume 35, pp 11202–11209

Download references

Funding

The research is funded by Department of Science & Technology, India under grant number SRG/2020/001138 (Recipient name- Dr. Shweta Jain). The authors would like to thank the Prime Minister Research Fellowship, the Ministry of Education, Government of India for generously funding the primary author.

Author information

Authors and Affiliations

Indian Institute of Technology, Ropar, India
Shivam Gupta & Shweta Jain
Indian Institute of Technology Hyderabad, Kandi, India
Ganesh Ghalme
Indian Institute of Technology, Palakkad, India
Narayanan C. Krishnan

Authors

Shivam Gupta
View author publications
You can also search for this author in PubMed Google Scholar
Ganesh Ghalme
View author publications
You can also search for this author in PubMed Google Scholar
Narayanan C. Krishnan
View author publications
You can also search for this author in PubMed Google Scholar
Shweta Jain
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shivam Gupta.

Ethics declarations

Conflict of interest

No potential competing interest was reported by the authors.

Code availability

The code has been made publicly available at https://github.com/shivi98g/Fair-k-means-Clustering-via-Algorithmic-Fairness

Ethics approval

Not applicable

Consent for publication

The paper is the authors’ own original work, which has not been previously published elsewhere. The paper is not currently being considered for publication elsewhere. The paper reflects the authors’ own research and analysis in a truthful and complete manner. The paper properly credits the meaningful contributions of co-authors and co-researchers.

Additional information

Responsible editor: T. Calders, E. Ntoutsi, M. Pechenizkiy, B. Rosenhahn, S. Ruggieri.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Gupta, S., Ghalme, G., Krishnan, N.C. et al. Efficient algorithms for fair clustering with a new notion of fairness. Data Min Knowl Disc 37, 1959–1997 (2023). https://doi.org/10.1007/s10618-023-00928-6

Download citation

Received: 31 August 2021
Accepted: 06 February 2023
Published: 20 March 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s10618-023-00928-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Efficient algorithms for fair clustering with a new notion of fairness

Abstract

Access this article

Similar content being viewed by others

Making clusterings fairer by post-processing: algorithms, complexity results and experiments

Group and Individual Fairness in Clustering Algorithms

A Stochastic Alternating Balance k-Means Algorithm for Fair Clustering

Availability of data and material

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Code availability

Ethics approval

Consent for publication

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Efficient algorithms for fair clustering with a new notion of fairness

Abstract

Access this article

Similar content being viewed by others

Making clusterings fairer by post-processing: algorithms, complexity results and experiments

Group and Individual Fairness in Clustering Algorithms

A Stochastic Alternating Balance k-Means Algorithm for Fair Clustering

Availability of data and material

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Code availability

Ethics approval

Consent for publication

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation