Skip to main content
Log in

Efficient algorithms for fair clustering with a new notion of fairness

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

We revisit the problem of fair clustering, first introduced by Chierichetti et al. (Fair clustering through fairlets, 2017), which requires each protected attribute to have approximately equal representation in every cluster, i.e., a Balance property. Existing solutions to fair clustering are either not scalable or do not achieve an optimal trade-off between clustering objectives and fairness. In this paper, we propose a new notion of fairness which we call \(\varvec{\tau }\)-ratio fairness, that strictly generalizes the Balance property and enables a fine-grained efficiency vs. fairness trade-off. Furthermore, we show that a simple greedy round-robin-based algorithm achieves this trade-off efficiently. Under a more general setting of multi-valued protected attributes, we rigorously analyze the theoretical properties of the proposed algorithm, the Fair Round-Robin Algorithm for Clustering Over-End (\({\textsc {FRAC}}_{OE}\)). We also propose a heuristic algorithm, Fair Round-Robin Algorithm for Clustering (FRAC), that applies round-robin allocation at each iteration of a vanilla clustering algorithm. Our experimental results suggest that both FRAC and \({\textsc {FRAC}}_{OE}\) outperform all the state-of-the-art algorithms and work exceptionally well even for a large number of clusters.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Availability of data and material

All datasets used in the experiments are publicly available on UCI repository.

Notes

  1. Throughout the paper, for simplicity, we call a k-clustering algorithm as a clustering algorithm.

  2. Otherwise, the problem is uninteresting as the balanced clustering may not be feasible.

  3. It is unclear why a certain assignment to a specific cluster helps maintain a fairness guarantee?

  4. For the sake of simplicity, we assume \(\tau _{\ell } n_{\ell } \in {\mathbb {N}}\) and ignore the floor notation.

  5. Note that an optimal fair allocation need not be unique. Our result holds for any optimal fair allocation.

  6. This cycle would have resulted in further reduction of cost with respect to the points in the cycle.

  7. https://archive.ics.uci.edu/ml/datasets/Adult

  8. https://archive.ics.uci.edu/ml/datasets/Bank+Marketing

  9. https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008

  10. https://archive.ics.uci.edu/ml/datasets/US+Census+Data+%281990%29

  11. https://github.com/shivi98g/Fair-k-means-Clustering-via-Algorithmic-Fairness

References

  • Abbasi M, Bhaskara A, Venkatasubramanian S (2021) Fair clustering via equitable group representations. In: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp 504–514

  • Abraham SS, Padmanabhan D, Sundaram SS (2020) Fairness in clustering with multiple sensitive attributes. In: EDBT/ICDT 2020 joint conference, pp 287–298

  • Ahmadian S, Epasto A, Kumar R, Mahdian M (2019) Clustering without over-representation. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, pp 267–275

  • Ahmadian S, Epasto A, Kumar R, Mahdian M (2020) Fair correlation clustering. In: Chiappa S, Calandra R (eds) Proceedings of the twenty third international conference on artificial intelligence and statistics, volume 108 of proceedings of machine learning research, pp 4195–4205

  • Anderson N, Bera SK, Das S, Liu Y (2020) Distributional individual fairness in clustering. arXiv:2006.12589

  • Anegg G, Angelidakis H, Kurpisz A, Zenklusen R (2020) A technique for obtaining true approximations for k-center with covering constraints. In: International conference on integer programming and combinatorial optimization, pp 52–65. Springer

  • Backurs A, Indyk P, Onak K, Schieber B, Vakilian A, Wagner T (2019) Scalable fair clustering. In: International conference on machine learning, pp 405–413

  • Bandyapadhyay S, Fomin FV, Simonov K (2020) On coresets for fair clustering in metric and euclidean spaces and their applications. arXiv:2007.10137

  • Bandyapadhyay S, Inamdar T, Pai S, Varadarajan K (2019) A constant approximation for colorful k-center. arXiv:1907.08906

  • Banerjee A, Ghosh J (2006) Scalable clustering algorithms with balancing constraints. Data Min Knowl Discov 13(3):365–395

    Article  MathSciNet  Google Scholar 

  • Barocas S, Selbst AD (2016) Big data’s disparate impact. California Law Rev, 671–732

  • Baumann E, Rumberger JL (2018) State of the art in fair ML: from moral philosophy and legislation to fair classifiers. CoRR arXiv:1811.09539

  • Bera S, Chakrabarty D, Flores N, Negahbani M (2019) Fair algorithms for clustering. Adv Neural Inf Process Syst 32:4954–4965

    Google Scholar 

  • Bercea IO, Groß M, Khuller S, Kumar A, Rösner C, Schmidt DR, Schmidt M (2018) On the cost of essentially fair clusterings. arXiv:1811.10319

  • Böhm M, Fazzone A, Leonardi S, Schwiegelshohn C (2020) Fair clustering with multiple colors. arXiv:2002.07892

  • Bose A, Hamilton W (2019) Compositional fairness constraints for graph embeddings. In: International conference on machine learning, pp 715–724

  • Bottou L, Bengio Y (1994) Convergence properties of the k-means algorithms. Advances in neural information processing systems 7

  • Brubach B, Chakrabarti D, Dickerson J, Khuller S, Srinivasan A, Tsepenekas L (2020) A pairwise fair and community-preserving approach to k-center clustering. In: International conference on machine learning, pp 1178–1189

  • Brubach B, Chakrabarti D, Dickerson JP, Srinivasan A, Tsepenekas L (2021) Fairness, semi-supervised learning, and more: A general framework for clustering with stochastic pairwise constraints. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 6822–6830

  • Byrka J, Pensyl T, Rybicki B, Srinivasan A, Trinh K (2014) An improved approximation for k-median, and positive correlation in budgeted optimization. In: Proceedings of the twenty-sixth annual ACM-SIAM symposium on discrete algorithms, pp 737–756. SIAM

  • Carey AN, Wu X (2022) The fairness field guide: perspectives from social and formal sciences. arXiv:2201.05216

  • Chakrabarti D, Dickerson JP, Esmaeili SA, Srinivasan A, Tsepenekas L (2022) A new notion of individually fair clustering: \(\alpha \)-equitable \(k\)-center. In: International conference on artificial intelligence and statistics, pp 6387–6408

  • Chan TH, Guerqin A, Sozio M (2018) Fully dynamic k-center clustering. In: Proceedings of the 2018 World Wide Web conference, pp 579–587

  • Chen X, Fain B, Lyu L, Munagala K (2019) Proportionally fair clustering. In: International conference on machine learning, pp 1032–1041

  • Chhabra A, Masalkovaitė K, Mohapatra P (2021) An overview of fairness in clustering. IEEE Access 9:130698–130720. https://doi.org/10.1109/ACCESS.2021.3114099

    Article  Google Scholar 

  • Chhabra A, Singla A, Mohapatra P (2021) Fair clustering using antidote data. arXiv:2106.00600

  • Chierichetti F, Kumar R, Lattanzi S, Vassilvitskii S (2017) Fair clustering through fairlets. In: Proceedings of the 31st international conference on neural information processing systems, pp 5036–5044

  • Chikahara Y, Sakaue S, Fujino A, Kashima H (2021) Learning individually fair classifier with path-specific causal-effect constraint. In: International conference on artificial intelligence and statistics, pp 145–153

  • Chlamtáč E, Makarychev Y, Vakilian A (2022) Approximating fair clustering with cascaded norm objectives. In: Proceedings of the 2022 annual ACM-SIAM symposium on discrete algorithms (SODA), pp 2664–2683. SIAM

  • Cho J, Hwang G, Suh C (2020) A fair classifier using mutual information. In: 2020 IEEE international symposium on information theory (ISIT), pp 2521–2526

  • Correa J, Cristi A, Duetting P, Norouzi-Fard A (2021) Fairness and bias in online selection. In: International conference on machine learning, pp 2112–2121

  • Dastin J (2018) Amazon scraps secret ai recruiting tool that showed bias against women. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G. [Online; accessed 15-August-2021]

  • Davidson I, Ravi S (2020) Making existing clusterings fairer: algorithms, complexity results and insights. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 3733–3740

  • Deepak, Abraham SS (2020). Representativity fairness in clustering. In: 12th ACM conference on web science

  • Deepak JM, Jose SV (2020) Fairness in unsupervised learning. In: Proceedings of the 29th ACM international conference on information & knowledge management, CIKM ’20, New York, NY, USA, pp 3511–3512. Association for Computing Machinery

  • Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2012) Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference, pp 214–226

  • Elzayn H, Jabbari S, Jung C, Kearns M, Neel S, Roth A, Schutzman Z (2019) Fair algorithms for learning in allocation problems. In: Proceedings of the conference on fairness, accountability, and transparency, pp 170–179

  • Esmaeili S, Brubach B, Srinivasan A, Dickerson J (2021) Fair clustering under a bounded cost. Adv Neural Inf Process Syst 34:14345–14357

    Google Scholar 

  • Esmaeili S, Brubach B, Tsepenekas L, Dickerson J (2020) Probabilistic fair clustering. Adv Neural Inf Process Syst 33:12743–12755

    Google Scholar 

  • Feng Z, Kacham P, Woodruff D (2021) Dimensionality reduction for the sum-of-distances metric. In: International conference on machine learning, pp 3220–3229

  • Ghadiri M, Samadi S, Vempala S (2021) Socially fair k-means clustering. In: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp 438–448

  • Ghadiri M, Singh M, Vempala SS (2022) Constant-factor approximation algorithms for socially fair \( k \)-clustering. arXiv preprint arXiv:2206.11210

  • Gong S, Liu X, Jain AK (2021) Mitigating face recognition bias via group adaptive classifier. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3414–3424

  • Goyal D, Jaiswal R (2021) Tight fpt approximation for socially fair clustering. arXiv:2106.06755

  • Han L, Xu D, Xu Y, Yang P (2022) Approximation algorithms for the individually fair k-center with outliers. J Global Optim, 1–16

  • Harb E, Lam HS (2020) KFC: a scalable approximation algorithm for \( k \)- center fair clustering. Adv Neural Inf Process Syst 33:14509–14519

    Google Scholar 

  • Harris DG, Pensyl T, Srinivasan A, Trinh K (2019) A lottery model for center-type problems with outliers. ACM Trans Algorithms 15(3):1–25

    MathSciNet  MATH  Google Scholar 

  • Huang L, Jiang S, Vishnoi N (2019) Coresets for clustering with fairness constraints. Adv Neural Inf Process Syst 32:7589–7600

    Google Scholar 

  • Jia X, Sheth K, Svensson O (2020) Fair colorful k-center clustering. In: International conference on integer programming and combinatorial optimization, pp 209–222. Springer

  • Jones M, Nguyen H, Nguyen T (2020) Fair k-centers via maximum matching. In: International conference on machine learning, pp 4940–4949

  • Julia A, Larson J, Mattu S, Kirchner L (2016) Propublica–machine bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. [Online; accessed 13-August-2021]

  • Jung C, Kannan S, Lutz N (2020) Service in your neighborhood: fairness in center location. Foundations of Responsible Computing (FORC)

  • Kalyanakrishnan S (2016) \(k\)-means clustering. https://www.cse.iitb.ac.in/~shivaram/teaching/old/cs344+386-s2017/resources/classnote-2.pdf. [Online; accessed 29-May-2022]

  • Kar D, Medya S, Mandal D, Silva A, Dey P, Sanyal S (2021) Feature-based individual fairness in k-clustering. arXiv:2109.04554

  • Kleindessner M, Awasthi P, Morgenstern J (2019) Fair k-center clustering for data summarization. In: International conference on machine learning, pp 3448–3457

  • Kleindessne M, Awasthi P, Morgenstern J (2020) A notion of individual fairness for clustering. arXiv:2006.04960

  • Kleindessner M, Samadi S, Awasthi P, Morgenstern J (2019) Guarantees for spectral clustering with fairness constraints. In: International conference on machine learning, pp 3458–3467

  • Krause A (2016) Clustering and \(k\)-means. https://las.inf.ethz.ch/courses/lis-s16/hw/hw4_sol.pdf. [Online; accessed 29-May-2022]

  • Kriegel HP, Schubert E, Zimek A (2017) The (black) art of runtime evaluation: are we comparing algorithms or implementations? Knowl Inf Syst 52(2):341–378. https://doi.org/10.1007/s10115-016-1004-2

    Article  Google Scholar 

  • Le Quy T, Ntoutsi E (2021) Towards fair, explainable and actionable clustering for learning analytics. In: EDM

  • Le Quy T, Roy A, Iosifidis V, Zhang W, Ntoutsi E (2022) A survey on datasets for fairness-aware machine learning. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1452

  • Lee JK, Bu Y, Rajan D, Sattigeri P, Panda R, Das S, Wornell GW (2021) Fair selective classification via sufficiency. In: International conference on machine learning, pp 6076–6086

  • Li B, Li L, Sun A, Wang C, Wang Y (2021) Approximate group fairness for clustering. In: International conference on machine learning, pp 6381–6391

  • Li P, Zhao H, Liu H (2020) Deep fair clustering for visual learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)

  • Liu S, Vicente LN (2021) A stochastic alternating balance \( k \)-means algorithm for fair clustering. arXiv:2105.14172

  • Lohaus M, Perrot M, Luxburg UV (2020) Too relaxed to be fair. In: III HD, Singh A (eds) Proceedings of the 37th international conference on machine learning, vol. 119 of proceedings of machine learning research, pp 6360–6369

  • Mahabadi S, Vakilian A (2020) Individual fairness for k-clustering. In: International conference on machine learning, pp 6586–6596

  • Makarychev Y, Vakilian A (2021) Approximation algorithms for socially fair clustering. arXiv:2103.02512

  • Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A (2021) A survey on bias and fairness in machine learning. ACM Comput Surv 54(6). https://doi.org/10.1145/3457607

  • Micha E, Shah N (2020) Proportionally fair clustering revisited. In: 47th international colloquium on automata, languages, and pogramming (ICALP 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik

  • Negahbani M, Chakrabarty D (2021) Better algorithms for individually fair \( k \)-clustering. Adv Neural Inf Process Syst 34:13340–13351

    Google Scholar 

  • Ntoutsi E, Fafalios P, Gadiraju U, Iosifidis V, Nejdl W, Vidal ME, Ruggieri S, Turini F, Papadopoulos S, Krasanakis E et al (2020) Bias in data-driven artificial intelligence systems-an introductory survey. Wiley Interdiscip Rev Data Min Knowl Discov 10(3):e1356

    Article  Google Scholar 

  • Padmanabhan D (2020) Whither fair clustering? In: AI for social good: Harvard CRCS Workshop

  • Quy TL, Roy A, Friege G, Ntoutsi E (2021) Fair-capacitated clustering. arXiv:2104.12116

  • Ranzato F, Urban C, Zanella M (2021) Fair training of decision tree classifiers. arXiv:2101.00909

  • Rösner C, Schmidt M (2018) Privacy preserving clustering with constraints. arXiv:1802.02497

  • Schmidt M, Schwiegelshohn C, Sohler C (2019) Fair coresets and streaming algorithms for fair k-means. In: International Workshop on Approximation and Online Algorithms, pp 232–251. Springer

  • Schmidt M, Wargalla J (2021) Coresets for constrained k-median and k-means clustering in low dimensional euclidean space. arXiv:2106.07319

  • Song H, Li P, Liu H (2021) Deep clustering based fair outlier detection. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &; Data Mining, KDD ’21, New York, NY, USA, pp 1481-1489. Association for Computing Machinery

  • Thejaswi S, Ordozgoiti B, Gionis A (2021) Diversity-aware k-median: Clustering with fair center representation. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 765–780. Springer

  • Vakilian A, Yalciner M (2022) Improved approximation algorithms for individually fair clustering. In: International Conference on Artificial Intelligence and Statistics, pp 8758–8779. PMLR

  • Wang B, Davidson I (2019) Towards fair deep clustering with multi-state protected variables. arXiv preprint arXiv:1901.10053

  • Zhang H, Davidson I (2021) Deep fair discriminative clustering. arXiv preprint arXiv:2105.14146

  • Zhang W, Bifet A, Zhang X, Weiss JC, Nejdl W (2021) Farf: A fair and adaptive random forests classifier, Advances in Knowledge Discovery and Data Mining, 245–256. Springer International Publishing

  • Ziko IM, Yuan J, Granger E, Ayed IB (2021) Variational fair clustering. In: Proceedings of the AAAI Conference on Artificial Intelligence, Volume 35, pp 11202–11209

Download references

Funding

The research is funded by Department of Science & Technology, India under grant number SRG/2020/001138 (Recipient name- Dr. Shweta Jain). The authors would like to thank the Prime Minister Research Fellowship, the Ministry of Education, Government of India for generously funding the primary author.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shivam Gupta.

Ethics declarations

Conflict of interest

No potential competing interest was reported by the authors.

Code availability

The code has been made publicly available at https://github.com/shivi98g/Fair-k-means-Clustering-via-Algorithmic-Fairness

Ethics approval

Not applicable

Consent for publication

The paper is the authors’ own original work, which has not been previously published elsewhere. The paper is not currently being considered for publication elsewhere. The paper reflects the authors’ own research and analysis in a truthful and complete manner. The paper properly credits the meaningful contributions of co-authors and co-researchers.

Additional information

Responsible editor: T. Calders, E. Ntoutsi, M. Pechenizkiy, B. Rosenhahn, S. Ruggieri.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gupta, S., Ghalme, G., Krishnan, N.C. et al. Efficient algorithms for fair clustering with a new notion of fairness. Data Min Knowl Disc 37, 1959–1997 (2023). https://doi.org/10.1007/s10618-023-00928-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-023-00928-6

Keywords

Navigation