Approximation Algorithms for K-Modes Clustering

  • Zengyou He
  • Shengchun Deng
  • Xiaofei Xu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4114)


In this paper, we study clustering with respect to the k-modes objective function, a natural formulation of clustering for categorical data. One of the main contributions of this paper is to establish the connection between k-modes and k-median: the optimum of k-median is at most twice the optimum of k-modes for the same categorical data clustering problem. Based on this observation, we derive a deterministic algorithm that achieves an approximation factor of 2. Furthermore, we prove that the distance measure in k-modes defines a metric. Hence, we are able to extend existing approximation algorithms for metric k-median to k-modes. Empirical results verify the superiority of our method.
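The relationship stated in the abstract can be checked on a toy example. The sketch below is not the paper's algorithm; it is a brute-force illustration under stated assumptions: the k-modes distance is taken to be the simple matching (Hamming) distance, k-median centers are restricted to the data objects themselves, and k-modes centers may be any combination of observed attribute values. All function names and the toy data are hypothetical.

```python
from itertools import combinations, product


def matching_distance(x, y):
    """Simple matching distance: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def clustering_cost(data, centers):
    """Sum over all objects of the distance to the nearest center."""
    return sum(min(matching_distance(x, c) for c in centers) for x in data)


def optimal_k_median(data, k):
    """Brute force: centers must be chosen among the data objects."""
    return min(clustering_cost(data, cs) for cs in combinations(set(data), k))


def optimal_k_modes(data, k):
    """Brute force: centers may be any vector of observed attribute values."""
    n_attrs = len(data[0])
    domains = [sorted({x[j] for x in data}) for j in range(n_attrs)]
    candidates = list(product(*domains))
    return min(clustering_cost(data, cs) for cs in combinations(candidates, k))


def satisfies_triangle_inequality(points):
    """Check d(x, z) <= d(x, y) + d(y, z) for every triple of points."""
    return all(
        matching_distance(x, z) <= matching_distance(x, y) + matching_distance(y, z)
        for x in points for y in points for z in points
    )


# Hypothetical toy data set: six objects with three categorical attributes.
data = [
    ("a", "x", "p"), ("a", "y", "p"), ("b", "x", "q"),
    ("b", "y", "q"), ("a", "x", "q"), ("b", "y", "p"),
]
k = 2

modes_opt = optimal_k_modes(data, k)
median_opt = optimal_k_median(data, k)
print("k-modes optimum: ", modes_opt)
print("k-median optimum:", median_opt)
print("within factor 2: ", median_opt <= 2 * modes_opt)
```

Exhaustive search is only feasible for tiny inputs; it is used here solely to make the inequality between the two optima observable, not as a practical clustering method.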





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Zengyou He (1)
  • Shengchun Deng (1)
  • Xiaofei Xu (1)
  1. Department of Computer Science and Engineering, Harbin Institute of Technology, China
