Machine Learning, Volume 52, Issue 3, pp 217–237

Feature Weighting in k-Means Clustering

  • Dharmendra S. Modha
  • W. Scott Spangler


Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.
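Ideas (i) through (iv) can be illustrated with a minimal sketch. It assumes squared Euclidean distortion in every feature space, so that the centroid update reduces to a component-wise mean; other distortion measures would need their own centroid rule. The function names, the toy data, and the fixed weighting (0.5, 0.5) are illustrative assumptions and are not taken from the paper. Step (v), the search for the optimal weighting, is not shown.

```python
import random


def sq_euclidean(u, v):
    """Squared Euclidean distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def weighted_distortion(x, c, weights, distortions):
    """Convex combination of the per-feature-space distortions.

    x and c are tuples of feature vectors, one vector per feature space.
    """
    return sum(w * d(f, g) for w, d, f, g in zip(weights, distortions, x, c))


def mean_vector(vectors):
    """Component-wise mean; the minimizer for squared Euclidean distortion."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def convex_kmeans(data, k, weights, distortions, iters=50, seed=0):
    """Lloyd-style clustering under a fixed convex feature weighting.

    Assumption: every distortion is squared Euclidean, so each centroid
    is the per-feature-space mean of its cluster members.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    rng = random.Random(seed)
    centroids = rng.sample(data, k)  # initialize from k distinct objects
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid under the weighted distortion.
        new_labels = [
            min(range(k),
                key=lambda j: weighted_distortion(x, centroids[j],
                                                  weights, distortions))
            for x in data
        ]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        # Update step: recompute each centroid, one feature space at a time.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(
                    mean_vector([x[l] for x in members])
                    for l in range(len(weights))
                )
    return labels, centroids


# Toy data: each object is a tuple of two feature vectors (2-D and 1-D).
data = [
    ((0.0, 0.0), (1.0,)), ((0.1, 0.0), (1.1,)), ((0.0, 0.1), (0.9,)),
    ((5.0, 5.0), (9.0,)), ((5.1, 5.0), (9.1,)), ((5.0, 5.1), (8.9,)),
]
labels, centroids = convex_kmeans(
    data, k=2, weights=(0.5, 0.5), distortions=(sq_euclidean, sq_euclidean)
)
print(labels)
```

Changing `weights` shifts how much each feature space influences the assignment step, which is the quantity the paper proposes to optimize.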

clustering, convexity, convex k-means algorithm, feature combination, feature selection, Fisher's discriminant analysis, text mining, unsupervised learning



Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Dharmendra S. Modha (1)
  • W. Scott Spangler (1)
  1. IBM Almaden Research Center, San Jose, USA
