Annals of Operations Research, Volume 263, Issue 1–2, pp. 93–118

Information-theoretic feature selection with discrete \(k\)-median clustering

  • Onur Şeref
  • Ya-Ju Fan
  • Elan Borenstein
  • Wanpracha A. Chaovalitwongse
Data Mining and Analytics


Abstract

We propose a novel computational framework that integrates information-theoretic feature selection with discrete \(k\)-median (DKM) clustering. DKM is a domain-independent clustering algorithm whose only input is a pairwise distance matrix between samples, and this matrix can be defined arbitrarily. In the proposed DKM clustering, the center of each cluster is represented by a set of samples, which induces a separate set of clusters for each feature dimension. We evaluate the relevance of each feature by the normalized mutual information (NMI) score between the base clusters, obtained using all features, and the clusters induced for that feature dimension. We propose a spectral cluster analysis (SCA) method that determines the number of clusters using the average of the relevance NMI scores. We introduce filter- and wrapper-based feature selection algorithms that produce a ranked list of features from the relevance NMI scores. We create an information gain curve and calculate the normalized area under this curve to quantify information gain and identify the contributing features. We study the properties of our information-theoretic framework for clustering, SCA, and feature selection on simulated data. We demonstrate that SCA can accurately identify the number of clusters in simulated data and public benchmark datasets. We also compare the clustering and feature selection performance of our framework to other domain-dependent and domain-independent algorithms on public benchmark datasets and a real-life neural time series dataset. We show that DKM runs comparably fast while achieving better performance.
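The relevance scoring described in the abstract — comparing the base clustering against the clusters that each single feature induces through the cluster representatives — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the single-medoid representatives, the arithmetic-mean NMI normalization, and all function names are choices made here for the sketch.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same samples."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((nxy / n) * math.log(n * nxy / (ca[x] * cb[y]))
               for (x, y), nxy in cab.items())

def nmi(a, b):
    """NMI with arithmetic-mean normalization (one common convention)."""
    h = (entropy(a) + entropy(b)) / 2
    return mutual_information(a, b) / h if h > 0 else 0.0

def induced_clusters(X, medoids, j):
    """Assign each sample to the medoid nearest in feature dimension j alone."""
    return [min(range(len(medoids)), key=lambda m: abs(X[i][j] - X[medoids[m]][j]))
            for i in range(len(X))]

# Toy data: feature 0 separates the two clusters, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 9.0], [0.3, 1.0], [9.8, 8.0], [9.9, 2.0], [10.1, 6.0]]
base = [0, 0, 0, 1, 1, 1]   # base clustering obtained from all features
medoids = [1, 4]            # one representative sample index per cluster

# Per-feature relevance: NMI between base clusters and feature-induced clusters.
scores = [nmi(base, induced_clusters(X, medoids, j)) for j in range(2)]
print(scores)  # the informative feature 0 scores far higher than feature 1
```

Ranking features by these scores gives the ordered list that the filter-based selection in the paper builds on; the wrapper variant would re-cluster after each selection step instead.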


Keywords: Discrete clustering · Information theory · Cluster analysis · Feature selection



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Onur Şeref (1)
  • Ya-Ju Fan (2)
  • Elan Borenstein (3)
  • Wanpracha A. Chaovalitwongse (4)

  1. Department of Business Information Technology, Virginia Polytechnic Institute and State University, Blacksburg, USA
  2. Lawrence Livermore National Laboratory, Livermore, USA
  3. Department of Mechanical and Aerospace Engineering, Rutgers University, Piscataway, USA
  4. Departments of Industrial and Systems Engineering and Radiology, University of Washington, Seattle, USA
