
Information-theoretic feature selection with discrete \(k\)-median clustering

Annals of Operations Research

Abstract

We propose a novel computational framework that integrates information-theoretic feature selection with discrete \(k\)-median clustering (DKM). DKM is a domain-independent clustering algorithm whose only input is a pairwise distance matrix between samples, and this matrix can be defined arbitrarily. In the proposed DKM clustering, the center of each cluster is represented by a set of samples, which in turn induces a separate set of clusters for each feature dimension. We evaluate the relevance of each feature by the normalized mutual information (NMI) score between the base clusters, obtained using all features, and the clusters induced for that feature dimension. We propose a spectral cluster analysis (SCA) method that determines the number of clusters using the average of the relevance NMI scores. We introduce filter- and wrapper-based feature selection algorithms that produce a ranked list of features using the relevance NMI scores. We construct an information gain curve and compute the normalized area under this curve to quantify information gain and identify the contributing features. We study the properties of our information-theoretic framework for clustering, SCA, and feature selection on simulated data. We demonstrate that SCA can accurately identify the number of clusters in simulated data and public benchmark datasets. We also compare the clustering and feature selection performance of our framework with that of other domain-dependent and domain-independent algorithms on public benchmark datasets and a real-life neural time series dataset. We show that DKM runs comparably fast while achieving better performance.
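
To make the pipeline above concrete, the following minimal Python sketch illustrates two of its building blocks: discrete \(k\)-median clustering driven only by a pairwise distance matrix, and NMI-based relevance scoring of individual features against the base clustering. This is an illustration under stated assumptions, not the authors' implementation: the paper solves DKM via mathematical programming formulations, whereas the sketch uses a simple alternating heuristic with a single median per cluster, and the distance definitions and function names (dkm_cluster, feature_relevance_nmi) are hypothetical.

```python
# Illustrative sketch only -- NOT the paper's method. The paper solves DKM
# with mathematical programming; a simple alternating heuristic stands in
# here, and the distance definitions are placeholder choices.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score


def dkm_cluster(D, k, n_iter=100, seed=0):
    """Heuristic discrete k-median: each center is an actual sample (a
    median), chosen to reduce total within-cluster distance. The only
    input describing the data is the (n, n) pairwise distance matrix D,
    which can be defined arbitrarily."""
    rng = np.random.default_rng(seed)
    medians = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medians], axis=1)   # assign to nearest median
        new_medians = medians.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # the member with the smallest total distance to its cluster
                local = D[np.ix_(members, members)].sum(axis=1).argmin()
                new_medians[j] = members[local]
        if np.array_equal(new_medians, medians):    # converged
            break
        medians = new_medians
    return np.argmin(D[:, medians], axis=1)


def feature_relevance_nmi(X, k):
    """Score each feature by the NMI between the base clusters (using all
    features) and the clusters obtained when only that feature defines
    the distances."""
    D_all = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # Euclidean
    base = dkm_cluster(D_all, k)
    scores = []
    for f in range(X.shape[1]):
        D_f = np.abs(X[:, [f]] - X[:, [f]].T)       # 1-D distances for feature f
        scores.append(normalized_mutual_info_score(base, dkm_cluster(D_f, k)))
    return np.array(scores)                          # rank features by this score


if __name__ == "__main__":
    # Toy data: features 0 and 1 separate two groups, features 2 and 3 are noise.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(4, 1, (30, 4))])
    X[:, 2:] = rng.normal(0, 1, (60, 2))
    print(feature_relevance_nmi(X, k=2))             # first two scores dominate
```

In the full framework, these relevance scores drive the filter- and wrapper-based feature rankings, the SCA estimate of the number of clusters, and the information gain curve whose normalized area under the curve (NAUC) identifies the contributing features; those steps are omitted from the sketch.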




Author information


Corresponding author

Correspondence to Wanpracha A. Chaovalitwongse.

Appendix

A list of the abbreviations used in this study is given below in alphabetical order.

DKM: Discrete \(k\)-median.
DKM-LP: Discrete \(k\)-median, linear programming.
DKM-R: Discrete \(k\)-median, restrictive.
EEG: Electroencephalogram.
EM: Expectation maximization.
FSCMM: Feature selection and clustering with mixture models.
FSKM: Feature selecting \(k\)-median.
HCT: Hierarchical cluster trees.
LFP: Local field potential.
NAUC: Normalized area under the curve.
NMI: Normalized mutual information.
SCA: Spectral cluster analysis.
UPBA: Uncoupled bilinear program algorithm.


About this article

Cite this article

Şeref, O., Fan, YJ., Borenstein, E. et al. Information-theoretic feature selection with discrete \(k\)-median clustering. Ann Oper Res 263, 93–118 (2018). https://doi.org/10.1007/s10479-014-1589-3


