
Information-theoretic feature selection with discrete \(k\)-median clustering

Annals of Operations Research

Abstract

We propose a novel computational framework that integrates information-theoretic feature selection with discrete \(k\)-median clustering (DKM). DKM is a domain-independent clustering algorithm whose only input is a pairwise distance matrix between samples, and this matrix can be defined arbitrarily. In the proposed DKM clustering, the center of each cluster is represented by a set of samples, which in turn induces a separate set of clusters for each feature dimension. We evaluate the relevance of each feature by the normalized mutual information (NMI) score between the base clusters, obtained using all features, and the clusters induced for that feature dimension. We propose a spectral cluster analysis (SCA) method that determines the number of clusters using the average of the relevance NMI scores. We introduce filter- and wrapper-based feature selection algorithms that produce a ranked list of features using the relevance NMI scores. We construct an information gain curve and compute the normalized area under this curve to quantify information gain and identify the contributing features. We study the properties of our information-theoretic framework for clustering, SCA, and feature selection on simulated data. We demonstrate that SCA can accurately identify the number of clusters in simulated data and public benchmark datasets. We also compare the clustering and feature selection performance of our framework with that of other domain-dependent and domain-independent algorithms on public benchmark datasets and a real-life neural time series dataset. We show that DKM runs comparably fast while achieving better performance.
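
To make the pipeline above concrete, the following minimal Python sketch illustrates two of its building blocks: discrete \(k\)-median clustering driven only by a pairwise distance matrix, and NMI-based relevance scoring of individual features against the base clustering. This is an illustration under stated assumptions, not the authors' implementation: the paper solves DKM via mathematical programming formulations, whereas the sketch uses a simple alternating heuristic with a single median per cluster, and the distance definitions and function names (dkm_cluster, feature_relevance_nmi) are hypothetical.

```python
# Illustrative sketch only -- NOT the paper's method. The paper solves DKM
# with mathematical programming; a simple alternating heuristic stands in
# here, and the distance definitions are placeholder choices.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score


def dkm_cluster(D, k, n_iter=100, seed=0):
    """Heuristic discrete k-median: each center is an actual sample (a
    median), chosen to reduce total within-cluster distance. The only
    input describing the data is the (n, n) pairwise distance matrix D,
    which can be defined arbitrarily."""
    rng = np.random.default_rng(seed)
    medians = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medians], axis=1)   # assign to nearest median
        new_medians = medians.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # the member with the smallest total distance to its cluster
                local = D[np.ix_(members, members)].sum(axis=1).argmin()
                new_medians[j] = members[local]
        if np.array_equal(new_medians, medians):    # converged
            break
        medians = new_medians
    return np.argmin(D[:, medians], axis=1)


def feature_relevance_nmi(X, k):
    """Score each feature by the NMI between the base clusters (using all
    features) and the clusters obtained when only that feature defines
    the distances."""
    D_all = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # Euclidean
    base = dkm_cluster(D_all, k)
    scores = []
    for f in range(X.shape[1]):
        D_f = np.abs(X[:, [f]] - X[:, [f]].T)       # 1-D distances for feature f
        scores.append(normalized_mutual_info_score(base, dkm_cluster(D_f, k)))
    return np.array(scores)                          # rank features by this score


if __name__ == "__main__":
    # Toy data: features 0 and 1 separate two groups, features 2 and 3 are noise.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(4, 1, (30, 4))])
    X[:, 2:] = rng.normal(0, 1, (60, 2))
    print(feature_relevance_nmi(X, k=2))             # first two scores dominate
```

In the full framework, these relevance scores drive the filter- and wrapper-based feature rankings, the SCA estimate of the number of clusters, and the information gain curve whose normalized area under the curve (NAUC) identifies the contributing features; those steps are omitted from the sketch.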




Author information


Corresponding author

Correspondence to Wanpracha A. Chaovalitwongse.

Appendix

A list of the abbreviations used in this study is given below in alphabetical order.

DKM: Discrete \(k\)-median.
DKM-LP: Discrete \(k\)-median, linear programming.
DKM-R: Discrete \(k\)-median, restrictive.
EEG: Electroencephalogram.
EM: Expectation maximization.
FSCMM: Feature selection and clustering with mixture models.
FSKM: Feature selecting \(k\)-median.
HCT: Hierarchical cluster trees.
LFP: Local field potential.
NAUC: Normalized area under the curve.
NMI: Normalized mutual information.
SCA: Spectral cluster analysis.
UPBA: Uncoupled bilinear program algorithm.


About this article

Cite this article

Şeref, O., Fan, YJ., Borenstein, E. et al. Information-theoretic feature selection with discrete \(k\)-median clustering. Ann Oper Res 263, 93–118 (2018). https://doi.org/10.1007/s10479-014-1589-3


