Feature Selection Based on Data Clustering

  • Conference paper

Intelligent Computing Theories and Methodologies (ICIC 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9225)

Abstract

Feature selection is an important step in data mining and machine learning. It reduces the cost of data measurement and storage, and mitigates the curse of dimensionality to improve prediction performance. In this paper, we propose a feature selection method based on mutual information estimation. It avoids the computation of high-dimensional mutual information by transforming the high-dimensional feature space into one dimension through a novel supervised clustering method. Experimental results on ten benchmark data sets show that: (1) with far fewer features selected by the proposed method, kNN, the naive Bayes classifier, and C4.5 perform similarly to, or even better than, they do on the original data sets with the whole feature set; (2) unlike most state-of-the-art methods, which require the number of features to select to be specified a priori, the proposed method can automatically determine the proper size of the selected feature subset.
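The univariate building block of such mutual-information-based selection can be sketched as follows. This is an illustrative sketch only: the empirical MI estimator is standard, but the `greedy_select` helper and its `threshold` parameter are assumptions for demonstration, not the paper's algorithm, which collapses the already-selected subset into a single dimension via supervised clustering before each joint-MI estimate.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits for two discrete sequences of equal length,
    using empirical (plug-in) frequency estimates."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint / (p_x * p_y) == c * n / (count_x * count_y)
        mi += p_joint * log2(c * n / (px[x] * py[y]))
    return mi


def greedy_select(features, labels, threshold=0.01):
    """Illustrative helper (not the paper's method): rank features by MI
    with the class label and keep those scoring above `threshold`."""
    scored = [(mutual_information(col, labels), i)
              for i, col in enumerate(features)]
    return [i for score, i in sorted(scored, reverse=True)
            if score > threshold]
```

For example, a binary feature identical to the labels has MI of 1 bit, while an independent feature scores 0 and is filtered out.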


Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (61232005) and the National Key Technology R&D Program of China (2012BAH06B01). Liu is partially sponsored by the CCF-Tencent Open Research Fund.

Author information

Correspondence to Hongzhi Liu.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, H., Wu, Z., Zhang, X. (2015). Feature Selection Based on Data Clustering. In: Huang, D.-S., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9225. Springer, Cham. https://doi.org/10.1007/978-3-319-22180-9_23

  • DOI: https://doi.org/10.1007/978-3-319-22180-9_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22179-3

  • Online ISBN: 978-3-319-22180-9

  • eBook Packages: Computer Science, Computer Science (R0)
