Mrmr+ and Cfs+ feature selection algorithms for high-dimensional data

  • Adrian Pino Angulo
  • Kilho Shin

Abstract

Feature selection is a central issue in machine learning and applied mathematics. Filter feature selection algorithms aim to solve the optimization problem of selecting a set of features that maximizes the feature-class correlation and minimizes the feature-feature correlation. Mrmr (Minimum Redundancy Maximum Relevance) and Cfs (Correlation-based Feature Selection) are among the best-known algorithms for finding an approximate solution to this optimization problem. However, as the volume of available data grows over time, the feature selection process becomes more challenging. In this paper, we propose two new versions of Mrmr and Cfs that output the same feature set as the original algorithms but are considerably faster. Our novel algorithms are based on solving the duplication and redundancy problems intrinsic to the original algorithms. We applied our proposals to thirty datasets related to microarray and cancer analysis. Experiments revealed that the proposed algorithms, Mrmr+ and Cfs+, are on average fourteen and three times faster than the original algorithms, respectively.
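
As a rough illustration of the optimization problem described in the abstract, the following is a minimal sketch of a greedy relevance-minus-redundancy selection in the spirit of Mrmr. It assumes discrete feature values and uses mutual information as the correlation measure. The function names (`mutual_information`, `greedy_mrmr`) and the pairwise cache are hypothetical choices made for this example; this is not the authors' Mrmr+ or Cfs+, whose details are not given here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_mrmr(features, labels, k):
    """Greedily pick k feature names by relevance minus mean redundancy.

    `features` maps a (hypothetical) feature name to its column of values.
    Ties are broken by alphabetical order of the feature names.
    """
    # Feature-class correlation, computed once per feature.
    relevance = {f: mutual_information(col, labels) for f, col in features.items()}
    # Feature-feature correlation, computed at most once per unordered pair.
    # Assumption: this cache is only an illustrative way to avoid repeated work.
    cache = {}

    def redundancy(f, g):
        key = (f, g) if f < g else (g, f)
        if key not in cache:
            cache[key] = mutual_information(features[f], features[g])
        return cache[key]

    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            mean_red = sum(redundancy(f, s) for s in selected) / len(selected)
            return relevance[f] - mean_red

        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy dataset in which one feature is an exact copy of another, a sketch like this should prefer a non-redundant informative feature over the copy, which is the behaviour the relevance and redundancy terms are meant to capture.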

Keywords

  • Feature selection
  • Minimum redundancy maximum relevance
  • High-dimensional data
  • Machine learning

Notes

Acknowledgments

This work was partially supported by the Grant-in-Aid for Scientific Research (JSPS KAKENHI Grant Number 17H00762) from the Japan Society for the Promotion of Science.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Graduate School of Applied Informatics, University of Hyogo, Kobe, Japan
