A Feature Selection Method Based on Feature Grouping and Genetic Algorithm

  • Conference paper
Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques (IScIDE 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9243)

Abstract

Feature selection is a powerful technique for analyzing high-dimensional data and building efficient learning models. This study proposes a feature selection method based on feature grouping and a genetic algorithm (FS-FGGA) to obtain a discriminative feature subset and to reduce irrelevant and redundant data. First, it eliminates irrelevant features using the symmetrical uncertainty between each feature and the class labels. Then, it groups the remaining features by approximate Markov blanket. Finally, a genetic algorithm searches for the optimal feature subset across the different groups. Experiments on eight public datasets show that FS-FGGA is effective and outperforms SVM-RFE and ECBGS in most cases.
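
The first two steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the full text is not available here, so the sketch assumes discrete (already discretized) features, uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and borrows the approximate Markov blanket condition commonly used with symmetrical uncertainty (feature A covers feature B when SU(A, C) >= SU(B, C) and SU(A, B) >= SU(B, C)). The function names, the `threshold` parameter, and the greedy "first matching group leader" assignment are all hypothetical choices made for illustration. The genetic search over the resulting groups is omitted.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so nothing is shared
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def group_features(features, labels, threshold=0.0):
    """Drop irrelevant features, then group the rest (illustrative only).

    features:  dict mapping a feature name to a list of discrete values
    labels:    list of class labels, same length as each feature column
    threshold: hypothetical cut-off; features with SU(feature, class)
               at or below it are treated as irrelevant
    Returns a list of groups; the first name in each group is its leader.
    """
    relevance = {
        name: symmetrical_uncertainty(column, labels)
        for name, column in features.items()
    }
    # Step 1: keep relevant features, ranked by decreasing SU with the class.
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    # Step 2: assign each feature to the first group whose leader is an
    # approximate Markov blanket for it. SU(leader, class) >= SU(name, class)
    # already holds because features are visited in ranked order.
    groups = []
    for name in ranked:
        for group in groups:
            leader = group[0]
            if symmetrical_uncertainty(features[leader], features[name]) >= relevance[name]:
                group.append(name)
                break
        else:
            groups.append([name])  # no leader covers it: start a new group
    return groups
```

A search step such as a genetic algorithm could then encode a candidate subset by choosing features from each group, so that redundant features inside one group compete with each other rather than being selected together.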

Acknowledgments

This study was supported by the State Key Science & Technology Project for Infectious Diseases (2012ZX10002011), the Sino-German Center for Research Promotion (GZ 753), and the National Natural Science Foundation of China (21375011).

Author information

Corresponding author

Correspondence to Xiaohui Lin.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Lin, X., Wang, X., Xiao, N., Huang, X., Wang, J. (2015). A Feature Selection Method Based on Feature Grouping and Genetic Algorithm. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_15

  • DOI: https://doi.org/10.1007/978-3-319-23862-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23861-6

  • Online ISBN: 978-3-319-23862-3

  • eBook Packages: Computer Science, Computer Science (R0)
