Feature Selection and Interpretable Feature Transformation: A Preliminary Study on Feature Engineering for Classification Algorithms

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2018 (IDEAL 2018)

Abstract

This paper explores the limitations of consistency-based measures in the context of feature selection. Filters of this kind are not widely used in high-dimensional problems: they typically select very few attributes, and their limited predictive ability is a drawback. The principal contribution of this work is a new feature engineering approach that creates new attributes after the feature selection stage. Experiments on multi-class problems with feature spaces in the order of tens of thousands of attributes show that the new proposal brings some improvements. As a final insight, some new relationships were discovered through the combined application of feature selection and feature transformation. Additionally, a new measure for classification problems, which relates the number of features to the number of classes or labels, is also proposed.
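For readers unfamiliar with consistency-based filters, the following is a minimal sketch of one common consistency measure, the inconsistency rate of a feature subset over discrete data. It is an illustration only: the abstract does not specify which measure the paper uses, and the function name `inconsistency_rate` and the toy data are assumptions introduced here.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Hypothetical helper: inconsistency rate of a feature subset.

    Instances are grouped by their values on the chosen features. Within
    each group, every instance that does not belong to the majority class
    counts as inconsistent. The rate is the inconsistent count divided by
    the total number of instances (0.0 means the subset is fully consistent).
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)
        groups[pattern][label] += 1
    inconsistent = sum(
        sum(counts.values()) - max(counts.values()) for counts in groups.values()
    )
    return inconsistent / len(rows)


# Toy data (assumed for illustration): three discrete features, two classes.
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)]
labels = ["a", "b", "a", "a"]

rate_single = inconsistency_rate(rows, labels, [0])     # feature 0 alone
rate_pair = inconsistency_rate(rows, labels, [0, 2])    # features 0 and 2
```

On this toy data, feature 0 alone leaves one of the four instances inconsistent, while adding feature 2 separates all patterns. A consistency-based filter searches for a small subset whose rate stays at or below that of the full feature set, which is one reason such filters tend to return very few attributes.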

Acknowledgment

This work has been partially subsidised by the TIN2014-55894-C2-R and TIN2017-88209-C2-R projects (Spanish Inter-Ministerial Commission of Science and Technology (MICYT)), the P11-TIC-7528 project (“Junta de Andalucía”, Spain) and FEDER funds. It has also been supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Grant no. III-44006.

Author information

Corresponding author

Correspondence to Antonio J. Tallón-Ballesteros.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Tallón-Ballesteros, A.J., Tuba, M., Xue, B., Hashimoto, T. (2018). Feature Selection and Interpretable Feature Transformation: A Preliminary Study on Feature Engineering for Classification Algorithms. In: Yin, H., Camacho, D., Novais, P., Tallón-Ballesteros, A. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2018. IDEAL 2018. Lecture Notes in Computer Science, vol 11315. Springer, Cham. https://doi.org/10.1007/978-3-030-03496-2_31

  • DOI: https://doi.org/10.1007/978-3-030-03496-2_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03495-5

  • Online ISBN: 978-3-030-03496-2

  • eBook Packages: Computer Science, Computer Science (R0)
