Abstract
This paper explores the limitations of consistency-based measures in the context of feature selection. Filters of this kind are not widely used in high-dimensional problems. Typically, the number of selected attributes is very small, and the resulting predictive ability is a drawback. The main contribution of this work is a new feature engineering approach that creates new attributes after the feature selection stage. Experiments on multi-class problems with feature spaces in the order of tens of thousands of attributes show that the new proposal yields some improvements. As a final insight, some new relationships were discovered through the combined application of feature selection and feature transformation. Additionally, a new measure for classification problems, which relates the number of features to the number of classes or labels, is also proposed.
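For readers unfamiliar with consistency-based filters, the following is a minimal sketch of one widely used consistency measure, the inconsistency rate of a feature subset. It is an illustrative assumption and not necessarily the measure evaluated in this paper; the function name `inconsistency_rate` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Return the inconsistency rate of a feature subset (illustrative only).

    rows   : list of tuples of discrete feature values
    labels : list of class labels, one per row
    subset : indices of the features being evaluated

    Instances that share the same values on `subset` form a pattern group.
    Within each group, every instance outside the majority class counts as
    inconsistent. The rate is the inconsistent count divided by the number
    of instances.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)
        groups[pattern][label] += 1

    inconsistent = sum(
        sum(counts.values()) - max(counts.values())
        for counts in groups.values()
    )
    return inconsistent / len(rows)


# Hypothetical toy data: three discrete features, three classes.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 0)]
labels = ["a", "b", "c", "c", "a"]

print(inconsistency_rate(rows, labels, [0, 1]))  # features 0 and 1 together
print(inconsistency_rate(rows, labels, [0]))     # feature 0 alone
```

A filter built on such a measure looks for a small subset whose rate stays at or below a chosen threshold, which is one way very small selected subsets, as described above, can arise.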
Acknowledgment
This work has been partially subsidised by TIN2014-55894-C2-R, TIN2017-88209-C2-R (Spanish Inter-Ministerial Commission of Science and Technology (MICYT)), P11-TIC-7528 projects (“Junta de Andalucía” (Spain)) and FEDER funds. It has also been supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Grant no. III-44006.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Tallón-Ballesteros, A.J., Tuba, M., Xue, B., Hashimoto, T. (2018). Feature Selection and Interpretable Feature Transformation: A Preliminary Study on Feature Engineering for Classification Algorithms. In: Yin, H., Camacho, D., Novais, P., Tallón-Ballesteros, A. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2018. IDEAL 2018. Lecture Notes in Computer Science, vol 11315. Springer, Cham. https://doi.org/10.1007/978-3-030-03496-2_31
DOI: https://doi.org/10.1007/978-3-030-03496-2_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03495-5
Online ISBN: 978-3-030-03496-2
eBook Packages: Computer Science, Computer Science (R0)