Abstract
Machine learning techniques are employed in many application domains, such as financial applications, recommendation systems, medical diagnosis systems, and self-driving cars. They play a crucial role in harnessing the Big Data produced every day in our digital world. Building a well-performing machine learning pipeline is generally an iterative and complex process that requires a solid understanding of the techniques available for each component of the pipeline. Feature engineering (FE) is one of the most time-consuming steps in building such pipelines: it requires a deep understanding of the domain, together with data exploration, to discover relevant hand-crafted features from raw data. In this work, we empirically evaluate how integrating an automated feature extraction tool (AutoFeat) into two automated machine learning frameworks, namely Auto-Sklearn and TPOT, affects their predictive performance. We also discuss the limitations of AutoFeat that need to be addressed in order to improve the predictive performance of automated machine learning frameworks on real-world datasets.
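The general idea of placing an automated feature extraction step in front of an AutoML framework can be illustrated with a minimal sketch. The code below is not AutoFeat's API or algorithm, and it does not call Auto-Sklearn or TPOT. It is a simplified, standard-library-only illustration, under the assumption that feature extraction amounts to "expand, then select": candidate non-linear features are generated from the raw columns, and the ones most correlated with the target are kept. All function names (`expand_features`, `pearson`, `select_features`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def expand_features(rows, names):
    """Build candidate features: the originals, their squares, and pairwise products."""
    cols = {n: [r[i] for r in rows] for i, n in enumerate(names)}
    out = dict(cols)
    for n in names:
        out[f"{n}^2"] = [v * v for v in cols[n]]
    for a, b in combinations(names, 2):
        out[f"{a}*{b}"] = [x * y for x, y in zip(cols[a], cols[b])]
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(candidates, target, k):
    """Keep the k candidate features with the largest absolute correlation to the target."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): the target depends on an interaction of the raw features.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
target = [a * b for a, b in rows]

candidates = expand_features(rows, ["x1", "x2"])
selected = select_features(candidates, target, k=2)
print(selected)
```

In an integrated pipeline, the selected columns would then be passed to the AutoML framework in place of, or alongside, the raw features. The sketch ranks candidates by simple correlation only to stay short; a real tool may use a different expansion and selection procedure.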
Acknowledgement
This work is funded by the European Regional Development Funds via the Mobilitas Plus programme (grant MOBTT75).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Eldeeb, H., Amashukeli, S., El Shawi, R. (2021). An Empirical Analysis of Integrating Feature Extraction to Automated Machine Learning Pipeline. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12664. Springer, Cham. https://doi.org/10.1007/978-3-030-68799-1_24
DOI: https://doi.org/10.1007/978-3-030-68799-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68798-4
Online ISBN: 978-3-030-68799-1
eBook Packages: Computer Science (R0)