An Empirical Analysis of Integrating Feature Extraction to Automated Machine Learning Pipeline

  • Conference paper
Pattern Recognition. ICPR International Workshops and Challenges (ICPR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12664)

Abstract

Machine learning techniques and algorithms are employed in many application domains, such as financial applications, recommendation systems, medical diagnosis systems, and self-driving cars. They play a crucial role in harnessing the Big Data produced every day in our digital world. In general, building a well-performing machine learning pipeline is an iterative and complex process that requires a solid understanding of the techniques available for each component of the pipeline. Feature engineering (FE) is one of the most time-consuming steps in building machine learning pipelines. It requires a deep understanding of the domain, together with data exploration, to discover relevant hand-crafted features from raw data. In this work, we empirically evaluate how integrating an automated feature extraction tool (AutoFeat) into two automated machine learning frameworks, Auto-Sklearn and TPOT, affects their predictive performance. We also discuss the limitations of AutoFeat that need to be addressed in order to improve the predictive performance of automated machine learning frameworks on real-world datasets.
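
To make the idea of an automated feature extraction step concrete, the following is a minimal toy sketch in plain Python. It is not AutoFeat's API or algorithm, and it is not the experimental setup of this paper: the helper names (`expand_features`, `pearson`, `select_top_k`), the choice of candidate transformations (squares and pairwise products), and the correlation-based ranking are all assumptions made for illustration. It only shows the general pattern of generating candidate non-linear features and keeping the most relevant ones before the data is handed to a downstream learner or AutoML framework.

```python
import math
import random
from itertools import combinations


def expand_features(rows, names):
    """Build candidate features: the originals, their squares, and pairwise products."""
    pairs = list(combinations(range(len(names)), 2))
    out_names = list(names)
    out_names += [f"{n}^2" for n in names]
    out_names += [f"{names[i]}*{names[j]}" for i, j in pairs]
    out_rows = [
        list(r) + [v * v for v in r] + [r[i] * r[j] for i, j in pairs]
        for r in rows
    ]
    return out_rows, out_names


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_top_k(rows, names, target, k):
    """Rank candidate features by absolute correlation with the target; keep the top k."""
    scores = []
    for j, name in enumerate(names):
        column = [r[j] for r in rows]
        scores.append((abs(pearson(column, target)), name))
    scores.sort(reverse=True)
    return [name for _, name in scores[:k]]


# Synthetic data (hypothetical): the target depends only on the interaction x1 * x2,
# which neither raw feature captures on its own.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [a * b for a, b in X]

expanded, expanded_names = expand_features(X, ["x1", "x2"])
selected = select_top_k(expanded, expanded_names, y, k=1)
print(expanded_names)
print(selected)
```

In a setting like the one evaluated here, the expanded and filtered feature matrix, rather than the raw one, would then be passed to the AutoML framework, which searches over models and hyperparameters as usual.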

Notes

  1. https://medium.com/kaggle-blog/grupo-bimbo-inventory-demand-winners-interview-clustifier-alex-andrey-1e3b6cec8a20

  2. https://github.com/DataSystemsGroupUT/auto_feature_engineering

  3. https://scikit-learn.org/

  4. https://www.openml.org/d/1107

Acknowledgement

This work is funded by the European Regional Development Funds via the Mobilitas Plus programme (grant MOBTT75).

Corresponding author

Correspondence to Hassan Eldeeb.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Eldeeb, H., Amashukeli, S., El Shawi, R. (2021). An Empirical Analysis of Integrating Feature Extraction to Automated Machine Learning Pipeline. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12664. Springer, Cham. https://doi.org/10.1007/978-3-030-68799-1_24

  • DOI: https://doi.org/10.1007/978-3-030-68799-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68798-4

  • Online ISBN: 978-3-030-68799-1

  • eBook Packages: Computer Science, Computer Science (R0)
