Abstract
A large pool of machine learning components (i.e., algorithms and tools) now exists that can successfully predict decisions in many problem domains. Unfortunately, we cannot reliably estimate which component will perform well on a particular dataset without extensive experimental work. Consequently, designers and developers must evaluate as many methods as possible experimentally to establish which is most appropriate for a specific problem. To address this challenge, researchers have proposed customized classification pipelines, built within a framework of search algorithms, machine learning tools, and appropriate parameters for those algorithms, that can operate independently of user knowledge. Until recently, the majority of these pipelines were constructed using genetic programming. In this paper, a new method is proposed for evolving classification pipelines automatically, founded on stochastic nature-inspired population-based optimization algorithms. These algorithms serve as a tool for modeling customized classification pipelines consisting of the following tasks: choosing the proper preprocessing method, selecting the appropriate classification tool, and optimizing the model hyperparameters. Evaluation of the resulting pipelines also showed the proposed method's potential for real-world use.
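The pipeline-design idea described above can be illustrated with a minimal sketch: a continuous solution vector is decoded into a concrete pipeline, where one gene selects the preprocessing method, one selects the classification tool, and the remaining genes set its hyperparameters. This is not the authors' implementation; the gene-to-pipeline mapping, the candidate components, and the use of differential evolution from SciPy on the Iris dataset are all illustrative assumptions.

```python
# Sketch: decode a point in the unit hypercube into a scikit-learn pipeline
# and let differential evolution search for the best-performing one.
from scipy.optimize import differential_evolution
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate preprocessing methods (None = no preprocessing).
PREPROCESSORS = [None, StandardScaler(), MinMaxScaler()]

def decode(vector):
    """Map a vector in [0, 1]^4 to a concrete classification pipeline."""
    # Gene 0: preprocessing method.
    idx = int(vector[0] * len(PREPROCESSORS)) % len(PREPROCESSORS)
    prep = PREPROCESSORS[idx]
    # Gene 1: classifier family; genes 2-3: its hyperparameters.
    if vector[1] < 0.5:
        clf = RandomForestClassifier(
            n_estimators=int(10 + vector[2] * 90),  # 10..100 trees
            max_depth=int(2 + vector[3] * 8),       # depth 2..10
            random_state=0)
    else:
        clf = KNeighborsClassifier(n_neighbors=int(1 + vector[2] * 14))
    steps = ([("prep", prep)] if prep is not None else []) + [("clf", clf)]
    return Pipeline(steps)

def fitness(vector):
    # Negated mean cross-validated accuracy: DE minimizes by convention.
    return -cross_val_score(decode(vector), X, y, cv=3).mean()

result = differential_evolution(fitness, bounds=[(0, 1)] * 4,
                                maxiter=5, popsize=8, seed=1)
best_pipeline = decode(result.x)
print("best CV accuracy: %.3f" % -result.fun)
```

The same decoding scheme works unchanged with any continuous population-based optimizer (e.g., PSO or the bat algorithm); only the search loop around `fitness` would differ.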
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
Cite this chapter
Fister, I., Zorman, M., Fister, D., Fister, I. (2020). Continuous Optimizers for Automatic Design and Evaluation of Classification Pipelines. In: Khosravy, M., Gupta, N., Patel, N., Senjyu, T. (eds) Frontier Applications of Nature Inspired Computation. Springer Tracts in Nature-Inspired Computing. Springer, Singapore. https://doi.org/10.1007/978-981-15-2133-1_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2132-4
Online ISBN: 978-981-15-2133-1
eBook Packages: Intelligent Technologies and Robotics