Predicting run time of classification algorithms using meta-learning
Abstract
Selecting the right classification algorithm is an important step in the success of any data mining project. Run time can be used to assess the efficiency of a candidate classification algorithm, but experimenting with several algorithms increases the cost of a data mining project. Using the idea of meta-learning, we present an approach to estimating the run time of a particular classification algorithm on an arbitrary dataset. Our work provides a way to evaluate the choice of an algorithm without experimental execution. We demonstrate that multivariate adaptive regression splines significantly outperform other regression models, including a 'state-of-the-art' method such as support vector regression.
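The meta-learning approach described above can be sketched as follows: characterize each training dataset by a small vector of meta-features, measure the target algorithm's run time on those datasets, and fit a regression model from meta-features to run time. The sketch below is illustrative only; the specific meta-features (log instance count, log attribute count, class entropy) and the log-linear least-squares model are our assumptions for a minimal runnable example, whereas the paper itself uses a richer meta-feature set and multivariate adaptive regression splines.

```python
import numpy as np

def meta_features(X, y):
    """Toy meta-features for a dataset: log #instances, log #attributes,
    and class entropy (all illustrative, not the paper's exact set)."""
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    entropy = -np.sum(p * np.log2(p))
    return np.array([np.log(n), np.log(d), entropy])

def fit_runtime_model(meta_X, runtimes):
    """Fit a regression from meta-features to log run time by least squares.
    (A MARS model would replace this stand-in in the paper's setting.)"""
    A = np.hstack([meta_X, np.ones((len(meta_X), 1))])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, np.log(runtimes), rcond=None)
    return coef

def predict_runtime(coef, mf):
    """Predict run time for a new dataset from its meta-feature vector."""
    return float(np.exp(np.append(mf, 1.0) @ coef))

# Usage sketch: synthetic datasets whose "measured" run time grows as n * d.
rng = np.random.default_rng(0)
datasets = [(rng.normal(size=(n, d)), rng.integers(0, 2, size=n))
            for n, d in [(100, 5), (500, 10), (1000, 20), (2000, 8)]]
meta_X = np.vstack([meta_features(X, y) for X, y in datasets])
runtimes = np.array([1e-6 * X.shape[0] * X.shape[1] for X, _ in datasets])
coef = fit_runtime_model(meta_X, runtimes)

# Predict run time on an unseen dataset without actually running the algorithm.
X_new = rng.normal(size=(800, 12))
y_new = rng.integers(0, 2, size=800)
pred = predict_runtime(coef, meta_features(X_new, y_new))
```

In practice the "measured" run times would come from timing the classifier of interest on a corpus of benchmark datasets; the regression model then estimates run time on new data without experimental execution.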
Keywords
Classification · Meta-learning · Regression model · Run time
Acknowledgments
We thank the Editor-in-Chief and three anonymous reviewers, whose valuable comments and suggestions helped us improve this paper significantly over several rounds of review.