Predicting run time of classification algorithms using meta-learning

Original Article

Abstract

Selecting the right classification algorithm is an important step in the success of any data mining project. Run time can be used to assess the efficiency of a candidate classification algorithm, but experimenting with several algorithms increases the cost of a data mining project. Using the idea of meta-learning, we present an approach that estimates the run time of a particular classification algorithm on an arbitrary dataset, providing a way to evaluate the choice of an algorithm without experimental execution. We demonstrate that multivariate adaptive regression splines significantly outperforms other regression models on this task, including a ‘state-of-the-art’ algorithm such as support vector regression.
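
The following minimal Python sketch illustrates the general idea described above: measure the run time of one classifier on a small corpus of datasets, describe each dataset by a few meta-features, and fit a regression model that predicts run time for an unseen dataset. The meta-features, the synthetic datasets, and the use of support vector regression in place of multivariate adaptive regression splines (which is not part of scikit-learn) are all illustrative assumptions, not the paper's exact pipeline.

# Minimal sketch of run-time prediction via meta-learning.
# The meta-features and datasets below are illustrative assumptions,
# not the exact ones used in the paper.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR

def meta_features(X, y):
    """A few simple dataset descriptors (illustrative, not the paper's set)."""
    n, d = X.shape
    return [n, d, len(np.unique(y)), n * d]

# Build a training corpus of (meta-features, measured run time) pairs.
rows, times = [], []
for n in [200, 500, 1000, 2000]:
    for d in [10, 20, 40]:
        X, y = make_classification(n_samples=n, n_features=d, random_state=0)
        t0 = time.perf_counter()
        SVC().fit(X, y)  # the classification algorithm whose run time we model
        times.append(time.perf_counter() - t0)
        rows.append(meta_features(X, y))

# Regress log run time on the meta-features; SVR stands in for MARS here.
model = make_pipeline(StandardScaler(), SVR())
model.fit(np.array(rows), np.log(times))

# Estimate run time on an unseen dataset without actually training SVC on it.
X_new, y_new = make_classification(n_samples=5000, n_features=30, random_state=1)
pred = np.exp(model.predict([meta_features(X_new, y_new)]))[0]
print(f"predicted SVC training time: {pred:.3f} s")

Regressing the logarithm of run time keeps the target well scaled, since run times span several orders of magnitude across dataset sizes.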

Keywords

Classification · Meta-learning · Regression model · Run time

Notes

Acknowledgments

We thank the Editor-in-Chief and the three anonymous reviewers, whose valuable comments and suggestions helped us to improve this paper significantly over several rounds of review.

Supplementary material

Supplementary material 1 (PDF 122 kb)


Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

1. Computer Science Department, University of Colorado, Colorado Springs, USA