
Predicting run time of classification algorithms using meta-learning

International Journal of Machine Learning and Cybernetics

Abstract

Selecting the right classification algorithm is an important step in the success of any data mining project. Run time can be used to assess the efficiency of a classification algorithm of interest, but experimenting with several algorithms increases the cost of a data mining project. Using the idea of meta-learning, we present an approach to estimate the run time of a particular classification algorithm on an arbitrary dataset. Our work provides a way to evaluate the choice of an algorithm without experimental execution. We demonstrate that multivariate adaptive regression splines significantly outperform other regression models, even a state-of-the-art method such as support vector regression.
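In outline, the approach works as follows: compute simple meta-features from a collection of training datasets, measure the actual training time of the classifier of interest on each of them, and fit a regression model that maps meta-features to measured run time; the fitted model then estimates run time on an unseen dataset without executing the classifier. The sketch below is a minimal illustration of this idea, not the authors' pipeline: the synthetic datasets, the choice of meta-features, the log transform of the target, and the use of scikit-learn's RandomForestRegressor in place of multivariate adaptive regression splines (which would require a separate package such as py-earth) are all assumptions made for demonstration.

```python
# Minimal sketch of meta-learning for run-time prediction (illustrative only).
# Assumptions: synthetic datasets, a hypothetical set of meta-features, and a
# random forest regressor standing in for the MARS model used in the paper.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestRegressor

def meta_features(X, y):
    """A few simple dataset characteristics (hypothetical choice)."""
    n, d = X.shape
    return [n, d, n * d, len(np.unique(y))]

# Build a meta-dataset: (meta-features of a dataset) -> (measured training time).
rng = np.random.RandomState(0)
meta_X, meta_y = [], []
for _ in range(30):
    n = rng.randint(200, 2000)
    d = rng.randint(5, 50)
    X, y = make_classification(n_samples=n, n_features=d, random_state=rng)
    start = time.perf_counter()
    DecisionTreeClassifier().fit(X, y)      # the target classifier being timed
    elapsed = time.perf_counter() - start
    meta_X.append(meta_features(X, y))
    meta_y.append(np.log1p(elapsed))        # log transform (an assumption) keeps the target well behaved

# Fit the meta-regressor and predict run time for an unseen dataset.
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(meta_X, meta_y)
X_new, y_new = make_classification(n_samples=1500, n_features=30, random_state=42)
predicted_seconds = np.expm1(reg.predict([meta_features(X_new, y_new)])[0])
print(f"Predicted training time: {predicted_seconds:.4f} s")
```

In the paper itself, richer dataset characteristics serve as meta-features and several regression models are compared, with multivariate adaptive regression splines performing best.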



Acknowledgments

We thank the Editor-in-Chief and three anonymous reviewers, whose valuable comments and suggestions helped us improve this paper significantly over several rounds of review.

Author information


Corresponding author

Correspondence to Tri Doan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 122 kb)


About this article


Cite this article

Doan, T., Kalita, J. Predicting run time of classification algorithms using meta-learning. Int. J. Mach. Learn. & Cyber. 8, 1929–1943 (2017). https://doi.org/10.1007/s13042-016-0571-6

