Enhancing the Performance of Classification Using Super Learning

  • Md Faisal Kabir
  • Simone A. Ludwig


Classification is a supervised learning task, and improving the performance of classification models remains a challenging research problem in machine learning (ML) and data mining. In this setting, the goal is to build a model that assigns the correct class to previously unseen instances, and better predictive performance matters in almost every application area, including healthcare. Researchers have applied many ML techniques to improve their models, including ensemble techniques that combine multiple base learners. Super learning, also called stacked ensembling, is an ensemble method that finds the optimal weighted average of diverse learning models. In this paper, we apply super learning to four healthcare-related benchmark data sets. Experimental results show that super learning performs better than the individual base learners and the baseline ensemble.
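The idea of finding a weighted average of diverse base learners can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the three toy base learners (`fit_centroid`, `fit_knn`, `fit_stump`), the Brier loss, and the coarse grid search over convex weights are all assumptions chosen to keep the example dependency-free. The sketch collects cross-validated predictions from each base learner and then picks the convex combination with the lowest cross-validated loss.

```python
import random
from itertools import product

def make_data(n=120, seed=0):
    # Hypothetical synthetic data: two overlapping Gaussian blobs.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append((rng.gauss(label * 1.5, 1.0), rng.gauss(label * 1.0, 1.0)))
        y.append(label)
    return X, y

def fit_centroid(X, y):
    # Score from relative squared distance to the two class means.
    def mean(points):
        return tuple(sum(c) / len(points) for c in zip(*points))
    c0 = mean([x for x, t in zip(X, y) if t == 0])
    c1 = mean([x for x, t in zip(X, y) if t == 1])
    def predict(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return d0 / (d0 + d1 + 1e-12)
    return predict

def fit_knn(X, y, k=7):
    # Fraction of positive labels among the k nearest training points.
    def predict(x):
        dist = lambda p: sum((a - b) ** 2 for a, b in zip(x, p[0]))
        nearest = sorted(zip(X, y), key=dist)[:k]
        return sum(t for _, t in nearest) / k
    return predict

def fit_stump(X, y):
    # Hard threshold on feature 0 at the midpoint of the class means.
    m0 = sum(x[0] for x, t in zip(X, y) if t == 0) / y.count(0)
    m1 = sum(x[0] for x, t in zip(X, y) if t == 1) / y.count(1)
    threshold = (m0 + m1) / 2
    def predict(x):
        return 1.0 if (x[0] > threshold) == (m1 > m0) else 0.0
    return predict

def cv_predictions(learners, X, y, folds=5):
    # Level-one data: each row holds the held-out prediction of every learner.
    n = len(X)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(learners):
            model = fit(X_tr, y_tr)
            for i in test:
                Z[i][j] = model(X[i])
    return Z

def brier(weights, Z, y):
    # Mean squared error of the weighted-average prediction.
    return sum(
        (sum(w * z for w, z in zip(weights, row)) - t) ** 2
        for row, t in zip(Z, y)
    ) / len(y)

def fit_weights(Z, y, steps=10):
    # Grid search over non-negative weights that sum to one.
    n_learners = len(Z[0])
    best_loss, best_w = None, None
    for head in product(range(steps + 1), repeat=n_learners - 1):
        if sum(head) > steps:
            continue
        w = [h / steps for h in head] + [(steps - sum(head)) / steps]
        loss = brier(w, Z, y)
        if best_loss is None or loss < best_loss:
            best_loss, best_w = loss, w
    return best_w, best_loss

LEARNERS = [fit_centroid, fit_knn, fit_stump]
X, y = make_data()
Z = cv_predictions(LEARNERS, X, y)
weights, ensemble_loss = fit_weights(Z, y)
base_losses = [
    brier([1.0 if k == j else 0.0 for k in range(len(LEARNERS))], Z, y)
    for j in range(len(LEARNERS))
]

# Refit every base learner on all data and combine with the learned weights.
models = [fit(X, y) for fit in LEARNERS]

def super_learner(x):
    return sum(w * m(x) for w, m in zip(weights, models))

print("weights:", weights)
print("cross-validated Brier loss per base learner:", base_losses)
print("cross-validated Brier loss of the ensemble:", ensemble_loss)
```

Because the weight grid contains the case where a single learner receives all the weight, the ensemble's cross-validated loss in this sketch can never be worse than that of the best individual base learner on the same folds. A practical implementation would typically replace the grid search with a proper metalearner, such as a constrained regression.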


Keywords: Super learning / learner; Stacked ensemble; Classification




Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, North Dakota State University, Fargo, USA
