Evaluating the Effectiveness of Instructional Methods

  • Jimmie Leppink


The previous chapters in this book present a variety of insights for the instructional design of high-stakes learning environments. These insights are based on randomised controlled experiments that compared different instructional formats for learners with varying degrees of prior experience with the content to be learned, as well as on other types of carefully designed studies. Moreover, efforts across fields have resulted in a variety of instruments for the measurement of cognitive load or, to some extent, even of separate types of cognitive load. Some of these instruments have been used successfully in research in, for instance, emergency medicine settings. However, to bring instructional design research to the next level, a critical revision of common methodological and statistical practices for evaluating the effectiveness of different instructional methods is needed. In this chapter, suboptimal practices that occur across the board in instructional design research are discussed, and more viable alternatives are provided. Although a variety of factors may frequently constrain the sample sizes of our studies and the variables measured in them, we should make an effort to go beyond small samples and beyond single measurements whenever we can. Further, we should adopt alternatives to the traditional statistical significance testing approach that has dominated statistical testing in education, psychology and other fields. Finally, we should adjust our approach to evaluating the reliability of our measurements, and we should consider an important recent development in peer-review and reporting practice.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Maastricht University, Maastricht, The Netherlands
