Nonparametric meta-analysis for single-case research: Confidence intervals for combined effect sizes

  • Bart Michiels
  • Patrick Onghena


In this article, we present a nonparametric technique for meta-analyzing randomized single-case experiments by using inverted randomization tests to calculate nonparametric confidence intervals for combined effect sizes (CICES). Over the years, several proposals for single-case meta-analysis have been made, but most of them assume either specific population characteristics (e.g., homogeneity of variances or normality) or independent observations. However, such assumptions are seldom plausible in single-case research. The CICES technique requires no such assumptions; it assumes only that the combined effect size of multiple randomized single-case experiments can be modeled as a constant difference between phase means. CICES can be used to synthesize the results of various single-case alternation designs, single-case phase designs, or a combination of the two. Furthermore, the technique can be used with different standardized or unstandardized effect size measures. In this article, we explain the rationale behind the CICES technique and illustrate it with both empirical and hypothetical datasets. In addition, we discuss the strengths and weaknesses of the technique and offer some possibilities for future research. We have implemented the CICES technique for single-case meta-analysis in a freely available R function.
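The core idea behind CICES, inverting a randomization test over a grid of candidate effect sizes, can be sketched in a few lines of code. The authors' R function is not reproduced here; the following is a minimal Python illustration under assumed simplifications: several AB phase designs with a randomized intervention start point, a mean-difference statistic averaged across cases, and a grid search over candidate shifts δ. The confidence interval collects every δ whose randomization test (of the null hypothesis "the combined shift equals δ") is not rejected at level α. All function names and design restrictions are illustrative assumptions, not the published implementation.

```python
from itertools import product
from statistics import mean

def shifted(data, start, delta):
    """Subtract a candidate shift delta from the B-phase observations.

    Under the null hypothesis "the effect equals delta", the adjusted
    series behaves as if there were no treatment effect at all.
    """
    return [y - delta if i >= start else y for i, y in enumerate(data)]

def phase_stat(data, start):
    """Per-case test statistic: mean(B phase) - mean(A phase)."""
    return mean(data[start:]) - mean(data[:start])

def combined_p(cases, starts, delta, min_phase=3):
    """Two-sided randomization p value for H0: combined shift == delta.

    The randomization distribution enumerates every admissible
    combination of intervention start points (at least min_phase
    observations per phase, an assumption of this sketch), recomputing
    the across-case mean statistic for each combination.
    """
    adj = [shifted(d, s, delta) for d, s in zip(cases, starts)]
    observed = mean(phase_stat(d, s) for d, s in zip(adj, starts))
    grids = [range(min_phase, len(d) - min_phase + 1) for d in adj]
    dist = [mean(phase_stat(d, s) for d, s in zip(adj, combo))
            for combo in product(*grids)]
    return sum(abs(t) >= abs(observed) - 1e-12 for t in dist) / len(dist)

def cices(cases, starts, alpha=0.05, grid=None):
    """CI for the combined shift: all grid values not rejected at alpha."""
    if grid is None:
        grid = [x / 10 for x in range(-50, 101)]  # delta in [-5, 10]
    accepted = [d for d in grid if combined_p(cases, starts, d) > alpha]
    return (min(accepted), max(accepted)) if accepted else None
```

With noiseless toy data the interval can collapse to the true shift, because the randomization distribution is degenerate everywhere else; with real (noisy) data the set of non-rejected δ values widens into a proper interval.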


Single-case experiments · Meta-analysis · Effect size · Confidence intervals · Hypothesis testing · Nonparametric statistics · Randomization tests



Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  1. Faculty of Psychology and Educational Sciences, KU Leuven–University of Leuven, Leuven, Belgium
