Abstract
Recent interest in promoting replication efforts assumes that there is well-established methodological guidance for designing and implementing these studies. However, no such consensus exists in the methodology literature. This article addresses this gap by describing design-based approaches for planning systematic replication studies. Our general approach is derived from the Causal Replication Framework (CRF), which formalizes the assumptions under which replication success can be expected. The assumptions may be understood broadly as replication design requirements and individual study design requirements. Replication failure occurs when one or more CRF assumptions are violated. In design-based approaches to replication, CRF assumptions are systematically tested to evaluate the replicability of effects, as well as to identify sources of effect variation when replication failure is observed. The paper describes research designs for replication and demonstrates how multiple designs may be combined in systematic replication efforts, as well as how diagnostic measures may be used to assess the extent to which CRF assumptions are met in field settings.
Notes
Currently, there is no standard approach for determining replication failure. Researchers often compare the direction, size, and pattern of statistical significance of study effects; they may also conduct statistical tests of difference and/or equivalence of study results. In this article, we define replication failure as statistical differences between two or more study effect estimates.
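For intuition, one common difference test (our notation here, not a requirement of the framework) compares two independent effect estimates against their pooled standard error:

$$
z = \frac{\hat{\tau}_1 - \hat{\tau}_2}{\sqrt{SE_1^2 + SE_2^2}},
$$

where $\hat{\tau}_1$ and $\hat{\tau}_2$ are the two study effect estimates and $SE_1$ and $SE_2$ are their standard errors. Under the definition above, replication failure corresponds to rejecting the null hypothesis that the two effects are equal; equivalence tests instead reverse the roles of the null and alternative hypotheses.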
The replication effort actually consisted of six individual RCTs and five replication study designs. Because of space considerations, we limit our discussion to the first three RCTs and replication studies. Results of the systematic conceptual replication study are available in Krishnamachari (2021).
A full review of how researchers may apply semantic similarity methods is beyond the scope of this paper, but we provide readers with an intuition for the approach here. To quantify the similarity between texts, researchers represent each text numerically by its relative word frequencies or by the extent to which it includes a set of abstract topics. After each transcript is represented as a numerical vector, researchers calculate the similarity of vectors by measuring the cosine of the angle between them. Two texts that share the same relative word frequencies will have a cosine similarity of 1, while two texts that share no common terms (or concepts) will be perpendicular to one another and have a cosine similarity of 0. Importantly, semantic similarity methods create continuous measures that can be used to identify studies where treatments were delivered more or less consistently, or with more or less adherence. Anglin and Wong (2020) describe the method and provide an example of how it may be used in replication contexts.
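As a minimal sketch of the intuition above (the function and example transcripts below are ours for illustration and are not drawn from Anglin and Wong (2020)), cosine similarity between two transcripts represented by word counts can be computed as follows:

```python
# Minimal sketch: cosine similarity between two transcripts represented
# as word-count vectors (equivalent, up to scaling, to relative word
# frequencies). Function and example texts are illustrative only.
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Return the cosine of the angle between the two word-count vectors."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    shared_terms = set(counts_a) & set(counts_b)
    dot = sum(counts_a[term] * counts_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Identical wording yields 1.0; no shared terms yields 0.0.
print(cosine_similarity("model the strategy then give feedback",
                        "model the strategy then give feedback"))  # 1.0
print(cosine_similarity("model the strategy", "review exit tickets"))  # 0.0
```

In practice, researchers would typically preprocess transcripts (e.g., removing stop words) and may weight terms before computing similarity; the resulting continuous scores can then be compared across study arms or sites to gauge adherence.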
References
Anderson, S. F., & Maxwell, S. E. (2017). Addressing the “replication crisis”: Using original studies to design replication studies with appropriate statistical power. Multivariate Behavioral Research, 52, 305–324.
Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-size planning for more accurate statistical power: A method adjusting sample effect sizes for publication bias and uncertainty. Psychological Science, 28(11), 1547–1562.
Anglin, K. L., & Wong, V. C. (2020). Using Semantic Similarity to Assess Adherence and Replicability of Intervention Delivery (No. 73; EdPolicyWorks Working Paper Series, pp. 1–33). EdPolicyWorks. https://curry.virginia.edu/sites/default/files/uploads/epw/73_Semantic_Similarity_to_Assess_Adherence_and_Replicability_0.pdf
Angrist, J., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., Grange, J. A., Perugini, M., Spies, J. R., & van ’t Veer, A. (2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217–224. https://doi.org/10.1016/j.jesp.2013.10.005
Cartwright, N., & Hardie, J. (2012). Evidence-Based Policy: A Practical Guide to Doing it Better (p. 208). Oxford University Press.
Chang, A., & Li, P. (2015). Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say “Usually Not” (Finance and Economics Discussion Series) [2015–083]. Board of Governors of the Federal Reserve System. https://www.federalreserve.gov/econresdata/feds/2015/files/2015083pap.pdf
Clemens, M. A. (2017). The meaning of failed replications: A review and proposal. Journal of Economic Surveys, 31, 326–342.
Cohen, J., Wong, V., Krishnamachari, A., & Berlin, R. (2020). Teacher coaching in a simulated environment. Educational Evaluation and Policy Analysis, 42, 208–231. https://doi.org/10.3102/0162373720906217
Cole, S. R., & Stuart, E. A. (2010). Generalizing evidence from randomized clinical trials to target populations: The ACTG 320 Trial. American Journal of Epidemiology, 172, 107–115. https://doi.org/10.1093/aje/kwq084.
Cook, B., Therrien, W., & Wong, V. (2020). Developing Infrastructure and Procedures for the Special Education Research Accelerator. NCSER. https://ies.ed.gov/funding/grantsearch/details.asp?ID=3356
Degue, S., Niolon, P. H., Estefan, L. F., Tracy, A. J., Le, V. D., Vivolo-Kantor, A. M., Little, T. D., Latzman, N. E., Tharp, A., Lang, K. M., & Taylor, B. (2020). Effects of Dating Matters® on sexual violence and sexual harassment outcomes among middle school youth: A cluster-randomized controlled trial. Prevention Science. https://doi.org/10.1007/s11121-020-01152-0
Department of Health and Human Services. (2014). PAR-13–383: Replication of Key Clinical Trials Initiative. Grants and Funding. https://grants.nih.gov/grants/guide/pa-files/PAR-13-383.html
Duncan, G. J., Engel, M., Claessens, A., & Dowsett, C. J. (2014). Replication and robustness in developmental research. Developmental Psychology, 50, 2417–2425. https://doi.org/10.1037/a0037996
Fraker, T., & Maynard, R. (1987). The Adequacy of Comparison Group Designs for Evaluations of Employment-Related Programs. The Journal of Human Resources, 22. https://doi.org/10.2307/145902
Gottfredson, D. C., Cook, T. D., Gardner, F. E. M., Gorman-Smith, D., Howe, G. W., Sandler, I. N., & Zafft, K. M. (2015). Standards of evidence for efficacy, effectiveness, and scale-up research in prevention science: Next generation. Prevention Science, 16, 893–926. https://doi.org/10.1007/s11121-015-0555-x
Hedges, L. V., & Schauer, J. M. (2019). Statistical analyses for studying replication: Meta-analytic perspectives. Psychological Methods, 24, 557.
Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction (p. 644). Cambridge University Press.
Institute of Education Sciences. (2016). Building Evidence: What Comes After an Efficacy Study? (pp. 1–17). https://ies.ed.gov/ncer/whatsnew/techworkinggroup/pdf/BuildingEvidenceTWG.pdf
Institute of Education Sciences. (2020). Program Announcement: Research Grants Focused on Systematic Replication CFDA 84.305R. Funding Opportunities; Institute of Education Sciences (IES). https://ies.ed.gov/funding/ncer_rfas/systematic_replications.asp
Keown, L. J., Sanders, M. R., Franke, N., & Shepherd, M. (2018). Te Whānau Pou Toru: A randomized controlled trial (RCT) of a culturally adapted low-intensity variant of the Triple P-Positive Parenting Program for indigenous Māori families in New Zealand. Prevention Science, 19, 954–965. https://doi.org/10.1007/s11121-018-0886-5
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4
Klein, O., Hardwicke, T. E., Aust, F., Breuer, J., Danielsson, H., Hofelich Mohr, A., IJzerman, H., Nilsonne, G., Vanpaemel, W., & Frank, M. C. (2018). A practical guide for transparency in psychological science. Collabra: Psychology, 4(1), 20. https://doi.org/10.1525/collabra.158
Krishnamachari, A. (2021). How do pre-service teachers learn? Using rigorous research methods to inform teacher preparation policy [Doctoral Dissertation, University of Virginia]. UVa Libra Repository. https://doi.org/10.18130/B811-ZH86
Lalonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. The American Economic Review, 76, 604–620.
LeBel, E. P., McCarthy, R. J., Earp, B. D., Elson, M., & Vanpaemel, W. (2018). A unified framework to quantify the credibility of scientific findings. Advances in Methods and Practices in Psychological Science, 1, 389–402. https://doi.org/10.1177/2515245918787489
Lei, R., Gelman, A. & Ghitza, Y. (2017). The 2008 election: a preregistered replication analysis. Statistics and Public Policy, 4(1), 1–8. https://doi.org/10.1080/2330443X.2016.1277966
Lewis, M. A., Rhew, I. C., Fairlie, A. M., Swanson, A., Anderson, J., & Kaysen, D. (2019). Evaluating personalized feedback intervention framing with a randomized controlled trial to reduce young adult alcohol-related sexual risk taking. Prevention Science, 20, 310–320. https://doi.org/10.1007/s11121-018-0879-4
Maalouf, F. T., Alrojolah, L., Ghandour, L., Afifi, R., Dirani, L. A., Barrett, P., et al. (2020). Building emotional resilience in youth in Lebanon: A school-based randomized controlled trial of the FRIENDS intervention. Prevention Science, 21, 650–660. https://doi.org/10.1007/s11121-020-01123-5
Morgan, S. L., & Winship, C. (2014). Counterfactuals and Causal Inference: Methods and Principles for Social Research (2nd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9781107587991
National Science Foundation. (2020). Improving Undergraduate STEM Education: Education and Human Resources. Funding. https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=505082
Nelson, M. C., Cordray, D. S., Hulleman, C. S., Darrow, C. L., & Sommer, E. C. (2012). A procedure for assessing intervention fidelity in experiments testing educational and behavioral interventions. Journal of Behavioral Health Services and Research, 39, 374–396. https://doi.org/10.1007/s11414-012-9295-x
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., et al. (2015). Promoting an open research culture: Author guidelines for journals could help promote transparency, openness, and reproducibility. Science, 348, 1422–1425. https://doi.org/10.1126/science.aab2374
Nosek, B.A., Ebersole, C.R., DeHaven, A.C., Mellor, D.T. (2018). The preregistration revolution. PNAS, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114
Nosek, B. A., & Errington, T. M. (2020). What is replication? PLOS Biology, 18, e3000691. https://doi.org/10.1371/journal.pbio.3000691
Rindskopf, D. M., Shadish, W. R., & Clark, M. H. (2018). Using Bayesian Correspondence Criteria to Compare Results From a Randomized Experiment and a Quasi-Experiment Allowing Self-Selection. Evaluation Review, 42(2), 248-280. https://doi.org/10.1177/0193841X18789532
Roddy, M. K., Rhoades, G. K., & Doss, B. D. (2020). Effects of ePREP and OurRelationship on low-income couples’ mental health and health behaviors: A randomized controlled trial. Prevention Science, 21, 861–871. https://doi.org/10.1007/s11121-020-01100-y
Rosenbaum, P. R. (2017). Observation and Experiment: An Introduction to Causal Inference. Harvard University Press.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701. https://doi.org/10.1037/h0037350
Schauer, J. M., & Hedges, L. V. (2020). Assessing heterogeneity and power in replications of psychological experiments. Psychological Bulletin.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13, 90–100. https://doi.org/10.1037/a0015108
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin.
Simons, D. J. (2014). The value of direct replication. Perspectives on Psychological Science, 9(1), 76–80. https://doi.org/10.1177/1745691613514755
Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science, 12, 1123–1128.
Simonsohn, U. (2015). Small telescopes. Psychological Science, 26, 559–569. https://doi.org/10.1177/0956797614567341
Solari, E., Wong, V., Baker, D. L., & Richards, T. (2020). Iterative Replication of Read Well in First Grade. NCSER. https://ies.ed.gov/funding/grantsearch/details.asp?ID=4404
Spybrook, J., Anderson, D., & Maynard, R. (2019). The Registry of Efficacy and Effectiveness Studies (REES): A step toward increased transparency in education. Journal of Research on Educational Effectiveness, 12, 5–9. https://doi.org/10.1080/19345747.2018.1529212
Steiner, P. M., & Wong, V. C. (2018). Assessing correspondence between experimental and nonexperimental estimates in within-study comparisons. Evaluation Review, 42(2), 214–247. https://doi.org/10.1177/0193841X18773807
Steiner, P. M., Wong, V. C., & Anglin, K. L. (2019). A causal replication framework for designing and assessing replication efforts. Zeitschrift für Psychologie / Journal of Psychology, 227, 280–292. https://doi.org/10.1027/2151-2604/a000385
Stroebe, W., & Strack, F. (2014). The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science, 9, 59–71. https://doi.org/10.1177/1745691613514450
Stuart, E. A., Cole, S. R., Bradshaw, C. P., & Leaf, P. J. (2011). The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society: Series A (Statistics in Society), 174, 369–386. https://doi.org/10.1111/j.1467-985X.2010.00673.x
Terry, J. D., Weist, M. D., Strait, G. G., & Miller, M. (2020). Motivational interviewing to promote the effectiveness of selective prevention: An integrated school-based approach. Prevention Science. https://doi.org/10.1007/s11121-020-01124-4
Tipton, E. (2012). Improving generalizations from experiments using propensity score subclassification. Journal of Educational and Behavioral Statistics, 38, 239–266. https://doi.org/10.3102/1076998612441947
Tipton, E., & Olsen, R. B. (2018). A review of statistical methods for generalizing from evaluations of educational interventions. Educational Researcher, 47, 516–524. https://doi.org/10.3102/0013189X18781522
Wennehorst, K., Mildenstein, K., Saliger, B., Tigges, C., Diehl, H., Keil, T., & Englert, H. (2016). A comprehensive lifestyle intervention to prevent type 2 diabetes and cardiovascular diseases: The German CHIP Trial. Prevention Science, 17, 386–397. https://doi.org/10.1007/s11121-015-0623-2
Widaman, K. F., Ferrer, E., & Conger, R. D. (2010). Factorial invariance within longitudinal structural equation models: Measuring the same construct across time. Child Development Perspectives, 4, 10–18. https://doi.org/10.1111/j.1750-8606.2009.00110.x
Wong, V. C., & Steiner, P. M. (2018). Replication designs for causal inference (No. 62; EdPolicyWorks Working Paper Series). EdPolicyWorks. https://curry.virginia.edu/sites/default/files/uploads/epw/62_Replication_Designs.pdf
Wong, V. C., Wing, C., Steiner, P. M., Wong, M., & Cook, T. D. (2012). Research designs for program evaluation. In Handbook of Psychology (2nd ed., Vol. 2). Wiley.
Wu, A. D., Li, Z., & Zumbo, B. D. (2007). Decoding the meaning of factorial invariance and updating the practice of multi-group confirmatory factor analysis: A demonstration with TIMSS data. Practical Assessment Research & Evaluation, 12, 25. https://doi.org/10.7275/mhqa-cd89
Funding
The research reported here was supported by the Institute of Education Sciences, US Department of Education, through Grant #R305B140026 and Grant #R305D190043 to the Rectors and Visitors of the University of Virginia.
Ethics declarations
Ethics Approval
Approval was obtained from the ethics committee of the University of Virginia. The procedures used in this study adhere to the tenets of the Declaration of Helsinki (ethics approval numbers: 2170, 2727, 2875, 2918).
Consent to Participate
Informed consent was obtained from all individual participants included in the study.
Conflict of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.