Recommendations for Practice: Justifying Claims of Generalizability


Recommendations for practice are routinely included in articles that report educational research. Robinson et al. (2013) suggest that reports of primary research should not routinely do so. They argue that single primary research studies seldom have sufficient external validity to support claims about practice policy. In this article, I draw on recent statistical research that has formalized subjective notions about generalizability from experiments. I show that even rather large experiments often do not support generalizations to policy-relevant inference populations. This suggests that single primary studies are unlikely to be sufficiently generalizable to support recommendations for practice.

References


  1. Cochran, W. G. (1968). The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics, 24, 295–313.

  2. Hedges, L. V., & O'Muircheartaigh, C. (2010). Improving inference for population level treatment effects in social experiments. Working paper, Northwestern University Institute for Policy Research.

  3. Kalton, G. (1968). Standardization: a technique to control for extraneous variables. Applied Statistics, 17, 118–136.

  4. Kitagawa, E. M. (1964). Standardized comparisons in population research. Demography, 1, 296–315.

  5. Kruskal, W., & Mosteller, F. (1979). Representative sampling III: the current statistical literature. International Statistical Review, 47, 245–265.

  6. Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.

  7. Mosteller, F., Light, R. J., & Sachs, J. A. (1996). Sustained inquiry in education: lessons learned from skill grouping and class size. Harvard Educational Review, 66, 797–842.

  8. Nye, B., Hedges, L. V., & Konstantopoulos, S. (2000). The effects of small classes on achievement: the results of the Tennessee class size experiment. American Educational Research Journal, 37, 123–151.

  9. O'Muircheartaigh, C., & Hedges, L. V. (2013). Generalizing from unrepresentative experiments: a stratified propensity score approach. Journal of the Royal Statistical Society, Series C (in press).

  10. Oaxaca, R. (1973). Male–female wage differentials in urban labor markets. International Economic Review, 14, 693–709.

  11. Robinson, D. H., Levin, J. R., Schraw, G., Patall, E. A., & Hunt, E. B. (2013). On going (way) beyond one's data: a proposal to restrict recommendations for practice in primary educational research journals. Educational Psychology Review (in press).

  12. Roschelle, J., Shechtman, N., Tatar, D., & Hegedus, S. (2010). Integration of technology, curriculum, and professional development for advancing middle school mathematics. American Educational Research Journal, 47, 833–878.

  13. Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.

  14. Rosenberg, P. (1962). Test factor standardization as a method of interpretation. Social Forces, 41, 53–61.

  15. Stuart, E. A., Cole, S. R., Bradshaw, C. P., & Leaf, P. J. (2011). The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society, Series A, 174(2), 369–386.

  16. Tipton, E. (2013). Improving generalizations from experiments using propensity score subclassification: assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics (in press).

  17. Tipton, E., & Hedges, L. V. (2013). Sample selection in randomized experiments: a new method using propensity score stratified sampling. Journal of Research on Educational Effectiveness (in press).

Author information



Corresponding author

Correspondence to Larry V. Hedges.

Additional information

This paper is based in part on work supported by the US National Science Foundation (NSF) under grants #0815295 and #1118978. Any opinions, findings, and conclusions or recommendations are those of the authors and do not necessarily represent the views of the NSF.

About this article

Cite this article

Hedges, L.V. Recommendations for Practice: Justifying Claims of Generalizability. Educ Psychol Rev 25, 331–337 (2013).


  • Replication
  • Practice policy
  • Generalization