Allison, P. D. (1990). Change scores as dependent variables in regression analysis. Sociological Methodology, 20(1), 93–114.
Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service.
Bast, J., & Reitsma, P. (1997). Matthew effects in reading: A comparison of latent growth curve models and simplex models with structured means. Multivariate Behavioral Research, 32(2), 135–167.
Bereiter, C. (1963). Some persisting dilemmas in the measurement of change. In C. W. Harris (Ed.), Problems in measuring change (pp. 3–20). Madison, WI: University of Wisconsin Press.
Bergman, L. R. (2001). A person approach in research on adolescence: Some methodological challenges. Journal of Adolescent Research, 16(1), 28–53.
Collins, L. M. (1996a). Measurement of change in research on aging: Old and new issues from an individual growth perspective. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the psychology of aging (4th ed., pp. 38–56). San Diego, CA: Academic Press.
Collins, L. M. (1996b). Is reliability obsolete? A commentary on “Are simple gain scores obsolete?” Applied Psychological Measurement, 20(3), 289–292.
Crocker, L., & Algina, J. (2008). Introduction to classical and modern test theory. Mason, OH: Cengage Learning.
Cronbach, L. J., & Furby, L. (1970). How we should measure “change”: Or should we? Psychological Bulletin, 74(1), 68–80.
Denney, C. B., Rapport, M. D., & Chung, K.-M. (2005). Interactions of task and subject variables among continuous performance tests. Journal of Child Psychology and Psychiatry, 46(4), 420–435.
Diggle, P., Heagerty, P., Liang, K. Y., & Zeger, S. (2013). Analysis of longitudinal data (2nd ed.). Oxford: Oxford University Press.
Draheim, C., Hicks, K. L., & Engle, R. W. (2016). Combining reaction time and accuracy: The relationship between working memory capacity and task switching as a case example. Perspectives on Psychological Science, 11(1), 133–155.
Embretson, S. E. (1991). A multidimensional latent trait model for measuring learning and change. Psychometrika, 56(3), 495–515.
Finney, J. W., Moos, R. H., & Mewborn, C. R. (1980). Posttreatment experiences and treatment outcome of alcoholic patients six months and two years after hospitalization. Journal of Consulting and Clinical Psychology, 48(1), 17–29.
Fiszdon, J. M., & Johannesen, J. K. (2010). Comparison of computational methods for the evaluation of learning potential in schizophrenia. Journal of the International Neuropsychological Society, 16, 613–620.
Gjerustad, C., & von Soest, T. (2012). Socio-economic status and mental health: The importance of achieving occupational aspirations. Journal of Youth Studies, 15(7), 890–908.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519–521.
Gold, A. B., Ewing-Cobbs, L., Cirino, P., Fuchs, L. S., Stuebing, K. K., & Fletcher, J. M. (2013). Cognitive and behavioral attention in children with math difficulties. Child Neuropsychology, 19(4), 420–437.
Guo, Y., Tompkins, V., Justice, L., & Petscher, Y. (2014). Classroom age composition and vocabulary development among at-risk preschoolers. Early Education and Development, 25(7), 1016–1034.
Hertzog, C., von Oertzen, T., Ghisletta, P., & Lindenberger, U. (2008). Evaluating the power of latent growth curve models to detect individual differences in change. Structural Equation Modeling, 15(4), 541–563.
Holahan, C. J., & Moos, R. H. (1981). Social support and psychological distress: A longitudinal analysis. Journal of Abnormal Psychology, 90(4), 365–370.
Hox, J. J., Moerbeek, M., & van de Schoot, R. (2010). Multilevel analysis: Techniques and applications (2nd ed.). New York, NY: Routledge.
Hughes, M. M., Linck, J. A., Bowles, A. R., Koeth, J. T., & Bunting, M. F. (2014). Alternatives to switch-cost scoring in the task-switching paradigm: Their reliability and increased validity. Behavior Research Methods, 46(3), 702–721.
Jabrayilov, R., Emons, W. H. M., & Sijtsma, K. (2016). Comparison of classical test theory and item response theory in individual change assessment. Applied Psychological Measurement, 40(8), 559–572.
Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15(4), 336–352.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12–19.
Kelly, S., & Ye, F. (2017). Accounting for the relationship between initial status and growth in regression models. The Journal of Experimental Education, 85(3), 353–375.
Kerckhoff, A. C. (1986). Effects of ability grouping in British secondary schools. American Sociological Review, 51(6), 842–858.
Kim, S., & Camilli, G. (2014). An item response theory approach to longitudinal analysis with application to summer setback in preschool language/literacy. Large-Scale Assessments in Education, 2(1), 1.
Li, F., Cohen, A., Bottge, B., & Templin, J. (2016). A latent transition analysis model for assessing change in cognitive skills. Educational and Psychological Measurement, 76(2), 181–204.
Linn, R. L., & Haug, C. (2002). Stability of school-building accountability scores and gains. Educational Evaluation and Policy Analysis, 24(1), 29–36.
Linn, R. L., & Slinde, J. A. (1977). The determination of the significance of change between pre- and posttesting periods. Review of Educational Research, 47(1), 121–150.
Lord, F. M. (1956). The measurement of growth. ETS Research Bulletin Series, 1956(1), i–22.
Lord, F. M. (1963). Elementary models for measuring change. In C. W. Harris (Ed.), Problems in measuring change (pp. 21–38). Madison, WI: University of Wisconsin Press.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
McArdle, J. J., Petway, K. T., & Hishinuma, E. S. (2015). IRT for growth and change. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (1st ed., pp. 435–456). New York, NY: Routledge.
Mellenbergh, G. J. (1996). Measurement precision in test score and item response models. Psychological Methods, 1(3), 293–299.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55(1), 107–122.
Nesselroade, J. R. (1991). Interindividual differences in intraindividual change. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions (pp. 92–105). Washington, DC: American Psychological Association.
Norman, G. R., Sloan, J. A., & Wyrwich, K. W. (2003). Interpretation of changes in health-related quality of life: The remarkable universality of half a standard deviation. Medical Care, 41(5), 582–592.
O’Connor, E. F. (1972). Extending classical test theory to the measurement of change. Review of Educational Research, 42(1), 73–97.
Ogles, B. M., Lunnen, K. M., & Bonesteel, K. (2001). Clinical significance: History, application, and current practice. Clinical Psychology Review, 21(3), 421–446.
Overall, J. E., & Woodward, J. A. (1975). Unreliability of difference scores: A paradox for measurement of change. Psychological Bulletin, 82(1), 85–86.
Parker, G. R., & Dabros, M. S. (2012). Last-period problems in legislatures. Public Choice, 151(3), 789–806.
Raaijmakers, J. G. W. (2016). On testing the strength independence assumption in retrieval-induced forgetting. Psychonomic Bulletin & Review, 23(5), 1374–1381.
Raudenbush, S. W. (2001). Comparing personal trajectories and drawing causal inferences from longitudinal data. Annual Review of Psychology, 52, 501–525.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed., Vol. 1). London: Sage.
Raykov, T. (1993). A structural equation model for measuring residualized change and discerning patterns of growth or decline. Applied Psychological Measurement, 17(1), 53–71.
Reckase, M. (2009). Multidimensional item response theory (Vol. 150). New York, NY: Springer.
Rogosa, D. R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 92(3), 726–748.
Rogosa, D. R., & Willett, J. B. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20, 335–343.
Roohr, K. C., Liu, H., & Liu, O. L. (2016). Investigating student learning gains in college: A longitudinal study. Studies in Higher Education, 42(12), 2284–2300.
Sandell, R., & Wilczek, A. (2016). Another way to think about psychological change: Experiential vs. incremental. European Journal of Psychotherapy & Counselling, 18(3), 228–251.
Schünemann, H. J., & Guyatt, G. H. (2005). Commentary—goodbye M(C)ID! Hello MID, where do you come from? Health Services Research, 40(2), 593–597.
Sijtsma, K., & van der Ark, L. A. (2015). Conceptions of reliability revisited and practical recommendations. Nursing Research, 64(2), 128–136.
Snijders, T. A., & Bosker, R. J. (2011). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.
Son, S.-H., & Morrison, F. J. (2010). The nature and impact of changes in home learning environment on development of language and academic skills in preschool children. Developmental Psychology, 46(5), 1103–1118.
Stanley, J. C. (1967). General and special formulas for reliability of differences. Journal of Educational Measurement, 4(4), 249–252.
Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy. Reading Research Quarterly, 21(4), 360–407.
Stevenson, C. E., Heiser, W. J., & Resing, W. C. M. (2013). Working memory as a moderator of training and transfer of analogical reasoning in children. Contemporary Educational Psychology, 38(3), 159–169.
Streiner, D. L., & Norman, G. R. (2008). Health measurement scales: A practical guide to their development and use (4th ed.). New York, NY: Oxford University Press.
Trompetter, H. R., Lamers, S. M. A., Westerhof, G. J., Fledderus, M., & Bohlmeijer, E. T. (2017). Both positive mental health and psychopathology should be monitored in psychotherapy: Confirmation for the dual-factor model in acceptance and commitment therapy. Behaviour Research and Therapy, 91, 58–63.
Willett, J. B. (1988). Questions and answers in the measurement of change. Review of Research in Education, 15, 345–422.
Williams, B. J., & Kaufmann, L. M. (2012). Reliability of the go/no go association task. Journal of Experimental Social Psychology, 48(4), 879–891.
Williams, R. H., & Zimmerman, D. W. (1977). The reliability of difference scores when errors are correlated. Educational and Psychological Measurement, 37(3), 679–689.
Williams, R. H., & Zimmerman, D. W. (1996). Are simple gain scores obsolete? Applied Psychological Measurement, 20(1), 59–69.
Wise, E. A. (2004). Methods for analyzing psychotherapy outcomes: A review of clinical significance, reliable change, and recommendations for future directions. Journal of Personality Assessment, 82(1), 50–59.
Zimmerman, D. W. (1994). A note on interpretation of formulas for the reliability of differences. Journal of Educational Measurement, 31(2), 143–147.
Zimmerman, D. W., & Williams, R. H. (1982a). Gain scores in research can be highly reliable. Journal of Educational Measurement, 19(2), 149–154.
Zimmerman, D. W., & Williams, R. H. (1982b). On the high predictive potential of change and growth measures. Educational and Psychological Measurement, 42(4), 961–968.