Review of Issues About Classical Change Scores: A Multilevel Modeling Perspective on Some Enduring Beliefs

Gu, Zhengguo; Emons, Wilco H. M.; Sijtsma, Klaas

doi:10.1007/s11336-018-9611-3


Published: 30 April 2018

Psychometrika, Volume 83, pages 674–695 (2018)

Abstract

Change scores obtained in pretest–posttest designs are important for evaluating treatment effectiveness and for assessing change of individual test scores in psychological research. However, over the years the use of change scores has raised much controversy. In this article, from a multilevel perspective, we provide a structured treatise on several persistent negative beliefs about change scores and show that these beliefs originated from the confounding of the effects of within-person change on change-score reliability and between-person change differences. We argue that psychometric properties of change scores, such as reliability and measurement precision, should be treated at suitable levels within a multilevel framework. We show that, if examined at the suitable levels within such a framework, the negative beliefs about change scores can be renounced convincingly. Finally, we summarize our conclusions about change scores to dispel the myths and to promote the potential and practical usefulness of change scores.


Similar content being viewed by others

A Factor Analysis Approach to Item Level Change Score Reliability

Item Response Models for Dependent Data: Quasi-exact Tests for the Investigation of Some Preconditions for Measuring Change

Some Remarks on Applications of Tests for Detecting A Change Point to Psychometric Problems (Sandip Sinharay; article, 21 October 2016)


Author information

Authors and Affiliations

Department of Methodology and Statistics, TSB, Tilburg University, PO Box 90153, 5000 LE, Tilburg, The Netherlands
Zhengguo Gu, Wilco H. M. Emons & Klaas Sijtsma


Corresponding author

Correspondence to Zhengguo Gu.

Electronic supplementary material

The following electronic supplementary material is available with the online version of this article.

Supplementary material 1 (xlsx 15 KB)

Appendices

Appendix A

Equation (11) models the true pretest score $\tau _{v1}$ and the true posttest score $\tau _{v1}+\delta _v$. Alternatively, we may model the true pretest score $\tau _{v1}$ and the true change $\delta _v$ (rather than the true posttest score) as follows:

$$\begin{aligned} \begin{bmatrix} \tau _{v1}\\ \delta _v \end{bmatrix}=\begin{bmatrix} \mu _{\tau _1}\\ \mu _\delta \end{bmatrix}+\begin{bmatrix} \upsilon _1\\ \upsilon _2 \end{bmatrix}, \end{aligned}$$

(A1)

with

$$\begin{aligned} \begin{bmatrix} \upsilon _1\\ \upsilon _2 \end{bmatrix} \sim N(\mathbf {0}, \mathbf {\Sigma }_\upsilon ), \end{aligned}$$

(A2)

where $\mathbf {\Sigma }_\upsilon =\begin{bmatrix} \sigma ^2_{\tau _1}&\sigma _{\tau _1\delta }\\ \sigma _{\tau _1\delta }&\sigma ^2_\delta \end{bmatrix}$. $\sigma ^2_{\tau _1}$ denotes the variance of the true pretest score. $\sigma _{\tau _1\delta }$ denotes the covariance between the true pretest score and true change. $\sigma ^2_\delta $ denotes the variance of true change. Given (A1) and (A2), we can derive the variance of the true posttest score, denoted by $\sigma ^2_{\tau _{2}}$, and the covariance between the true pretest score and the true posttest score, denoted by $\sigma _{\tau _{1}\tau _{2}}$, as follows:

$$\begin{aligned} \sigma ^2_{\tau _{2}}=\sigma ^2_{\tau _1}+2\sigma _{\tau _1\delta }+\sigma ^2_\delta , \end{aligned}$$

(A3)

and

$$\begin{aligned} \sigma _{\tau _{1}\tau _{2}}=\sigma ^2_{\tau _1}+\sigma _{\tau _1\delta }. \end{aligned}$$

(A4)

Note that in Eq. (13) the covariance matrix $\mathbf {\Sigma }_\omega $ can be expressed as

$$\begin{aligned} \mathbf {\Sigma }_\omega =\begin{bmatrix} \sigma ^2_{\tau _{1}}&\sigma _{\tau _{1}\tau _{2}}\\ \sigma _{\tau _{1}\tau _{2}}&\sigma ^2_{\tau _{2}} \end{bmatrix}, \end{aligned}$$

(A5)

and, given (A3) and (A4), we obtain

$$\begin{aligned} \mathbf {\Sigma }_\omega = \begin{bmatrix} \sigma ^2_{\tau _1}&\sigma ^2_{\tau _1}+\sigma _{\tau _1\delta }\\ \sigma ^2_{\tau _1}+\sigma _{\tau _1\delta }&\sigma ^2_{\tau _1}+2\sigma _{\tau _1\delta }+\sigma ^2_\delta \end{bmatrix}. \end{aligned}$$

(A6)
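As a quick numerical check of (A3), (A4), and (A6), the sketch below forms $\mathbf {\Sigma }_\omega $ as $\mathbf {L}\mathbf {\Sigma }_\upsilon \mathbf {L}^\top $ with $\mathbf {L}=\begin{bmatrix} 1&0\\ 1&1 \end{bmatrix}$, which encodes that the true posttest score is $\tau _{v1}+\delta _v$. It is a minimal illustration, not the authors' code: the function names and parameter values are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used only so that the comparison with (A6) is exact.

```python
from fractions import Fraction as F


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def sigma_omega(var_tau1, cov_tau1_delta, var_delta):
    """Covariance matrix of (true pretest, true posttest) implied by (A1)-(A2).

    The true posttest score is tau_1 + delta, so (tau_1, tau_1 + delta)' = L (tau_1, delta)'
    with L = [[1, 0], [1, 1]], and therefore Sigma_omega = L Sigma_upsilon L'.
    """
    sigma_upsilon = [[var_tau1, cov_tau1_delta],
                     [cov_tau1_delta, var_delta]]
    L = [[1, 0], [1, 1]]
    return mat_mul(mat_mul(L, sigma_upsilon), transpose(L))


# Hypothetical values: var(tau_1) = 1, cov(tau_1, delta) = -0.3, var(delta) = 0.5.
s = sigma_omega(F(1), F(-3, 10), F(1, 2))

# Closed form of Eq. (A6) for the same values.
a6 = [[F(1), F(1) + F(-3, 10)],
      [F(1) + F(-3, 10), F(1) + 2 * F(-3, 10) + F(1, 2)]]

print(s == a6)                                  # → True
print([[float(x) for x in row] for row in s])   # → [[1.0, 0.7], [0.7, 0.9]]
```

The matrix-product route makes the origin of each entry explicit: the off-diagonal term picks up $\sigma _{\tau _1\delta }$ once, and the posttest variance picks it up twice.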

Appendix B

Let $\sigma ^2_1$ denote the variance of the observed pretest score $X_1$, let $\sigma ^2_2$ denote the variance of the observed posttest score $X_2$, and let $\sigma _{12}$ denote the covariance between the pretest score and the posttest score. Then the variance of the change score $D=X_2-X_1$, denoted by $\sigma ^2_D$, is

$$\begin{aligned} \sigma ^2_D=\sigma ^2_2-2\sigma _{12}+\sigma ^2_1. \end{aligned}$$

(B1)

According to Eq. (16),

$$\begin{aligned} \sigma ^2_2= & {} \sigma ^2_\delta +2\sigma _{\tau _1\delta }+\sigma ^2_{\tau _1}+\sigma ^2_{\varepsilon _2}, \end{aligned}$$

(B2)

$$\begin{aligned} \sigma _{12}= & {} \sigma _{\tau _1\delta }+\sigma ^2_{\tau _1}+\sigma _{\varepsilon _1\varepsilon _2}, \end{aligned}$$

(B3)

and

$$\begin{aligned} \sigma ^2_1=\sigma ^2_{\varepsilon _1}+\sigma ^2_{\tau _1}. \end{aligned}$$

(B4)

Thus, substituting (B2), (B3), and (B4) into the right-hand side of Eq. (B1), we obtain

$$\begin{aligned} \sigma ^2_D = \sigma ^2_\delta +\sigma ^2_{\varepsilon _1} + \sigma ^2_{\varepsilon _2} -2\sigma _{\varepsilon _1\varepsilon _2} \end{aligned}$$

(B5)

as desired.
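The derivation of (B5) can be checked numerically by writing $X_1$, $X_2$, and $D$ as linear combinations of $(\tau _1, \delta , \varepsilon _1, \varepsilon _2)$ and computing variances and covariances as bilinear forms. This is a sketch under the assumption, implicit in (B2) through (B4), that the errors are uncorrelated with $\tau _1$ and $\delta $; the helper names and parameter values are hypothetical.

```python
from fractions import Fraction as F


def bilinear(u, sigma, v):
    """Return u' Sigma v: the covariance of two linear combinations with weights u and v."""
    return sum(u[i] * sigma[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def component_cov(var_tau1, var_delta, cov_tau1_delta, var_e1, var_e2, cov_e1e2):
    """Covariance matrix of (tau_1, delta, eps_1, eps_2).

    Assumption (implicit in (B2)-(B4)): errors are uncorrelated with tau_1 and delta.
    """
    return [[var_tau1, cov_tau1_delta, 0, 0],
            [cov_tau1_delta, var_delta, 0, 0],
            [0, 0, var_e1, cov_e1e2],
            [0, 0, cov_e1e2, var_e2]]


# Weights on (tau_1, delta, eps_1, eps_2):
w_x1 = [1, 0, 1, 0]                           # X_1 = tau_1 + eps_1
w_x2 = [1, 1, 0, 1]                           # X_2 = tau_1 + delta + eps_2
w_d = [b - a for a, b in zip(w_x1, w_x2)]     # D = X_2 - X_1 -> [0, 1, -1, 1]

# Hypothetical parameter values, for illustration only.
p = dict(var_tau1=F(1), var_delta=F(1, 2), cov_tau1_delta=F(-1, 5),
         var_e1=F(1, 4), var_e2=F(3, 10), cov_e1e2=F(1, 10))
sigma = component_cov(**p)

var_d = bilinear(w_d, sigma, w_d)
# Route of Eq. (B1): var(X_2) - 2 cov(X_1, X_2) + var(X_1)
b1 = (bilinear(w_x2, sigma, w_x2) - 2 * bilinear(w_x1, sigma, w_x2)
      + bilinear(w_x1, sigma, w_x1))
# Closed form of Eq. (B5); var_tau1 and cov_tau1_delta drop out
b5 = p["var_delta"] + p["var_e1"] + p["var_e2"] - 2 * p["cov_e1e2"]

print(var_d == b1 == b5, float(var_d))   # → True 0.85
```

Changing `var_tau1` or `cov_tau1_delta` leaves `var_d` unchanged, which mirrors the cancellation that produces (B5).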

Appendix C

Here, it suffices to show that $\rho _{DD'}-\rho _{11'}$ and $\rho _{DD'}-\rho _{22'}$ can take positive values in theory. Whether $\rho _{DD'}-\rho _{11'}>0$ and $\rho _{DD'}-\rho _{22'}>0$ are observed in empirical studies is irrelevant to this argument.

$\rho _{DD'}-\rho _{11'}$ can be derived as follows. Given Eqs. (17) and (19),

$$\begin{aligned} \begin{aligned} \rho _{DD'}-\rho _{11'}&= \frac{\sigma ^2_\delta \sigma ^2_{\varepsilon _1}-\sigma ^2_{\tau _1}(\sigma ^2_{\varepsilon _1}+ \sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1\varepsilon _2})}{(\sigma ^2_\delta +\sigma ^2_{\varepsilon _1}+\sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1\varepsilon _2})(\sigma ^2_{\tau _1}+\sigma ^2_{\varepsilon _1})} \end{aligned} \end{aligned}$$

(C1)

According to the Cauchy–Schwarz inequality,

$$\begin{aligned} \sigma _{\varepsilon _1\varepsilon _2}\le \sigma _{\varepsilon _1}\sigma _{\varepsilon _2}, \end{aligned}$$

which implies that

$$\begin{aligned} \sigma ^2_{\varepsilon _1}+\sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1\varepsilon _2}\ge \sigma ^2_{\varepsilon _1}+\sigma ^2_{\varepsilon _2} -2\sigma _{\varepsilon _1}\sigma _{\varepsilon _2}=(\sigma _{\varepsilon _1}-\sigma _{\varepsilon _2})^2. \end{aligned}$$

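This means that the denominator of (C1) is nonnegative, because $\sigma ^2_\delta \ge 0$ and $\sigma ^2_{\tau _1}+\sigma ^2_{\varepsilon _1}\ge 0$ as well. Its two factors are $\sigma ^2_D$ and $\sigma ^2_1$ (Eqs. (B5) and (B4)), so the denominator equals 0 only when $\rho _{DD'}$ or $\rho _{11'}$ is undefined, for example, when $\sigma _{\varepsilon _1}=\sigma _{\varepsilon _2}=\sigma _{\tau _1}=\sigma _{\delta }=0$; in all other cases it is positive. The numerator of (C1) can be positive as well, provided $\sigma ^2_{\varepsilon _1}>0$ and $\sigma ^2_\delta $ is large enough (keeping everything else constant) that $\sigma ^2_\delta \sigma ^2_{\varepsilon _1}>\sigma ^2_{\tau _1}(\sigma ^2_{\varepsilon _1}+ \sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1\varepsilon _2})$. Thus, we have shown that $\rho _{DD'}-\rho _{11'}>0$ is possible.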
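A concrete instance may help. The sketch below takes $\rho _{DD'}$ and $\rho _{11'}$ in the forms implied by the terms of (C1), picks hypothetical parameter values with $\sigma ^2_\delta $ above the threshold $\sigma ^2_{\tau _1}(\sigma ^2_{\varepsilon _1}+\sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1\varepsilon _2})/\sigma ^2_{\varepsilon _1}$, and confirms that the direct difference agrees with (C1). The function names and values are illustrative assumptions only.

```python
from fractions import Fraction as F


def rel_change(var_delta, var_e1, var_e2, cov_e1e2):
    """rho_DD' = var_delta / var_D, with var_D taken from Eq. (B5)."""
    return var_delta / (var_delta + var_e1 + var_e2 - 2 * cov_e1e2)


def rel_pretest(var_tau1, var_e1):
    """rho_11' = var_tau1 / (var_tau1 + var_e1)."""
    return var_tau1 / (var_tau1 + var_e1)


def c1(var_tau1, var_delta, var_e1, var_e2, cov_e1e2):
    """Right-hand side of Eq. (C1)."""
    e = var_e1 + var_e2 - 2 * cov_e1e2
    return ((var_delta * var_e1 - var_tau1 * e)
            / ((var_delta + e) * (var_tau1 + var_e1)))


# Hypothetical values: var_tau1 = 1 and equal, uncorrelated errors of variance 1/4.
var_tau1, var_e1, var_e2, cov_e1e2 = F(1), F(1, 4), F(1, 4), F(0)
# The numerator of (C1) is positive once var_delta * var_e1 > var_tau1 * e,
# i.e., here once var_delta > 1 * (1/2) / (1/4) = 2.
var_delta = F(4)

diff = rel_change(var_delta, var_e1, var_e2, cov_e1e2) - rel_pretest(var_tau1, var_e1)
print(diff == c1(var_tau1, var_delta, var_e1, var_e2, cov_e1e2), round(float(diff), 4))
# → True 0.0889
```

With `var_delta = F(1)` instead, the same function returns a negative value, so the sign of the difference indeed hinges on how large the true-change variance is.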

$\rho _{DD'}-\rho _{22'}$ can be derived as follows. Given Eqs. (17) and (20),

$$\begin{aligned} \begin{aligned} \rho _{DD'}-\rho _{22'}&=\frac{\sigma ^2_\delta }{\sigma ^2_\delta +\sigma ^2_{\varepsilon _1}+\sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1\varepsilon _2}} - \frac{\sigma ^2_{\tau _1}+\sigma ^2_\delta +2\sigma _{\tau _1\delta }}{\sigma ^2_{\tau _1}+\sigma ^2_\delta +2\sigma _{\tau _1\delta }+\sigma ^2_{\varepsilon _2}}\\&= \frac{\sigma ^2_\delta (\sigma ^2_{\tau _1}+\sigma ^2_\delta +2\sigma _{\tau _1\delta }+\sigma ^2_{\varepsilon _2})-(\sigma ^2_{\tau _1}+\sigma ^2_\delta +2\sigma _{\tau _1\delta })(\sigma ^2_\delta +\sigma ^2_{\varepsilon _1}+\sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1\varepsilon _2})}{(\sigma ^2_\delta +\sigma ^2_{\varepsilon _1}+\sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1\varepsilon _2})(\sigma ^2_{\tau _1}+\sigma ^2_\delta +2\sigma _{\tau _1\delta }+\sigma ^2_{\varepsilon _2})}\\&= \frac{(-\sigma ^2_{\varepsilon _1}-\sigma ^2_{\varepsilon _2}+2\sigma _{\varepsilon _1\varepsilon _2})(\sigma ^2_{\tau _1}+2\sigma _{\tau _1\delta })+\sigma ^2_\delta (2\sigma _{\varepsilon _1\varepsilon _2}-\sigma ^2_{\varepsilon _1})}{(\sigma ^2_\delta +\sigma ^2_{\varepsilon _1}+\sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1\varepsilon _2})(\sigma ^2_{\tau _1}+\sigma ^2_\delta +2\sigma _{\tau _1\delta }+\sigma ^2_{\varepsilon _2})}. \end{aligned} \end{aligned}$$

(C2)

We first examine the denominator of (C2). According to the Cauchy–Schwarz inequality,

$$\begin{aligned} -\sigma _{\varepsilon _1}\sigma _{\varepsilon _2}\le \sigma _{\varepsilon _1\varepsilon _2}\le \sigma _{\varepsilon _1}\sigma _{\varepsilon _2}, \end{aligned}$$

and

$$\begin{aligned} -\sigma _{\tau _1}\sigma _{\delta }\le \sigma _{\tau _1\delta }\le \sigma _{\tau _1}\sigma _{\delta }, \end{aligned}$$

and thus it follows that

$$\begin{aligned} \sigma ^2_\delta +\sigma ^2_{\varepsilon _1}+\sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1\varepsilon _2}\ge \sigma ^2_\delta +\sigma ^2_{\varepsilon _1}+\sigma ^2_{\varepsilon _2}-2\sigma _{\varepsilon _1}\sigma _{\varepsilon _2}=\sigma ^2_\delta +(\sigma _{\varepsilon _1}-\sigma _{\varepsilon _2})^2\ge 0, \end{aligned}$$

and

$$\begin{aligned} \sigma ^2_{\tau _1}+\sigma ^2_\delta +2\sigma _{\tau _1\delta }+\sigma ^2_{\varepsilon _2}\ge \sigma ^2_{\tau _1}+\sigma ^2_\delta -2\sigma _{\tau _1}\sigma _\delta +\sigma ^2_{\varepsilon _2}= (\sigma _{\tau _1}-\sigma _\delta )^2+\sigma ^2_{\varepsilon _2}\ge 0. \end{aligned}$$

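Therefore, the denominator of (C2) is nonnegative. Its two factors are $\sigma ^2_D$ and $\sigma ^2_2$ (Eqs. (B5) and (B2)), so the denominator equals 0 only when $\rho _{DD'}$ or $\rho _{22'}$ is undefined, for example, when $\sigma _{\tau _1}=\sigma _{\delta }=\sigma _{\varepsilon _1}=\sigma _{\varepsilon _2}=0$; in all other cases it is positive.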

We now examine whether the numerator of (C2), namely $(-\sigma ^2_{\varepsilon _1}-\sigma ^2_{\varepsilon _2}+2\sigma _{\varepsilon _1\varepsilon _2})(\sigma ^2_{\tau _1}+2\sigma _{\tau _1\delta })+\sigma ^2_\delta (2\sigma _{\varepsilon _1\varepsilon _2}-\sigma ^2_{\varepsilon _1})$, can be positive. By the Cauchy–Schwarz inequality, its first factor, $-\sigma ^2_{\varepsilon _1}-\sigma ^2_{\varepsilon _2}+2\sigma _{\varepsilon _1\varepsilon _2}$, cannot be positive:

$$\begin{aligned} -\sigma ^2_{\varepsilon _1}-\sigma ^2_{\varepsilon _2}+2\sigma _{\varepsilon _1\varepsilon _2}\le -\sigma ^2_{\varepsilon _1}-\sigma ^2_{\varepsilon _2} + 2\sigma _{\varepsilon _1}\sigma _{\varepsilon _2} = -(\sigma _{\varepsilon _1}-\sigma _{\varepsilon _2})^2\le 0. \end{aligned}$$

(C3)

Therefore, the numerator of (C2) is positive under any of the following three sufficient conditions: 1) $\sigma ^2_{\tau _1}+2\sigma _{\tau _1\delta }<0$ and $\sigma ^2_\delta (2\sigma _{\varepsilon _1\varepsilon _2}-\sigma ^2_{\varepsilon _1})>0$; 2) $\sigma ^2_{\tau _1}+2\sigma _{\tau _1\delta }<0$, $\sigma ^2_\delta (2\sigma _{\varepsilon _1\varepsilon _2}-\sigma ^2_{\varepsilon _1})<0$, but $(-\sigma ^2_{\varepsilon _1}-\sigma ^2_{\varepsilon _2}+2\sigma _{\varepsilon _1\varepsilon _2})(\sigma ^2_{\tau _1}+2\sigma _{\tau _1\delta })+\sigma ^2_\delta (2\sigma _{\varepsilon _1\varepsilon _2}-\sigma ^2_{\varepsilon _1})>0$; and 3) $\sigma ^2_\delta (2\sigma _{\varepsilon _1\varepsilon _2}-\sigma ^2_{\varepsilon _1})>0$, $\sigma ^2_{\tau _1}+2\sigma _{\tau _1\delta }>0$, but $(-\sigma ^2_{\varepsilon _1}-\sigma ^2_{\varepsilon _2}+2\sigma _{\varepsilon _1\varepsilon _2})(\sigma ^2_{\tau _1}+2\sigma _{\tau _1\delta })+\sigma ^2_\delta (2\sigma _{\varepsilon _1\varepsilon _2}-\sigma ^2_{\varepsilon _1})>0$. Under condition 1), both terms of the numerator are nonnegative by (C3), and the second term is strictly positive. Each of the three conditions can occur for suitable parameter values. Taking the first condition as an example, the numerator of (C2) is positive if $\sigma ^2_\delta >0$, $\sigma _{\tau _1\delta }<(-1/2)\sigma ^2_{\tau _1}$, and $\sigma _{\varepsilon _1\varepsilon _2}>(1/2)\sigma ^2_{\varepsilon _1}$. Such values exist within the Cauchy–Schwarz bounds $\sigma _{\tau _1\delta }\ge -\sigma _{\tau _1}\sigma _\delta $ and $\sigma _{\varepsilon _1\varepsilon _2}\le \sigma _{\varepsilon _1}\sigma _{\varepsilon _2}$ whenever, for instance, $\sigma _\delta >\sigma _{\tau _1}/2>0$ and $\sigma _{\varepsilon _2}>\sigma _{\varepsilon _1}/2>0$. Thus, we have shown that $\rho _{DD'}-\rho _{22'}>0$ is possible, given suitable values for $\sigma ^2_{\tau _1}$, $\sigma ^2_{\delta }$, $\sigma ^2_{\varepsilon _1}$, $\sigma ^2_{\varepsilon _2}$, $\sigma _{\tau _1\delta }$, and $\sigma _{\varepsilon _1\varepsilon _2}$.
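The first condition can be illustrated with concrete numbers. The sketch below assumes $\rho _{DD'}$ and $\rho _{22'}$ in the forms shown in the first line of (C2), checks that the chosen (hypothetical) covariances satisfy the Cauchy–Schwarz bounds, and evaluates the difference exactly. The helper names are illustrative, not taken from the article.

```python
from fractions import Fraction as F


def rel_change(var_delta, var_e1, var_e2, cov_e1e2):
    """rho_DD' = var_delta / var_D, with var_D from Eq. (B5)."""
    return var_delta / (var_delta + var_e1 + var_e2 - 2 * cov_e1e2)


def rel_posttest(var_tau1, var_delta, cov_tau1_delta, var_e2):
    """rho_22' = var(true posttest) / var(observed posttest); see (A3) and (B2)."""
    t = var_tau1 + var_delta + 2 * cov_tau1_delta
    return t / (t + var_e2)


def admissible(var_a, var_b, cov_ab):
    """Cauchy-Schwarz check cov^2 <= var_a * var_b (the 2x2 covariance matrix is PSD)."""
    return var_a >= 0 and var_b >= 0 and cov_ab ** 2 <= var_a * var_b


# Hypothetical values satisfying condition 1):
# cov_tau1_delta < -var_tau1 / 2 and cov_e1e2 > var_e1 / 2, with var_delta > 0.
var_tau1, var_delta, cov_tau1_delta = F(1), F(1), F(-3, 5)
var_e1, var_e2, cov_e1e2 = F(1, 4), F(1, 4), F(1, 5)
assert admissible(var_tau1, var_delta, cov_tau1_delta)
assert admissible(var_e1, var_e2, cov_e1e2)

diff = (rel_change(var_delta, var_e1, var_e2, cov_e1e2)
        - rel_posttest(var_tau1, var_delta, cov_tau1_delta, var_e2))
print(diff, round(float(diff), 4))   # → 34/231 0.1472
```

Here $\rho _{DD'}=10/11$ exceeds $\rho _{22'}=16/21$; a strongly negative $\sigma _{\tau _1\delta }$ shrinks the true posttest variance, while a positive error covariance shrinks the error variance of $D$.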

Appendix D

The covariance between D and $X_1$, denoted by $\sigma _{D1}$, is

$$\begin{aligned} \sigma _{D1} = \sigma _{12} - \sigma ^2_1. \end{aligned}$$

(D1)

Thus, substituting (B3) and (B4) into the right-hand side of Eq. (D1), we obtain

$$\begin{aligned} \sigma _{D1}= \sigma _{\tau _1\delta }+\sigma _{\varepsilon _1\varepsilon _2}-\sigma ^2_{\varepsilon _1}, \end{aligned}$$

(D2)

and hence (D2) is the numerator in Eq. (22).
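Expression (D2) can be verified in the same way as (B5): the covariance of $D$ and $X_1$ is a bilinear form in the component covariance matrix of $(\tau _1, \delta , \varepsilon _1, \varepsilon _2)$. As before, this sketch assumes errors uncorrelated with $\tau _1$ and $\delta $, and the parameter values are hypothetical.

```python
from fractions import Fraction as F


def bilinear(u, sigma, v):
    """Return u' Sigma v: the covariance of two linear combinations."""
    return sum(u[i] * sigma[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


# Hypothetical parameter values; components ordered (tau_1, delta, eps_1, eps_2).
var_tau1, var_delta, cov_tau1_delta = F(1), F(1, 2), F(-1, 5)
var_e1, var_e2, cov_e1e2 = F(1, 4), F(3, 10), F(1, 10)
# As in (B2)-(B4), errors are assumed uncorrelated with tau_1 and delta.
sigma = [[var_tau1, cov_tau1_delta, 0, 0],
         [cov_tau1_delta, var_delta, 0, 0],
         [0, 0, var_e1, cov_e1e2],
         [0, 0, cov_e1e2, var_e2]]

w_x1 = [1, 0, 1, 0]    # X_1 = tau_1 + eps_1
w_d = [0, 1, -1, 1]    # D = X_2 - X_1 = delta + eps_2 - eps_1

cov_d1 = bilinear(w_d, sigma, w_x1)
d2 = cov_tau1_delta + cov_e1e2 - var_e1   # Eq. (D2)
print(cov_d1 == d2, float(cov_d1))        # → True -0.35
```

Note that neither $\sigma ^2_{\tau _1}$ nor $\sigma ^2_\delta $ enters (D2); in this example the negative error-variance term dominates.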

Appendix E

We assume that the variance of the true pretest scores equals one (i.e., $\sigma ^2_{\tau _1}=1$) to identify the scale, and we assume that 75%, 50%, and 25% of the individuals in the population have a true change score larger than .5 SD, .75 SD, and 1 SD of the true pretest scores, respectively. Therefore, we need a normal distribution of true change scores whose 25th, 50th, and 75th percentiles equal .5, .75, and 1, respectively. Figure 2 presents such a normal distribution of true change scores with $\mu =.75$ and $\sigma ^2=.14$. The variance of this distribution is obtained as follows. For a standard normal variable $Z$, $P(Z>.67)\approx .25$, and therefore the variance of the true change scores should be $((1-.75)/.67)^2\approx .14$. To see this, let $\sigma $ be the standard deviation of the normal distribution of true change scores. Standardizing the true change score of 1, which is the 75th percentile, gives $(1-.75)/\sigma = .67$. Solving for $\sigma $ and squaring the result yields $\sigma ^2\approx .14$. Because the normal distribution is symmetric about $\mu $, the 25th percentile then automatically equals $.75-.25=.5$.
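The arithmetic above can be reproduced with Python's standard-library `statistics.NormalDist`. This is a minimal sketch with illustrative variable names; it uses the unrounded 75th-percentile quantile (about .6745) alongside the rounded .67 from the text, and both versions round to the $\sigma ^2=.14$ used for Figure 2.

```python
from statistics import NormalDist

mu = 0.75   # median (50th percentile) of true change, in SD units of the true pretest score
q75 = 1.0   # required 75th percentile of true change

z75 = NormalDist().inv_cdf(0.75)   # about 0.6745; the text rounds this to .67
sd = (q75 - mu) / z75
var = sd ** 2

# By symmetry, the 25th percentile is mu - (q75 - mu) = 0.5, as required.
q25 = NormalDist(mu, sd).inv_cdf(0.25)

print(round(var, 3), round(((q75 - mu) / 0.67) ** 2, 3), round(q25, 3))
# → 0.137 0.139 0.5
```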


About this article

Cite this article

Gu, Z., Emons, W.H.M. & Sijtsma, K. Review of Issues About Classical Change Scores: A Multilevel Modeling Perspective on Some Enduring Beliefs. Psychometrika 83, 674–695 (2018). https://doi.org/10.1007/s11336-018-9611-3


Received: 09 February 2017
Revised: 16 February 2018
Published: 30 April 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11336-018-9611-3
