Volume 65, Issue 2, pp 187–197

Kelley's formula as a basis for the assessment of reliable change

  • Gerard H. Maassen


In the literature on the measurement of change, reliable change is usually determined by means of a confidence interval around an observed value of a statistic that estimates the true change. In recent literature on the efficacy of psychotherapies, attention has been particularly directed at the improvement of the estimation of the true change. Reliable Change Indices, incorporating the reliability-weighted measure of individual change, also known as Kelley's formula, have been proposed. According to current practice, these indices are defined as the ratio of such an estimator and an intuitively appealing criterion and then regarded as standard normally distributed statistics. However, because the authors fail to adopt an adequate standard error of the estimator, the statistical properties of their indices are unclear. In this article, it is shown that this can lead to paradoxical conclusions. The adjusted standard error is derived.
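For orientation, the two ingredients named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it shows the textbook form of Kelley's formula (shrinking an observed score toward a group mean in proportion to its unreliability) and the classic reliable change index of Jacobson and Truax (1991). It does not reproduce the adjusted standard error derived in the article, which is not given in the abstract. The function names and all numerical values are hypothetical.

```python
import math


def kelley_estimate(observed, group_mean, reliability):
    """Kelley's formula: reliability-weighted estimate of a true score.

    The observed score is pulled toward the group mean; the lower the
    reliability, the stronger the shrinkage.
    """
    return reliability * observed + (1.0 - reliability) * group_mean


def classic_rc(pre, post, sd_pre, reliability):
    """Classic reliable change index: observed difference divided by the
    standard error of the difference between two test scores."""
    sem = sd_pre * math.sqrt(1.0 - reliability)  # standard error of measurement
    se_diff = math.sqrt(2.0) * sem               # standard error of the difference
    return (post - pre) / se_diff


# Hypothetical example values (not taken from the article).
pre, post = 30.0, 20.0
observed_change = post - pre  # -10.0

# Kelley's formula applied to the difference score, assuming a mean change
# of -4.0 in the group and a difference-score reliability of 0.6.
estimated_true_change = kelley_estimate(observed_change, group_mean=-4.0, reliability=0.6)

# Classic index, assuming a pretest SD of 8.0 and a test reliability of 0.85.
rc = classic_rc(pre, post, sd_pre=8.0, reliability=0.85)

print(f"estimated true change: {estimated_true_change:.2f}")
print(f"classic RC: {rc:.2f}  (|RC| > 1.96 is conventionally read as reliable change)")
```

The point made in the abstract is that once the numerator is replaced by a Kelley-type estimate, the denominator must be the standard error of that estimate; keeping the criterion of the classic index does not guarantee a standard normal statistic.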

Key words

difference scores, reliable change index, Kelley's formula





Copyright information

© The Psychometric Society 2000

Authors and Affiliations

  1. Faculty of Social Sciences, Department of Methodology and Statistics, Utrecht University, Utrecht, The Netherlands
