Abstract
Clinical cognitive ability assessment and its corollary, score interpretation, are in a state of disarray. Many current instruments are designed to provide a bevy of scores to appeal to a variety of school psychologists. Not all of these scores are grounded in a theory of the attribute being measured, and not all are developed from sound measurement or psychometric theory. Consequently, school psychologists can interpret scores from the same instrument in substantially different ways. This is contrary to the very purpose of psychological assessment. In contrast, we provide a sketch of theoretically driven test development and score interpretation. We also provide examples of how this approach could be implemented using two theories of intelligence (Spearman’s two-factor theory and Cattell and Horn’s Gf-Gc theory) and measurement theory about the nature of psychological test scores. Although this approach differs from what school psychologists often implement, it is consistent with the guiding principles of evidence-based psychological assessment.
Notes
Unfortunately, David Wechsler never actually defined the attribute he was measuring with the “verbal” subtests on his instruments. Instead, it appears he included them because he wanted a cognitive ability test that differed from those already in existence around the 1920s.
My usual examination of subjects included, in addition to a short interview, administration of the Stanford-Binet or Yerkes Point Scale, and nearly always one or more of the available performance tests. It then occurred to me that an intelligence scale, combining verbal and nonverbal tests, would be a useful addition to the psychometrist’s armamentarium (Wechsler 1981, p. 83).
A possible exception is the WJ IV, which uses principal component analysis-derived weights for the calculation of the General Intellectual Ability score. Although the results are “truly enough not identical with ‘g’ [they] are usually at any rate very good approximations to it” (Spearman 1946, p. 121).
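To make the idea of principal-component-derived weights concrete, here is a minimal Python sketch. It is not the WJ IV’s actual scoring procedure, which depends on that battery’s own norming data. The sketch assumes a hypothetical correlation matrix among standardized subtest scores with all correlations positive (a positive manifold). It extracts the first principal component by power iteration and forms the composite v′z / √λ. Because Var(v′z) = v′Rv = λ for a unit eigenvector v, that composite has unit variance. The function names are illustrative only.

```python
import math


def first_principal_component(corr, max_iter=10000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric correlation
    matrix, found by power iteration. With all-positive correlations and a
    positive starting vector, the weights stay positive."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v'Rv gives the eigenvalue for the unit vector v.
    eigval = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v


def pc_composite(z_scores, v, eigval):
    """Standardized first-component score: v'z / sqrt(lambda)."""
    return sum(vj * zj for vj, zj in zip(v, z_scores)) / math.sqrt(eigval)


# Hypothetical correlations among three subtests.
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
lam, weights = first_principal_component(R)
print(lam, weights, pc_composite([1.0, 0.5, -0.2], weights, lam))
```

The weights are the component’s eigenvector, so subtests that correlate more strongly with the others receive more weight. Unit weighting, by contrast, treats every subtest alike.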
There are other noted abilities contained within Gf-Gc theory (Horn and Blankson 2012), but Gf and Gc are believed to make the most important contributions to intellectual functioning.
Readers interested in information on how to calculate these statistics can consult Grice (2001).
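Grice (2001) addresses the computation and evaluation of factor scores. The sketch below assumes that is what the statistics in question are. It uses a single-factor model with standardized indicators, in the spirit of Spearman’s two-factor theory, where several quantities have closed forms:

- With loadings λⱼ and uniquenesses ψⱼ = 1 − λⱼ², let s = Σ λⱼ²/ψⱼ.
- The regression-method weights w = R⁻¹λ reduce to wⱼ = (λⱼ/ψⱼ)/(1 + s).
- The determinacy, the correlation between the factor and its regression estimate, is ρ = √(s/(1 + s)).
- The lower bound on the correlation between two equally valid sets of factor scores is 2ρ² − 1.

The loadings are hypothetical and the function names are illustrative. The sketch is not the authors’ code.

```python
import math


def one_factor_score_weights(loadings):
    """Regression-method factor score weights w = R^{-1} lambda for a
    one-factor model with standardized indicators (closed form via
    Sherman-Morrison: w_j = (lambda_j / psi_j) / (1 + s))."""
    psi = [1.0 - l * l for l in loadings]
    if any(p <= 0 for p in psi):
        raise ValueError("standardized loadings must lie strictly between -1 and 1")
    s = sum(l * l / p for l, p in zip(loadings, psi))
    return [(l / p) / (1.0 + s) for l, p in zip(loadings, psi)]


def determinacy(loadings):
    """Correlation between the factor and its regression estimate."""
    s = sum(l * l / (1.0 - l * l) for l in loadings)
    return math.sqrt(s / (1.0 + s))


def min_competing_correlation(rho):
    """Smallest possible correlation between two equally valid factor score sets."""
    return 2.0 * rho ** 2 - 1.0


# Hypothetical standardized loadings of three subtests on one general factor.
lam = [0.8, 0.7, 0.6]
rho = determinacy(lam)
print(one_factor_score_weights(lam), rho, min_competing_correlation(rho))
```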
We calculated reliability of the aggregate scores using the Guttman-Cronbach α.
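For an aggregate formed as a sum of k components, the Guttman-Cronbach α can be computed from the components’ covariance matrix as α = k/(k − 1) · (1 − Σσⱼ² / σ²_X). Here Σσⱼ² is the sum of the component variances (the trace), and σ²_X is the variance of the sum (the sum of all matrix entries). The minimal Python sketch below assumes the aggregate is an unweighted sum of subtest scores. The input data are hypothetical, and the functions are illustrative rather than the authors’ implementation.

```python
def covariance_matrix(scores):
    """Sample covariance matrix; scores is a list of examinees, each a list
    of k component (subtest) scores."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in scores) / (n - 1)
             for j in range(k)] for i in range(k)]


def coefficient_alpha(cov):
    """Guttman-Cronbach alpha for the unweighted sum of the k components."""
    k = len(cov)
    if k < 2:
        raise ValueError("alpha requires at least two components")
    total_variance = sum(sum(row) for row in cov)  # variance of the sum score
    component_variance = sum(cov[i][i] for i in range(k))
    return (k / (k - 1)) * (1.0 - component_variance / total_variance)


# Hypothetical scores for five examinees on three subtests.
data = [[10, 12, 11], [8, 9, 7], [13, 14, 12], [9, 11, 10], [11, 10, 12]]
print(coefficient_alpha(covariance_matrix(data)))
```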
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA/APA/NCME]. (2014). Standards for educational and psychological testing (4th ed.). Washington, DC: Authors.
Beaujean, A. A. (2018). Simulating data for clinical research: a tutorial. Journal of Psychoeducational Assessment, 36, 7–20. https://doi.org/10.1177/0734282917690302.
Beaujean, A. A., & Sheng, Y. (2014). Assessing the Flynn effect in the Wechsler scales. Journal of Individual Differences, 35, 63–78. https://doi.org/10.1027/1614-0001/a000128.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061.
Borsboom, D., Cramer, A. O. J., Kievit, R. A., Scholten, A. Z., & Franić, S. (2009). The end of construct validity. In R. W. Lissitz (Ed.), The concept of validity: revisions, new directions, and applications (pp. 135–170). Charlotte: Information Age Publishing.
Braden, J. P., & Ouzts, S. M. (2005). Review of the Kaufman assessment battery for children, second edition. In B. S. Plake & J. C. Impara (Eds.), The sixteenth mental measurements yearbook (2nd ed., pp. 517–520). Lincoln: Buros Institute of Mental Measurements.
Bringmann, L. F., & Eronen, M. I. (2016). Heating up the measurement debate: what psychologists can learn from the history of physics. Theory & Psychology, 26, 27–43. https://doi.org/10.1177/0959354315617253.
Canivez, G. L., & Watkins, M. W. (2016). Review of the Wechsler intelligence scale for children-fifth edition: critique, commentary, and independent analyses. In A. S. Kaufman, S. E. Raiford, & D. L. Coalson (Eds.), Intelligent testing with the WISC-V (pp. 683–702). Hoboken: Wiley.
Carroll, J. B. (1996). A three-stratum theory of intelligence: Spearman’s contribution. In I. Dennis & P. Tapsfield (Eds.), Human abilities: their nature and measurement (pp. 1–17). Mahwah: Erlbaum.
Cattell, R. B. (1943). The measurement of adult intelligence. Psychological Bulletin, 40, 153–193. https://doi.org/10.1037/h0059973.
Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: a critical experiment. Journal of Educational Psychology, 54, 1–22. https://doi.org/10.1037/h0046743.
Cattell, R. B. (1987). Intelligence: its structure, growth, and action. New York: Elsevier.
Courville, T., Coalson, D. L., Kaufman, A. S., & Raiford, S. E. (2016). Does WISC-V scatter matter? In A. S. Kaufman, S. E. Raiford, & D. L. Coalson (Eds.), Intelligent testing with the WISC-V (pp. 209–228). Hoboken: Wiley.
Downing, S. M. (2006). Twelve steps for effective test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of testing (pp. 3–25). Mahwah: Lawrence Erlbaum.
Finkelstein, L. (2005). Problems of measurement in soft systems. Measurement, 38, 267–274. https://doi.org/10.1016/j.measurement.2005.09.002.
Flanagan, D. P., & Alfonso, V. C. (2017). Essentials of WISC-V assessment (2nd ed.). Hoboken: Wiley.
Flanagan, D. P., Ortiz, S. O., & Alfonso, V. C. (2013). Essentials of cross-battery assessment (3rd ed.). Hoboken: Wiley.
Floyd, R. G., Bergeron, R., McCormack, A. C., Anderson, J. L., & Hargrove-Owens, G. L. (2005). Are Cattell-Horn-Carroll (CHC) broad ability composite scores exchangeable across batteries? School Psychology Review, 34, 329–357.
Frazier, T. W., & Youngstrom, E. A. (2007). Historical increase in the number of factors measured by commercial tests of cognitive ability: are we overfactoring? Intelligence, 35, 169–182. https://doi.org/10.1016/j.intell.2006.07.002.
Grace, J. B., & Bollen, K. A. (2008). Representing general theoretical concepts in structural equation models: the role of composite variables. Environmental and Ecological Statistics, 15, 191–213. https://doi.org/10.1007/s10651-007-0047-7.
Grégoire, J. (2013). Measuring components of intelligence: mission impossible? Journal of Psychoeducational Assessment, 31, 138–147. https://doi.org/10.1177/0734282913478034.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6, 430–450. https://doi.org/10.1037/1082-989X.6.4.430.
Groth-Marnat, G. (1999). Financial efficacy of clinical assessment: rational guidelines and issues for future research. Journal of Clinical Psychology, 55, 813–824.
Grove, W. M., & Vrieze, S. I. (2013). The clinical versus mechanical prediction controversy. In K. F. Geisinger, B. A. Bracken, J. F. Carlson, J. I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 2: testing and assessment in clinical and counseling psychology (pp. 51–62). Washington, DC: American Psychological Association.
Hale, J. B., Fiorello, C. A., Kavanagh, J. A., Hoeppner, J.-A. B., & Gaither, R. A. (2001). WISC-III predictors of academic achievement for children with learning disabilities: are global and factor scores comparable? School Psychology Quarterly, 16, 31–55. https://doi.org/10.1521/scpq.16.1.31.19158.
Horn, J. L. (1963). Equations representing combinations of components in scoring psychological variables. Acta Psychologica, 21, 184–217. https://doi.org/10.1016/0001-6918(63)90048-9.
Horn, J. L. (1985). Remodeling old models of intelligence. In B. B. Wolman (Ed.), Handbook of intelligence (pp. 267–300). New York: Wiley.
Horn, J. L. (1989). Models of intelligence. In R. L. Linn (Ed.), Intelligence, measurement, theory and public policy (pp. 29–73). Urbana: University of Illinois Press.
Horn, J. L. (1991). Measurement of intellectual capabilities: a review of theory. In K. S. McGrew, J. K. Werder, & R. W. Woodcock (Eds.), Woodcock-Johnson psycho-educational battery-revised technical manual (pp. 197–232). Chicago: Riverside.
Horn, J. L., & Blankson, A. N. (2012). Foundations for better understanding of cognitive abilities. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: theories, tests, and issues (3rd ed., pp. 73–98). New York: Guilford Press.
Horn, J. L., & Cattell, R. B. (1966). Refinement and test of the theory of fluid and crystallized intelligence. Journal of Educational Psychology, 57, 253–270. https://doi.org/10.1037/h0023816.
Horn, J. L., & McArdle, J. J. (2007). Understanding human intelligence since Spearman. In R. Cudeck & R. C. MacCallum (Eds.), Factor analysis at 100: historical developments and future directions (pp. 205–247). Mahwah: Erlbaum.
Hunsley, J., & Mash, E. J. (2007). Evidence-based assessment. Annual Review of Clinical Psychology, 3, 29–51. https://doi.org/10.1146/annurev.clinpsy.3.022806.091419.
Jackson, J. S. H., & Maraun, M. (1996). The conceptual validity of empirical scale construction: the case of the sensation seeking scale. Personality and Individual Differences, 21, 103–110. https://doi.org/10.1016/0191-8869(95)00217-0.
Jensen, A. R. (1993). Psychometric g and achievement. In B. R. Gifford (Ed.), Policy perspectives on educational testing (pp. 117–227). New York: Kluwer Academic Publishers.
Jensen, A. R. (2002). Galton’s legacy to research on intelligence. Journal of Biosocial Science, 34, 145–172. https://doi.org/10.1017/s0021932002001451.
Kamphaus, R. W., Winsor, A. P., Rowe, E. W., & Kim, S. (2012). A history of intelligence test interpretation. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (3rd ed., pp. 56–70). New York: Guilford.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73. https://doi.org/10.1111/jedm.12000.
Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman assessment battery for children-second edition. Circle Pines: American Guidance Service.
Kaufman, A. S., Raiford, S. E., & Coalson, D. L. (2016). Intelligent testing with the WISC-V. Hoboken: Wiley.
Keith, T. Z., & Reynolds, M. R. (2010). Cattell-Horn-Carroll abilities and cognitive tests: what we’ve learned from 20 years of research. Psychology in the Schools, 47, 635–650. https://doi.org/10.1002/pits.20496.
Kingston, N. M., Scheuring, S. T., & Kramer, L. B. (2013). Test development strategies. In K. F. Geisinger, B. A. Bracken, J. F. Carlson, J. I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 1: test theory and testing and assessment in industrial and organizational psychology (pp. 165–184). Washington, DC: American Psychological Association.
Kline, P. (2000). The handbook of psychological testing (2nd ed.). London: Routledge.
Krause, M. S. (2012). Measurement validity is fundamentally a matter of definition, not correlation. Review of General Psychology, 16, 391–400. https://doi.org/10.1037/a0027701.
Krause, M. S. (2013). The data analytic implications of human psychology’s dimensions being ordinally scaled. Review of General Psychology, 17, 318–325. https://doi.org/10.1037/a0032292.
Littell, W. M. (1960). The Wechsler intelligence scale for children: review of a decade of research. Psychological Bulletin, 57, 132–156. https://doi.org/10.1037/h0044513.
Luecht, R. M., Gierl, M. J., Tan, X., & Huff, K. (2006). Scalability and the development of useful diagnostic scales. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA.
Luria, A. R. (1973). The working brain: an introduction to neuropsychology. New York: Basic Books.
Maraun, M. D. (1998a). Measurement as a normative practice: implications of Wittgenstein’s philosophy for measurement in psychology. Theory & Psychology, 8, 435–461. https://doi.org/10.1177/0959354398084001.
Maraun, M. D. (1998b). The nexus misconceived: Wittgenstein made silly. Theory & Psychology, 8, 489–501. https://doi.org/10.1177/0959354398084004.
Mari, L., Carbone, P., & Petri, D. (2015). Fundamentals of hard and soft measurement. In A. Ferrero, D. Petri, P. Carbone, & M. Catelani (Eds.), Modern measurements: fundamentals and applications (pp. 203–262). Hoboken: Wiley-IEEE Press.
McDonald, R. P. (1999). Test theory: a unified treatment. Mahwah: Erlbaum.
McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1–10. https://doi.org/10.1016/j.intell.2008.08.004.
McGrew, K. S., LaForte, E. M., & Schrank, F. A. (2014). Woodcock-Johnson IV technical manual. Rolling Meadows: Riverside.
Michell, J. (1999). Measurement in psychology: critical history of a methodological concept. New York: Cambridge University Press.
Michell, J. (2007). Measurement. In S. P. Turner & M. W. Risjord (Eds.), Philosophy of anthropology and sociology (pp. 71–119). Amsterdam: North Holland.
Michell, J. (2011). Qualitative research meets the ghost of Pythagoras. Theory & Psychology, 21, 241–259. https://doi.org/10.1177/0959354310391351.
Michell, J. (2012). Alfred Binet and the concept of heterogeneous orders. Frontiers in Psychology, 3(261), 1–8. https://doi.org/10.3389/fpsyg.2012.00261.
Petri, D., Mari, L., & Carbone, P. (2015). A structured methodology for measurement development. IEEE Transactions on Instrumentation and Measurement, 64, 2367–2379. https://doi.org/10.1109/TIM.2015.2399023.
Pfeiffer, S. I., Reddy, L. A., Kletzel, J. E., Schmelzer, E. R., & Boyer, L. M. (2000). The practitioner’s view of IQ testing and profile analysis. School Psychology Quarterly, 15, 376–385. https://doi.org/10.1037/h0088795.
R Development Core Team. (2017). R: a language and environment for statistical computing (version 3.3.3) [computer program]. Vienna: R Foundation for Statistical Computing.
Raiford, S. E. (2017). Essentials of WISC-V integrated assessment. Hoboken: Wiley.
Schneider, W. J. (2013). What if we took our models seriously? Estimating latent scores in individuals. Journal of Psychoeducational Assessment, 31, 186–201. https://doi.org/10.1177/0734282913478046.
Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (3rd ed., pp. 99–144). New York: Guilford.
Schrank, F. A., McGrew, K. S., & Mather, N. (2014). Woodcock-Johnson IV tests of cognitive abilities. Rolling Meadows: Riverside.
Sijtsma, K. (2012). Psychological measurement between physics and statistics. Theory & Psychology, 22, 786–809. https://doi.org/10.1177/0959354312454353.
Sijtsma, K. (2013). Theory development as a precursor for test validity. In R. E. Millsap, L. A. van der Ark, D. M. Bolt, & C. M. Woods (Eds.), New developments in quantitative psychology: presentations from the 77th annual psychometric society meeting (pp. 267–274). New York: Springer.
Spearman, C. E. (1927). The abilities of man: their nature and measurement. New York: Blackburn Press.
Spearman, C. E. (1931). Our need of some science in place of the word ‘intelligence’. Journal of Educational Psychology, 22, 401–410. https://doi.org/10.1037/h0070599.
Spearman, C. E. (1939). Thurstone’s work re-worked. Journal of Educational Psychology, 30, 1–16. https://doi.org/10.1037/h0061267.
Spearman, C. E. (1946). Theory of general factor. British Journal of Psychology, 36, 117–131. https://doi.org/10.1111/j.2044-8295.1946.tb01114.x.
Thomson, G. H. (1927). The tetrad-difference criterion. British Journal of Psychology. General Section, 17, 235–255. https://doi.org/10.1111/j.2044-8295.1927.tb00426.x.
Thurstone, L. L. (1935). The vectors of mind: multiple-factor analysis for the isolation of primary traits. Chicago: University of Chicago Press.
Tomarken, A. J., & Waller, N. G. (2003). Potential problems with “well fitting” models. Journal of Abnormal Psychology, 112, 578–598. https://doi.org/10.1037/0021-843X.112.4.578.
Wechsler, D. (1950). Cognitive, conative, and non-intellective intelligence. American Psychologist, 5, 78–83. https://doi.org/10.1037/h0063112.
Wechsler, D. (1975). Intelligence defined and undefined: a relativistic appraisal. American Psychologist, 30, 135–139. https://doi.org/10.1037/h0076868.
Wechsler, D. (1981). The psychometric tradition: developing the Wechsler adult intelligence scale. Contemporary Educational Psychology, 6, 82–85. https://doi.org/10.1016/0361-476X(81)90035-7.
Wechsler, D. (2014). Wechsler intelligence scale for children-fifth edition administration and scoring manual. Bloomington: NCS Pearson.
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Beaujean, A.A., Benson, N.F. Theoretically-Consistent Cognitive Ability Test Development and Score Interpretation. Contemp School Psychol 23, 126–137 (2019). https://doi.org/10.1007/s40688-018-0182-1
DOI: https://doi.org/10.1007/s40688-018-0182-1