Theoretically-Consistent Cognitive Ability Test Development and Score Interpretation

Contemporary School Psychology

Abstract

Clinical cognitive ability assessment (and its corollary, score interpretation) is in a state of disarray. Many current instruments are designed to provide a bevy of scores that appeal to a variety of school psychologists. Not all of these scores are grounded in a theory of the attribute being measured or developed from sound measurement or psychometric theory. Thus, school psychologists can vary substantially in how they interpret scores from the same instrument, which runs contrary to the very purpose of psychological assessment. In contrast, we provide a sketch of theoretically driven test development and score interpretation. We also provide examples of how this approach could be implemented using two theories of intelligence (Spearman’s two-factor theory and Cattell and Horn’s Gf-Gc theory) and measurement theory about the nature of psychological test scores. Although this approach differs from what school psychologists often implement, it is consistent with the guiding principles of evidence-based psychological assessment.

Notes

  1. Unfortunately, David Wechsler never actually defined the attribute he was measuring with the “verbal” subtests on his instruments. Instead, he appears to have included them because he wanted a cognitive ability test that differed from those already in existence circa the 1920s.

    My usual examination of subjects included, in addition to a short interview, administration of the Stanford-Binet or Yerkes Point Scale, and nearly always one or more of the available performance tests. It then occurred to me that an intelligence scale, combining verbal and nonverbal tests, would be a useful addition to the psychometrist’s armamentarium (Wechsler 1981, p. 83).

  2. A possible exception is the WJ IV, which uses weights derived from a principal component analysis to calculate the General Intellectual Ability score. Although the results are “truly enough not identical with ‘g’ [they] are usually at any rate very good approximations to it” (Spearman 1946, p. 121).

  3. Gf-Gc theory includes other noted abilities as well (Horn and Blankson 2012), but Gf and Gc are believed to make the most important contributions to intellectual functioning.

  4. Readers interested in information on how to calculate these statistics can consult Grice (2001).

  5. We calculated the reliability of the aggregate scores using the Guttman-Cronbach α.
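
Note 2 mentions that the WJ IV forms its General Intellectual Ability score with weights derived from a principal component analysis. The sketch below illustrates only the general idea and rests on assumptions not in the article: the 4 × 4 subtest correlation matrix is hypothetical, the weights are taken from the first principal component (found here by power iteration), and the composite is restandardized to a mean of 100 and SD of 15. It is not the WJ IV's published procedure or weights.

```python
from math import sqrt


def quad_form(w, r):
    """Return w'Rw: the variance of a weighted sum of standardized scores with correlation matrix R."""
    k = len(w)
    return sum(w[i] * r[i][j] * w[j] for i in range(k) for j in range(k))


def first_principal_component(r, max_iter=1000, tol=1e-12):
    """Largest eigenvalue and unit-length eigenvector of a correlation matrix via power iteration."""
    k = len(r)
    v = [1.0 / sqrt(k)] * k  # positive, unit-length starting vector
    for _ in range(max_iter):
        w = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        new_v = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    return quad_form(v, r), v  # Rayleigh quotient v'Rv estimates the eigenvalue


def composite_weights(r):
    """Rescale the first eigenvector so that its elements sum to 1."""
    _, v = first_principal_component(r)
    total = sum(v)
    return [x / total for x in v]


def composite_score(z, weights, r, mean=100.0, sd=15.0):
    """Weighted sum of subtest z-scores, restandardized to the requested metric."""
    raw = sum(w * zi for w, zi in zip(weights, z))
    return mean + sd * raw / sqrt(quad_form(weights, r))


# Hypothetical subtest intercorrelations (not taken from any published battery).
R = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
]

weights = composite_weights(R)
print([round(w, 3) for w in weights])
print(round(composite_score([1.0, 0.5, 0.0, -0.5], weights, R), 1))
```

Subtests that correlate more strongly with the others receive larger weights, which is why such a composite can approximate, without being identical to, a score on the general factor.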
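
Note 4 refers readers to Grice (2001) for computing "these statistics"; the main text that defines them is not part of this excerpt. Assuming they include factor scores, the sketch below illustrates one common approach for a single-factor model in the spirit of Spearman's g: regression-method weights W = R^-1 λ, an examinee's estimated factor score W′z, and the validity coefficient √(λ′R^-1 λ), the correlation between estimated and true factor scores. The loadings are hypothetical, R is the correlation matrix the one-factor model implies, and the small Gaussian-elimination solver stands in for a numerical library.

```python
from math import sqrt


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def one_factor_correlations(loadings):
    """Model-implied correlations for one standardized factor: l_i * l_j off the diagonal, 1 on it."""
    k = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(k)]
            for i in range(k)]


def regression_factor_weights(r, loadings):
    """Regression-method factor score weights W = R^-1 * lambda."""
    return solve(r, loadings)


def validity_coefficient(loadings, weights):
    """Correlation between regression factor scores and the factor: sqrt(lambda' W)."""
    return sqrt(sum(l * w for l, w in zip(loadings, weights)))


def factor_score(z, weights):
    """Estimated factor score for one examinee's vector of subtest z-scores."""
    return sum(w * zi for w, zi in zip(weights, z))


loadings = [0.8, 0.7, 0.6, 0.5]  # hypothetical standardized loadings
R = one_factor_correlations(loadings)
W = regression_factor_weights(R, loadings)
print([round(w, 3) for w in W])
print(round(validity_coefficient(loadings, W), 3))
print(round(factor_score([1.0, 0.5, 0.0, -0.5], W), 3))
```

A validity coefficient well below 1 signals that examinees' estimated factor scores can diverge noticeably from their standing on the factor itself, which is one reason to evaluate factor scores before interpreting them.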
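
Note 5 reports that the reliability of the aggregate scores was estimated with the Guttman-Cronbach α. A minimal sketch of that coefficient follows, assuming each aggregate is an unweighted sum of its component scores (a weighted composite calls for a different reliability formula). The score matrix is made up for illustration; passing a correlation matrix instead of a covariance matrix yields standardized α.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator); rows are examinees, columns are components."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


def coefficient_alpha(cov):
    """Guttman-Cronbach alpha for the unweighted sum of k components:
    alpha = k / (k - 1) * (1 - sum of component variances / variance of the sum)."""
    k = len(cov)
    component_variance = sum(cov[i][i] for i in range(k))
    total_variance = sum(sum(row) for row in cov)  # variance of the unweighted sum
    return k / (k - 1) * (1 - component_variance / total_variance)


# Hypothetical scores: rows are examinees, columns are the components of one aggregate.
scores = [
    [10, 12, 9],
    [8, 9, 7],
    [14, 13, 12],
    [11, 10, 11],
    [7, 8, 6],
]
print(round(coefficient_alpha(covariance_matrix(scores)), 3))
```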

References

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA/APA/NCME]. (2014). Standards for educational and psychological testing (4th ed.). Washington, DC: Authors.

  • Beaujean, A. A. (2018). Simulating data for clinical research: a tutorial. Journal of Psychoeducational Assessment, 36, 7–20. https://doi.org/10.1177/0734282917690302.

  • Beaujean, A. A., & Sheng, Y. (2014). Assessing the Flynn effect in the Wechsler scales. Journal of Individual Differences, 35, 63–78. https://doi.org/10.1027/1614-0001/a000128.

  • Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061.

  • Borsboom, D., Cramer, A. O. J., Kievit, R. A., Scholten, A. Z., & Franić, S. (2009). The end of construct validity. In R. W. Lissitz (Ed.), The concept of validity: revisions, new directions, and applications (pp. 135–170). Charlotte: Information Age Publishing.

  • Braden, J. P., & Ouzts, S. M. (2005). Review of the Kaufman assessment battery for children, second edition. In B. S. Plake & J. C. Impara (Eds.), The sixteenth mental measurements yearbook (2nd ed., pp. 517–520). Lincoln: Buros Institute of Mental Measurements.

  • Bringmann, L. F., & Eronen, M. I. (2016). Heating up the measurement debate: what psychologists can learn from the history of physics. Theory & Psychology, 26, 27–43. https://doi.org/10.1177/0959354315617253.

  • Canivez, G. L., & Watkins, M. W. (2016). Review of the Wechsler intelligence scale for children-fifth edition: critique, commentary, and independent analyses. In A. S. Kaufman, S. E. Raiford, & D. L. Coalson (Eds.), Intelligent testing with the WISC-V (pp. 683–702). Hoboken: Wiley.

  • Carroll, J. B. (1996). A three-stratum theory of intelligence: Spearman’s contribution. In I. Dennis & P. Tapsfield (Eds.), Human abilities: their nature and measurement (pp. 1–17). Mahwah: Erlbaum.

  • Cattell, R. B. (1943). The measurement of adult intelligence. Psychological Bulletin, 40, 153–193. https://doi.org/10.1037/h0059973.

  • Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: a critical experiment. Journal of Educational Psychology, 54, 1–22. https://doi.org/10.1037/h0046743.

  • Cattell, R. B. (1987). Intelligence: its structure, growth, and action. New York: Elsevier.

  • Courville, T., Coalson, D. L., Kaufman, A. S., & Raiford, S. E. (2016). Does WISC-V scatter matter? In A. S. Kaufman, S. E. Raiford, & D. L. Coalson (Eds.), Intelligent testing with the WISC-V (pp. 209–228). Hoboken: Wiley.

  • Downing, S. M. (2006). Twelve steps for effective test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of testing (pp. 3–25). Mahwah: Lawrence Erlbaum.

  • Finkelstein, L. (2005). Problems of measurement in soft systems. Measurement, 38, 267–274. https://doi.org/10.1016/j.measurement.2005.09.002.

  • Flanagan, D. P., & Alfonso, V. C. (2017). Essentials of WISC-V assessment (2nd ed.). Hoboken: Wiley.

  • Flanagan, D. P., Ortiz, S. O., & Alfonso, V. C. (2013). Essentials of cross-battery assessment (3rd ed.). Hoboken: Wiley.

  • Floyd, R. G., Bergeron, R., McCormack, A. C., Anderson, J. L., & Hargrove-Owens, G. L. (2005). Are Cattell-Horn-Carroll (CHC) broad ability composite scores exchangeable across batteries? School Psychology Review, 34, 329–357.

  • Frazier, T. W., & Youngstrom, E. A. (2007). Historical increase in the number of factors measured by commercial tests of cognitive ability: are we overfactoring? Intelligence, 35, 169–182. https://doi.org/10.1016/j.intell.2006.07.002.

  • Grace, J. B., & Bollen, K. A. (2008). Representing general theoretical concepts in structural equation models: the role of composite variables. Environmental and Ecological Statistics, 15, 191–213. https://doi.org/10.1007/s10651-007-0047-7.

  • Grégoire, J. (2013). Measuring components of intelligence: mission impossible? Journal of Psychoeducational Assessment, 31, 138–147. https://doi.org/10.1177/0734282913478034.

  • Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6, 430–450. https://doi.org/10.1037/1082-989X.6.4.430.

  • Groth-Marnat, G. (1999). Financial efficacy of clinical assessment: rational guidelines and issues for future research. Journal of Clinical Psychology, 55, 813–824.

  • Grove, W. M., & Vrieze, S. I. (2013). The clinical versus mechanical prediction controversy. In K. F. Geisinger, B. A. Bracken, J. F. Carlson, J. I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 2: Testing and assessment in clinical and counseling psychology (pp. 51–62). Washington, DC: American Psychological Association.

  • Hale, J. B., Fiorello, C. A., Kavanagh, J. A., Hoeppner, J.-A. B., & Gaither, R. A. (2001). WISC-III predictors of academic achievement for children with learning disabilities: are global and factor scores comparable? School Psychology Quarterly, 16, 31–55. https://doi.org/10.1521/scpq.16.1.31.19158.

  • Horn, J. L. (1963). Equations representing combinations of components in scoring psychological variables. Acta Psychologica, 21, 184–217. https://doi.org/10.1016/0001-6918(63)90048-9.

  • Horn, J. L. (1985). Remodeling old models of intelligence. In B. B. Wolman (Ed.), Handbook of intelligence (pp. 267–300). New York: Wiley.

  • Horn, J. L. (1989). Models of intelligence. In R. L. Linn (Ed.), Intelligence, measurement, theory and public policy (pp. 29–73). Urbana: University of Illinois Press.

  • Horn, J. L. (1991). Measurement of intellectual capabilities: a review of theory. In K. S. McGrew, J. K. Werder, & R. W. Woodcock (Eds.), Woodcock-Johnson psycho-educational battery-revised technical manual (pp. 197–232). Chicago: Riverside.

  • Horn, J. L., & Blankson, A. N. (2012). Foundations for better understanding of cognitive abilities. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: theories, tests, and issues (3rd ed., pp. 73–98). New York: Guilford Press.

  • Horn, J. L., & Cattell, R. B. (1966). Refinement and test of the theory of fluid and crystallized intelligence. Journal of Educational Psychology, 57, 253–270. https://doi.org/10.1037/h0023816.

  • Horn, J. L., & McArdle, J. J. (2007). Understanding human intelligence since Spearman. In R. Cudeck & R. C. MacCallum (Eds.), Factor analysis at 100: historical developments and future directions (pp. 205–247). Mahwah: Erlbaum.

  • Hunsley, J., & Mash, E. J. (2007). Evidence-based assessment. Annual Review of Clinical Psychology, 3, 29–51. https://doi.org/10.1146/annurev.clinpsy.3.022806.091419.

  • Jackson, J. S. H., & Maraun, M. (1996). The conceptual validity of empirical scale construction: the case of the sensation seeking scale. Personality and Individual Differences, 21, 103–110. https://doi.org/10.1016/0191-8869(95)00217-0.

  • Jensen, A. R. (1993). Psychometric g and achievement. In B. R. Gifford (Ed.), Policy perspectives on educational testing (pp. 117–227). New York: Kluwer Academic Publishers.

  • Jensen, A. R. (2002). Galton’s legacy to research on intelligence. Journal of Biosocial Science, 34, 145–172. https://doi.org/10.1017/s0021932002001451.

  • Kamphaus, R. W., Winsor, A. P., Rowe, E. W., & Kim, S. (2012). A history of intelligence test interpretation. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (3rd ed., pp. 56–70). New York: Guilford.

  • Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73. https://doi.org/10.1111/jedm.12000.

  • Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman assessment battery for children-second edition. Circle Pines: American Guidance Service.

  • Kaufman, A. S., Raiford, S. E., & Coalson, D. L. (2016). Intelligent testing with the WISC-V. Hoboken: Wiley.

  • Keith, T. Z., & Reynolds, M. R. (2010). Cattell-Horn-Carroll abilities and cognitive tests: what we’ve learned from 20 years of research. Psychology in the Schools, 47, 635–650. https://doi.org/10.1002/pits.20496.

  • Kingston, N. M., Scheuring, S. T., & Kramer, L. B. (2013). Test development strategies. In K. F. Geisinger, B. A. Bracken, J. F. Carlson, J. I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 1: test theory and testing and assessment in industrial and organizational psychology (pp. 165–184). Washington, DC: American Psychological Association.

  • Kline, P. (2000). The handbook of psychological testing (2nd ed.). London: Routledge.

  • Krause, M. S. (2012). Measurement validity is fundamentally a matter of definition, not correlation. Review of General Psychology, 16, 391–400. https://doi.org/10.1037/a0027701.

  • Krause, M. S. (2013). The data analytic implications of human psychology’s dimensions being ordinally scaled. Review of General Psychology, 17, 318–325. https://doi.org/10.1037/a0032292.

  • Littell, W. M. (1960). The Wechsler intelligence scale for children: review of a decade of research. Psychological Bulletin, 57, 132–156. https://doi.org/10.1037/h0044513.

  • Luecht, R. M., Gierl, M. J., Tan, X., & Huff, K. (2006). Scalability and the development of useful diagnostic scales. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA.

  • Luria, A. R. (1973). The working brain: an introduction to neuropsychology. New York: Basic Books.

  • Maraun, M. D. (1998a). Measurement as a normative practice: implications of Wittgenstein’s philosophy for measurement in psychology. Theory & Psychology, 8, 435–461. https://doi.org/10.1177/0959354398084001.

  • Maraun, M. D. (1998b). The nexus misconceived: Wittgenstein made silly. Theory & Psychology, 8, 489–501. https://doi.org/10.1177/0959354398084004.

  • Mari, L., Carbone, P., & Petri, D. (2015). Fundamentals of hard and soft measurement. In A. Ferrero, D. Petri, P. Carbone, & M. Catelani (Eds.), Modern measurements: fundamentals and applications (pp. 203–262). Hoboken: Wiley-IEEE Press.

  • McDonald, R. P. (1999). Test theory: a unified treatment. Mahwah: Erlbaum.

  • McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1–10. https://doi.org/10.1016/j.intell.2008.08.004.

  • McGrew, K. S., LaForte, E. M., & Schrank, F. A. (2014). Woodcock-Johnson IV technical manual. Rolling Meadows: Riverside.

  • Michell, J. (1999). Measurement in psychology: critical history of a methodological concept. New York: Cambridge University Press.

  • Michell, J. (2007). Measurement. In S. P. Turner & M. W. Risjord (Eds.), Philosophy of anthropology and sociology (pp. 71–119). Amsterdam: North Holland.

  • Michell, J. (2011). Qualitative research meets the ghost of Pythagoras. Theory & Psychology, 21, 241–259. https://doi.org/10.1177/0959354310391351.

  • Michell, J. (2012). Alfred Binet and the concept of heterogeneous orders. Frontiers in Psychology, 3(261), 1–8. https://doi.org/10.3389/fpsyg.2012.00261.

  • Petri, D., Mari, L., & Carbone, P. (2015). A structured methodology for measurement development. IEEE Transactions on Instrumentation and Measurement, 64, 2367–2379. https://doi.org/10.1109/TIM.2015.2399023.

  • Pfeiffer, S. I., Reddy, L. A., Kletzel, J. E., Schmelzer, E. R., & Boyer, L. M. (2000). The practitioner’s view of IQ testing and profile analysis. School Psychology Quarterly, 15, 376–385. https://doi.org/10.1037/h0088795.

  • R Development Core Team. (2017). R: a language and environment for statistical computing (version 3.3.3) [computer program]. Vienna: R Foundation for Statistical Computing.

  • Raiford, S. E. (2017). Essentials of WISC-V integrated assessment. Hoboken: Wiley.

  • Schneider, W. J. (2013). What if we took our models seriously? Estimating latent scores in individuals. Journal of Psychoeducational Assessment, 31, 186–201. https://doi.org/10.1177/0734282913478046.

  • Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (3rd ed., pp. 99–144). New York: Guilford.

  • Schrank, F. A., McGrew, K. S., & Mather, N. (2014). Woodcock-Johnson IV tests of cognitive abilities. Rolling Meadows: Riverside.

  • Sijtsma, K. (2012). Psychological measurement between physics and statistics. Theory & Psychology, 22, 786–809. https://doi.org/10.1177/0959354312454353.

  • Sijtsma, K. (2013). Theory development as a precursor for test validity. In R. E. Millsap, L. A. van der Ark, D. M. Bolt, & C. M. Woods (Eds.), New developments in quantitative psychology: presentations from the 77th annual psychometric society meeting (pp. 267–274). New York: Springer.

  • Spearman, C. E. (1927). The abilities of man: their nature and measurement. New York: Blackburn Press.

  • Spearman, C. E. (1931). Our need of some science in place of the word ‘intelligence’. Journal of Educational Psychology, 22, 401–410. https://doi.org/10.1037/h0070599.

  • Spearman, C. E. (1939). Thurstone’s work re-worked. Journal of Educational Psychology, 30, 1–16. https://doi.org/10.1037/h0061267.

  • Spearman, C. E. (1946). Theory of general factor. British Journal of Psychology, 36, 117–131. https://doi.org/10.1111/j.2044-8295.1946.tb01114.x.

  • Thomson, G. H. (1927). The tetrad-difference criterion. British Journal of Psychology. General Section, 17, 235–255. https://doi.org/10.1111/j.2044-8295.1927.tb00426.x.

  • Thurstone, L. L. (1935). The vectors of mind: multiple-factor analysis for the isolation of primary traits. Chicago: University of Chicago Press.

  • Tomarken, A. J., & Waller, N. G. (2003). Potential problems with “well fitting” models. Journal of Abnormal Psychology, 112, 578–598. https://doi.org/10.1037/0021-843X.112.4.578.

  • Wechsler, D. (1950). Cognitive, conative, and non-intellective intelligence. American Psychologist, 5, 78–83. https://doi.org/10.1037/h0063112.

  • Wechsler, D. (1975). Intelligence defined and undefined: a relativistic appraisal. American Psychologist, 30, 135–139. https://doi.org/10.1037/h0076868.

  • Wechsler, D. (1981). The psychometric tradition: developing the Wechsler adult intelligence scale. Contemporary Educational Psychology, 6, 82–85. https://doi.org/10.1016/0361-476X(81)90035-7.

  • Wechsler, D. (2014). Wechsler intelligence scale for children-fifth edition administration and scoring manual. Bloomington: NCS Pearson.

Author information

Correspondence to A. Alexander Beaujean.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Beaujean, A.A., Benson, N.F. Theoretically-Consistent Cognitive Ability Test Development and Score Interpretation. Contemp School Psychol 23, 126–137 (2019). https://doi.org/10.1007/s40688-018-0182-1

  • DOI: https://doi.org/10.1007/s40688-018-0182-1
