Volume 77, Issue 1, pp 4–20

Future of Psychometrics: Ask What Psychometrics Can Do for Psychology

  • Klaas Sijtsma


I address two issues inspired by my work on the Dutch Committee on Tests and Testing (COTAN). The first issue is understanding the problems that test constructors and researchers who use tests have with psychometric knowledge. I argue that this understanding is important for a field like psychometrics, in which disseminating psychometric knowledge among test constructors and researchers in general is highly important. The second issue is identifying psychometric research topics that are relevant to test constructors and test users but, in my view, do not receive enough attention in psychometrics. I discuss the influence of test length on decision quality in personnel selection, the quality of difference scores in therapy assessment, and theory development in test construction and validity research. I also briefly mention the question of whether particular attributes are continuous or discrete.

Key words

change assessment; decision quality based on short tests; didactics of psychometrics; personnel selection; test-quality assessment; test validity; theory construction



Copyright information

© The Psychometric Society 2011

Authors and Affiliations

  1. Department of Methodology and Statistics, TSB, Tilburg University, Tilburg, The Netherlands
