Volume 78, Issue 2, pp 211–236

Seeking a Balance Between the Statistical and Scientific Elements in Psychometrics

  • Mark Wilson


In this paper, I review some aspects of psychometric projects that I have been involved in, emphasizing the nature of the work of the psychometricians involved, especially the balance between the statistical and scientific elements of that work. The intent is to understand where psychometrics, as a discipline, has been and where it might be headed, at least in part by considering one particular journey (my own). In contemplating this, I also look to psychometrics journals to see how psychometricians represent themselves to themselves and, in a complementary way, look to substantive journals to see how psychometrics is represented there (or perhaps not represented, as the case may be). I present a series of questions to consider what the appropriate foci of the psychometric discipline are. As an example, I present one recent project at the end, where the roles of the psychometricians and the substantive researchers have had to become intertwined in order to make satisfactory progress. In the conclusion, I discuss the consequences of such a view for the future of psychometrics.

Key words

psychometrics, test theory, test construction



Many colleagues have contributed to the thoughts and ideas presented in this paper; unfortunately, I cannot acknowledge all of you. Hence, I restrict my acknowledgements to two groups. First, those who commented on drafts of the text: Ronli Diakow, Paul De Boeck, Karen Draney, Andy Maul, Roger Millsap, and David Torres Irribarra. Second, those who worked directly on the examples used in the text: for the saltus example, Karen Draney and Bob Mislevy; for the ADM example, Beth Ayers, Kristen Burmester, Tzur Karelitz, Rich Lehrer, David Torres Irribarra, Kavita Seeratan, and Bob Schwartz; and for the SCM example, Ronli Diakow and David Torres Irribarra. Any errors or omissions are, of course, the responsibility of the author.



Copyright information

© The Psychometric Society 2013

Authors and Affiliations

  1. University of California, Berkeley, Berkeley, USA
