Seeking a Balance Between the Statistical and Scientific Elements in Psychometrics

Abstract

In this paper, I review some aspects of psychometric projects that I have been involved in, emphasizing the nature of the work of the psychometricians involved, especially the balance between the statistical and scientific elements of that work. The intent is to understand where psychometrics, as a discipline, has been and where it might be headed, at least in part by considering one particular journey (my own). In contemplating this, I also look to psychometrics journals to see how psychometricians represent themselves to themselves and, in a complementary way, to substantive journals to see how psychometrics is represented there (or, as the case may be, not represented). I present a series of questions in order to consider what the appropriate foci of the psychometric discipline are. As an example, I present one recent project at the end, in which the roles of the psychometricians and the substantive researchers have had to become intertwined in order to make satisfactory progress. In the conclusion, I discuss the consequences of such a view for the future of psychometrics.

Notes

  1. An outcome space is a set of qualitatively described categories for recording and/or judging how respondents have responded to items (Marton 1981; Wilson 2005).

  2. This system, called the BEAR Assessment System (BAS), is described in Wilson (2005).

  3. This level of CoS is summarized as: “Consider statistics as measures of characteristics of a sample distribution.”

References

  • Adams, R.J., Wilson, M., & Wu, M. (1997a). Multilevel item response models: an approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22(1), 47–76.

  • Adams, R.J., Wilson, M., & Wang, W.C. (1997b). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.

  • Adams, R.J., Wu, M., & Wilson, M. (2012). ConQuest 3.0 [computer program]. Hawthorn, Australia: ACER.

  • Acton, G.S., Kunz, J.D., Wilson, M., & Hall, S.M. (2005). The construct of internalization: conceptualization, measurement, and prediction of smoking treatment outcome. Psychological Medicine, 35, 395–408.

  • American Educational Research Association, American Psychological Association, National Council on Measurement in Education (AERA, APA, NCME) (1999). Standards for educational and psychological testing. Washington: American Educational Research Association.

  • American Institutes for Research (2000). Voluntary national test, cognitive laboratory report, year 2. Palo Alto: American Institutes for Research.

  • Biggs, J.B., & Collis, K.F. (1982). Evaluating the quality of learning: the SOLO taxonomy. New York: Academic Press.

  • Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71(3), 425–440.

  • Brown, N.J.S., & Wilson, M. (2011). Model of cognition: the missing cornerstone of assessment. Educational Psychology Review, 23(2), 221–234.

  • Corcoran, T., Mosher, F.A., & Rogat, A. (2009). Learning progressions in science: an evidence-based approach to reform (CPRE Research Report #RR-63). New York: Center on Continuous Instructional Improvement, Teachers College—Columbia University.

  • De Boeck, P., Wilson, M., & Acton, G.S. (2005). A conceptual and psychometric framework for distinguishing categories and dimensions. Psychological Review, 112(1), 129–158.

  • Demetriou, A., & Efklides, A. (1989). The person’s conception of the structures of developing intellect: early adolescence to middle age. Genetic, Social, and General Psychology Monographs, 115, 371–423.

  • Demetriou, A., & Kyriakides, L. (2006). The functional and developmental organization of cognitive developmental sequences. British Journal of Educational Psychology, 76(2), 209–242.

  • Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B, 39, 1–38.

  • Diakow, R., & Irribarra, D.T. (2011). Developing assessments of data modeling and mapping a learning progression using a structured constructs model. Paper presented at the international meeting of the psychometric society, Hong Kong, July 2011.

  • Diakow, R., Irribarra, D.T., & Wilson, M. (2011). Analyzing the complex structure of a learning progression: structured construct models. Paper presented at the annual meeting of the national council on measurement in education, New Orleans, LA, April 2011.

  • Diakow, R., Irribarra, D.T., & Wilson, M. (2012a). Analyzing the complex structure of a learning progression: structured construct models. Paper presented at the national council on measurement in education annual meeting, Vancouver, Canada, April 2012.

  • Diakow, R., Irribarra, D.T., & Wilson, M. (2012b). Evaluating the impact of alternative models for between and within construct relations. Paper presented at the international meeting of the psychometric society, Lincoln, Nebraska, July 2012.

  • Draney, K. (1996). The polytomous saltus model: a mixture model approach to the diagnosis of developmental differences. Unpublished doctoral dissertation, University of California, Berkeley.

  • Draney, K., & Jeon, M. (2011). Investigating the saltus model as a tool for setting standards. Psychological Test and Assessment Modeling, 53(4), 486–498.

  • Draney, K., & Wilson, M. (2004). Application of the polytomous saltus model to stage-like data. In A. van der Ark, M. Croon, & K. Sijtsma (Eds.), New developments in categorical data analysis for the social and behavioral sciences. Mahwah: Erlbaum.

  • Falmagne, J.-C., & Doignon, J.-P. (2011). Learning spaces. Heidelberg: Springer.

  • Fischer, K.W., Pipp, S.L., & Bullock, D. (1984). Detecting discontinuities in development: methods and measurement. In R.N. Emde & R. Harmon (Eds.), Continuities and discontinuities in development. Norwood: Ablex.

  • Irribarra, D.T., Diakow, R., & Wilson, M. (2012). Alternative specifications for structured construct models. Paper presented at the IOMW 2012 conference, Vancouver, April 2012.

  • Lehrer, R., Kim, M.-J., Ayers, E., & Wilson, M. (2013, in press). Toward establishing a learning progression to support the development of statistical reasoning. In J. Confrey & A. Maloney (Eds.), Learning over time: learning trajectories in mathematics education. Charlotte: Information Age Publishers.

  • Marton, F. (1981). Phenomenography: describing conceptions of the world around us. Instructional Science, 10, 177–200.

  • Marton, F. (1986). Phenomenography—a research approach to investigating different understandings of reality. Journal of Thought, 21, 29–49.

  • Marton, F. (1988). Phenomenography—exploring different conceptions of reality. In D. Fetterman (Ed.), Qualitative approaches to evaluation in education (pp. 176–205). New York: Praeger.

  • Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.

  • Mislevy, R.J., Steinberg, L.S., & Almond, R.G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–67.

  • Mislevy, R.J., & Wilson, M. (1996). Marginal maximum likelihood estimation for a psychometric model of discontinuous development. Psychometrika, 61, 41–71.

  • National Research Council (2001). Knowing what students know: the science and design of educational assessment. Committee on the Foundations of Assessment, J. Pellegrino, N. Chudowsky, & R. Glaser (Eds.), Washington: National Academy Press.

  • Nunnally, J.C., & Bernstein, I.H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

  • Patton, M.Q. (1980). Qualitative evaluation methods. Beverly Hills: Sage.

  • Pirolli, P., & Wilson, M. (1998). A theory of the measurement of knowledge content, access, and learning. Psychological Review, 105(1), 58–82.

  • Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 4, pp. 321–334).

  • Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press (original work published 1960).

  • Rost, J. (1990). Rasch models in latent classes: an integration of two approaches to item analysis. Applied Psychological Measurement, 14, 271–282.

  • Rupp, A.A., Templin, J., & Henson, R. (2010). Diagnostic measurement: theory, methods, and applications. New York: The Guilford Press.

  • Scalise, K., & Gifford, B.R. (2008). Innovative item types: intermediate constraint questions and tasks for computer-based testing. Paper presented at the national council on measurement in education (NCME), session on ‘Building adaptive and other computer-based tests’, in New York, May 2008.

  • Schwartz, R., Ayers, E., & Wilson, M. (2010). Modeling a multi-dimensional learning progression. Paper presented at the annual meeting of the American educational research association, Denver, CO, April 2010.

  • Siegler, R.S. (1981). Developmental sequences within and between concepts. Monographs of the Society for Research in Child Development, 46(2, Serial No. 189).

  • Spiegelhalter, D.J., Best, N.G., Carlin, B.P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64, 583–616.

  • Vermunt, J.K., & Magidson, J. (2007). Latent GOLD 4.5 syntax module (computer program). Belmont, MA: Statistical Innovations.

  • Wilson, M. (1989). Saltus: a psychometric model of discontinuity in cognitive development. Psychological Bulletin, 105(2), 276–289.

  • Wilson, M. (2005). Constructing measures: an item response modeling approach. Mahwah: Lawrence Erlbaum Associates.

  • Wilson, M. (2009). Measuring progressions: assessment structures underlying a learning progression. Journal of Research in Science Teaching, 46(6), 716–730.

  • Wilson, M. (2012). Responding to a challenge that learning progressions pose to measurement practice: hypothesized links between dimensions of the outcome progression. In A.C. Alonzo & A.W. Gotwals (Eds.), Learning progressions in science. Rotterdam: Sense Publishers.

Acknowledgements

Many colleagues have contributed to the thoughts and ideas presented in this paper; unfortunately, I cannot acknowledge all of you. Hence, I restrict my acknowledgements to two groups. First, those who commented on drafts of the text: Ronli Diakow, Paul De Boeck, Karen Draney, Andy Maul, Roger Millsap, and David Torres Irribarra. Second, those who worked directly on the examples used in the text: for the saltus example, Karen Draney and Bob Mislevy; for the ADM example, Beth Ayers, Kristen Burmester, Tzur Karelitz, Rich Lehrer, David Torres Irribarra, Kavita Seeratan, and Bob Schwartz; and for the SCM example, Ronli Diakow and David Torres Irribarra. Any errors or omissions are, of course, the responsibility of the author.

Author information

Corresponding author

Correspondence to Mark Wilson.

Appendix: Publications Related to the Saltus Model (in Chronological Order)

  21. Draney, K., & Jeon, M. (2011). Investigating the saltus model as a tool for setting standards. Psychological Test and Assessment Modeling, 53(4), 486–498.

  20. Draney, K., Wilson, M., Gluck, J., & Spiel, C. (2008). Mixture models in a developmental context. In G.R. Hancock & K.M. Samuelsen (Eds.), Advances in latent variable mixture models (pp. 199–216). Charlotte: Information Age Publishing.

  19. Draney, K., & Wilson, M. (2007). Application of the saltus model to stage-like data: some applications and current developments. In M. von Davier & C. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 119–130). New York: Springer.

  18. Draney, K. (2007). Understanding Rasch measurement: the saltus model applied to proportional reasoning data. Journal of Applied Measurement, 8.

  17. Demetriou, A., & Kyriakides, L. (2006). The functional and developmental organization of cognitive developmental sequences. British Journal of Educational Psychology, 76(2), 209–242.

  16. Acton, G.S., Kunz, J.D., Wilson, M., & Hall, S.M. (2005). The construct of internalization: conceptualization, measurement, and prediction of smoking treatment outcome. Psychological Medicine, 35, 395–408.

  15. De Boeck, P., Wilson, M., & Acton, G.S. (2005). A conceptual and psychometric framework for distinguishing categories and dimensions. Psychological Review, 112(1), 129–158.

  14. Draney, K., & Wilson, M. (2004). Application of the polytomous saltus model to stage-like data. In A. van der Ark, M. Croon, & K. Sijtsma (Eds.), New developments in categorical data analysis for the social and behavioral sciences. Mahwah: Erlbaum.

  13. Fieuws, S., Spiessens, B., & Draney, K. (2004). Mixture models. In P. De Boeck & M. Wilson (Eds.), Explanatory item response models: a generalized linear and nonlinear approach (pp. 317–340). New York: Springer.

  12. Pirolli, P., & Wilson, M. (1998). A theory of the measurement of knowledge content, access, and learning. Psychological Review, 105(1), 58–82.

  11. Wilson, M., & Draney, K. (1997). Partial credit in a developmental context: the case for adopting a mixture model approach. In M. Wilson, G. Engelhard, & K. Draney (Eds.), Objective measurement IV: theory into practice (pp. 333–350). Norwood: Ablex.

  10. Draney, K.L., & Wilson, M. (1997). PC-saltus [computer program]. BEAR Center Research Report, UC Berkeley.

  9. Mislevy, R.J., & Wilson, M. (1996). Marginal maximum likelihood estimation for a psychometric model of discontinuous development. Psychometrika, 61(1), 41–71.

  8. Draney, K.L. (1996). The polytomous saltus model: a mixture model approach to the diagnosis of developmental differences. Unpublished doctoral dissertation, UC Berkeley.

  7. Wilson, M. (1994). Measurement of developmental levels. In T. Husen & T.N. Postlethwaite (Eds.), International encyclopedia of education (2nd ed., pp. 1508–1514). Oxford: Pergamon Press.

  6. Wilson, M. (1993). The “Saltus model” misunderstood. Methodika, 7, 1–4.

  5. Wilson, M. (1990). Measurement of developmental levels. In T. Husen & T.N. Postlethwaite (Eds.), International encyclopedia of education: research and studies. Supplementary volume 2. Oxford: Pergamon Press.

  4. Demetriou, A., & Efklides, A. (1989). The person’s conception of the structures of developing intellect: early adolescence to middle age. Genetic, Social, and General Psychology Monographs, 115, 371–423.

  3. Wilson, M. (1989). Saltus: a psychometric model of discontinuity in cognitive development. Psychological Bulletin, 105(2), 276–289.

  2. Wilson, M. (1985). Measuring stages of growth, ACER occasional paper, No. 19. Melbourne, Australia: ACER.

  1. Wilson, M. (1984). A psychometric model of hierarchical development. Unpublished doctoral dissertation, University of Chicago.

About this article

Cite this article

Wilson, M. Seeking a Balance Between the Statistical and Scientific Elements in Psychometrics. Psychometrika 78, 211–236 (2013). https://doi.org/10.1007/s11336-013-9327-3

  • DOI: https://doi.org/10.1007/s11336-013-9327-3

Key words

  • psychometrics
  • test theory
  • test construction