Concepts and Models from Psychometrics

Mislevy, Robert J.; Bolsinova, Maria

doi:10.1007/978-3-030-74394-9_6

Part of the book series: Methodology of Educational Measurement and Assessment (MEMA)

Abstract

The concepts and methods of psychometrics originated under trait and behavioral psychology, with relatively simple data, used mainly for purposes of prediction and selection. Ideas emerged over that time that nevertheless hold value for the new psychological perspectives, contexts of use, forms of data, and analytic tools we are now seeing. In this chapter we review some fundamental models and ideas from psychometrics that can be profitably reconceived, extended, and augmented in the new world of assessment. Methods we address include classical test theory, generalizability theory, item response theory, latent class models, cognitive diagnosis models, factor analysis, hierarchical models, and Bayesian networks. Key concepts are these: (1) The essential nature of psychometric models (observations, constructs, latent variables, and probability-based reasoning). (2) The interplay of design and discovery in assessment. (3) Understanding the measurement issues of validity, reliability, comparability, generalizability, and fairness as social values that pertain even as forms of data, analysis, context, and purpose evolve.

Notes

1. Most of the other methodological chapters provide data and computer code for examples. The R or Python codes for those chapters can be found at the GitHub repository of this book, https://github.com/jgbrainstorm/computational_psychometrics. This chapter is instead meant to survey a large number of models and discuss underlying concepts. Fortunately, the literature offers many examples, tutorials, and more technical presentations on psychometric models. Many useful R packages are freely available for the models we discuss. The CRAN project web site maintains a comprehensive listing and brief descriptions of such resources, at https://cran.r-project.org/web/views/Psychometrics.html.
2. Note that for some of the models, this notation is not the one typically used within that modeling framework.

Author information

Authors and Affiliations

Educational Testing Service, Princeton, NJ, USA
Robert J. Mislevy
Tilburg University, Tilburg, Netherlands
Maria Bolsinova

Corresponding author

Correspondence to Robert J. Mislevy.

Editor information

Editors and Affiliations

Duolingo and EdAstra Tech, LLC, Newton, MA, USA
Alina A. von Davier
Educational Testing Service, Princeton, NJ, USA
Robert J. Mislevy
Educational Testing Service, Princeton, NJ, USA
Jiangang Hao

About this chapter

Cite this chapter

Mislevy, R.J., Bolsinova, M. (2021). Concepts and Models from Psychometrics. In: von Davier, A.A., Mislevy, R.J., Hao, J. (eds) Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-74394-9_6

DOI: https://doi.org/10.1007/978-3-030-74394-9_6
Published: 02 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74393-2
Online ISBN: 978-3-030-74394-9
eBook Packages: Education, Education (R0)
