Abstract
The concepts and methods of psychometrics originated under trait and behavioral psychology, with relatively simple data, used mainly for purposes of prediction and selection. Ideas emerged over that time that nevertheless hold value for the new psychological perspectives, contexts of use, forms of data, and analytic tools we are now seeing. In this chapter we review some fundamental models and ideas from psychometrics that can be profitably reconceived, extended, and augmented in the new world of assessment. Methods we address include classical test theory, generalizability theory, item response theory, latent class models, cognitive diagnosis models, factor analysis, hierarchical models, and Bayesian networks. Key concepts are these: (1) the essential nature of psychometric models (observations, constructs, latent variables, and probability-based reasoning); (2) the interplay of design and discovery in assessment; and (3) understanding the measurement issues of validity, reliability, comparability, generalizability, and fairness as social values that continue to apply even as forms of data, analysis, context, and purpose evolve.
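As a minimal sketch of what probability-based reasoning about a latent variable can look like (an illustration added here, not code from the chapter), the Python example below uses the Rasch model, in which the probability of a correct response depends on a person's latent ability θ and an item's difficulty b. Observed responses update a prior over θ into a posterior, and the posterior mean (the expected a posteriori, or EAP, estimate) summarizes what the responses suggest about θ. The item difficulties, the standard normal prior, the evaluation grid, and the function names are all hypothetical choices for illustration.

```python
import math

def rasch_prob(theta, b):
    """P(correct | theta, b) under the Rasch model: logistic(theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def posterior_theta(responses, difficulties, grid):
    """Discrete posterior over theta on a grid (assumed N(0, 1) prior)."""
    weights = []
    for theta in grid:
        prior = math.exp(-0.5 * theta * theta)  # unnormalized standard normal density
        likelihood = 1.0
        for x, b in zip(responses, difficulties):
            p = rasch_prob(theta, b)
            # Local independence: multiply item-level probabilities
            likelihood *= p if x == 1 else (1.0 - p)
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]  # normalize so the posterior sums to 1

def eap(grid, posterior):
    """Posterior mean of theta (EAP estimate)."""
    return sum(t * w for t, w in zip(grid, posterior))

grid = [-4.0 + 0.1 * k for k in range(81)]  # hypothetical grid from -4 to 4
difficulties = [-1.0, 0.0, 1.0]             # hypothetical item difficulties

post_all_right = posterior_theta([1, 1, 1], difficulties, grid)
post_all_wrong = posterior_theta([0, 0, 0], difficulties, grid)

print("EAP, all correct:", round(eap(grid, post_all_right), 3))
print("EAP, all incorrect:", round(eap(grid, post_all_wrong), 3))
```

With symmetric difficulties and a symmetric prior, the two EAP estimates are mirror images of each other. A grid-based posterior is used only because it keeps the sketch dependency-free; the estimation methods discussed in the chapter's literature (for example, Markov chain Monte Carlo) scale better to many items and dimensions.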
Notes
1. Most of the other methodological chapters provide data and computer code for examples. The R or Python code for those chapters can be found at the GitHub repository of this book, https://github.com/jgbrainstorm/computational_psychometrics. This chapter is instead meant to survey a large number of models and discuss their underlying concepts. Fortunately, the literature offers many examples, tutorials, and more technical presentations of psychometric models. Many useful R packages for the models we discuss are freely available. The CRAN project website maintains a comprehensive listing and brief descriptions of such resources at https://cran.r-project.org/web/views/Psychometrics.html.
2. For some of the models, this notation is not the one typically used within that modeling framework.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Mislevy, R.J., Bolsinova, M. (2021). Concepts and Models from Psychometrics. In: von Davier, A.A., Mislevy, R.J., Hao, J. (eds) Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-74394-9_6
DOI: https://doi.org/10.1007/978-3-030-74394-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74393-2
Online ISBN: 978-3-030-74394-9
eBook Packages: Education, Education (R0)