
Lord–Wingersky Algorithm Version 2.0 for Hierarchical Item Factor Models with Applications in Test Scoring, Scale Alignment, and Model Fit Testing

Psychometrika

Abstract

Lord and Wingersky’s (Appl Psychol Meas 8:453–461, 1984) recursive algorithm for creating summed-score-based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively straightforward, especially with fixed quadrature, because the recursions can be defined on a grid formed by direct products of quadrature points. However, the computational burden still grows exponentially in the number of dimensions, making the recursive algorithm cumbersome to implement for truly high-dimensional models. In this paper, a dimension reduction method that is specific to the Lord–Wingersky recursions is developed. This method can take advantage of the restrictions implied by hierarchical item factor models, e.g., the bifactor model, the testlet model, or the two-tier model, such that a version of the Lord–Wingersky recursive algorithm can operate on a dramatically reduced set of quadrature points. For instance, in a bifactor model, the dimension of integration is always equal to 2, regardless of the number of factors. The new algorithm not only provides an effective mechanism to produce summed-score-to-IRT-scaled-score translation tables properly adjusted for residual dependence, but also leads to new applications in test scoring, linking, and model fit checking. Simulated and empirical examples illustrate the new applications.
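The original unidimensional recursion that the paper generalizes can be sketched compactly. The following is a minimal illustration, not the paper's code: given each dichotomous item's correct-response probability evaluated on a fixed quadrature grid, the Lord–Wingersky recursion builds the conditional distribution of the summed score one item at a time. The 2PL item parameters and quadrature grid below are hypothetical, chosen only for the sketch.

```python
import numpy as np

def lw_recursion(p):
    """Lord-Wingersky recursion for dichotomous items.

    p: (n_items, n_quad) array; p[i, q] is the probability of a correct
       response to item i at quadrature point q.
    Returns an (n_items + 1, n_quad) array L with
       L[s, q] = P(summed score = s | theta_q).
    """
    n_items, n_quad = p.shape
    L = np.zeros((n_items + 1, n_quad))
    L[0] = 1.0  # before any items, the summed score is 0 with probability 1
    for i in range(n_items):
        new = L * (1.0 - p[i])    # item i incorrect: score unchanged
        new[1:] += L[:-1] * p[i]  # item i correct: score shifts up by 1
        L = new
    return L

# Illustrative 2PL items on a 21-point grid (hypothetical parameters).
theta = np.linspace(-4, 4, 21)
a = np.array([1.0, 1.5, 0.8])
b = np.array([-0.5, 0.0, 1.0])
p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))

L = lw_recursion(p)

# Summed-score posteriors: combine with N(0, 1) quadrature weights and
# normalize within each score; row s gives P(theta_q | summed score = s).
w = np.exp(-0.5 * theta**2)
w /= w.sum()
post = L * w
post /= post.sum(axis=1, keepdims=True)
```

Each column of `L` sums to 1 over scores, and the per-score posteriors in `post` are the ingredients of a summed-score-to-EAP translation table. The paper's contribution is to carry out this same style of recursion on the dramatically reduced (e.g., two-dimensional) grids that hierarchical item factor models permit.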


Fig. 1, Fig. 2, Fig. 3 (figures available in the full article)

References

  • Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261–280.
  • Cai, L. (2010a). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75, 33–57.
  • Cai, L. (2010b). A two-tier full-information item factor analysis model with applications. Psychometrika, 75, 581–612.
  • Cai, L. (2013). flexMIRT Version 2.0: Flexible multilevel item analysis and test scoring (Computer software). Chapel Hill, NC: Vector Psychometric Group LLC.
  • Cai, L., Thissen, D., & du Toit, S. H. C. (2011). IRTPRO: Flexible, multidimensional, multiple categorical IRT modeling (Computer software). Chicago, IL: Scientific Software International.
  • Cai, L., Yang, J. S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16, 221–248.
  • Chen, W. H., & Thissen, D. (1999). Estimation of item parameters for the three-parameter logistic model using the marginal likelihood of summed scores. British Journal of Mathematical and Statistical Psychology, 52, 19–37.
  • Edwards, M. C. (2010). A Markov chain Monte Carlo approach to confirmatory item factor analysis. Psychometrika, 75, 474–497.
  • Ferrando, P. J., & Lorenzo-Seva, U. (2001). Checking the appropriateness of item response theory models by predicting the distribution of observed scores: The program EO-fit. Educational and Psychological Measurement, 61, 895–902.
  • Gibbons, R. D., & Hedeker, D. (1992). Full-information item bifactor analysis. Psychometrika, 57, 423–436.
  • Gibbons, R. D., Bock, R. D., Hedeker, D., Weiss, D. J., Segawa, E., Bhaumik, D. K., et al. (2007). Full-information item bifactor analysis of graded response data. Applied Psychological Measurement, 31, 4–19.
  • Glas, C. A. W., Wainer, H., & Bradlow, E. T. (2000). Maximum marginal likelihood and expected a posteriori estimation in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 271–288). Boston, MA: Kluwer.
  • Hambleton, R. K., & Traub, R. E. (1973). Analysis of empirical data using two logistic latent trait models. British Journal of Mathematical and Statistical Psychology, 26, 195–211.
  • Holzinger, K. J., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2, 41–54.
  • Ip, E. H. (2010a). Empirically indistinguishable multidimensional IRT and locally dependent unidimensional item response models. British Journal of Mathematical and Statistical Psychology, 63, 395–416.
  • Ip, E. H. (2010b). Interpretation of the three-parameter testlet response model and information function. Applied Psychological Measurement, 34, 467–482.
  • Jeon, M., Rijmen, F., & Rabe-Hesketh, S. (2013). Modeling differential item functioning using a generalization of the multiple-group bifactor model. Journal of Educational and Behavioral Statistics, 38, 32–60.
  • Li, Y., & Rupp, A. A. (2011). Performance of the \(S-X^{2}\) statistic for full-information bifactor models. Educational and Psychological Measurement, 71, 986–1005.
  • Li, Y., Bolt, D. M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30, 3–21.
  • Li, Z., & Cai, L. (2012). Summed score based fit indices for testing latent variable distribution assumption in IRT. Paper presented at the 2012 International Meeting of the Psychometric Society, Lincoln, NE.
  • Lord, F. M. (1953). The relation of test score to the latent trait underlying the test. Educational and Psychological Measurement, 13, 517–548.
  • Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8, 453–461.
  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
  • Orlando, M., & Thissen, D. (2000). New item fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50–64.
  • Orlando, M., Sherbourne, C. D., & Thissen, D. (2000). Summed-score linking using item response theory: Application to depression measurement. Psychological Assessment, 12, 354–359.
  • Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.
  • Reeve, B. B., Hays, R. D., Bjorner, J. B., Cook, K. F., Crane, P. K., Teresi, J. A., et al. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the patient-reported outcome measurement information system (PROMIS). Medical Care, 45, 22–31.
  • Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47, 667–696.
  • Rijmen, F. (2009). Efficient full information maximum likelihood estimation for multidimensional IRT models (Tech. Rep. No. RR-09-03). Educational Testing Service.
  • Rijmen, F. (2010). Formal relations and an empirical comparison between the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47, 361–372.
  • Rijmen, F., Vansteelandt, K., & De Boeck, P. (2008). Latent class models for diary method data: Parameter estimation by local computations. Psychometrika, 73, 167–182.
  • Rosa, K., Swygert, K. A., Nelson, L., & Thissen, D. (2001). Item response theory applied to combinations of multiple-choice and constructed-response items—scale scores for patterns of summed scores. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 253–292). Mahwah, NJ: Lawrence Erlbaum.
  • Ross, J. (1966). An empirical study of a logistic mental test model. Psychometrika, 31, 325–340.
  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometric Monographs No. 17). Richmond, VA: Psychometric Society.
  • Schilling, S., & Bock, R. D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70, 533–555.
  • Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22, 53–61.
  • Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30, 298–321.
  • Stucky, B. D., Thissen, D., & Edelen, M. O. (2013). Using logistic approximations of marginal trace lines to develop short assessments. Applied Psychological Measurement, 37, 41–57.
  • Thissen, D., & Wainer, H. (Eds.). (2001). Test scoring. Mahwah, NJ: Lawrence Erlbaum.
  • Thissen, D., Pommerich, M., Billeaud, K., & Williams, V. S. L. (1995). Item response theory for scores on tests including polytomous items with ordered responses. Applied Psychological Measurement, 19, 39–49.
  • Thissen, D., Varni, J. W., Stucky, B. D., Liu, Y., Irwin, D. E., & DeWalt, D. A. (2011). Using the PedsQL™ 3.0 asthma module to obtain scores comparable with those of the PROMIS pediatric asthma impact scale (PAIS). Quality of Life Research, 20, 1497–1505.
  • Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York, NY: Cambridge University Press.
  • Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58–79.
  • Wu, E. J. C., & Bentler, P. M. (2011). EQSIRT: A user-friendly IRT program (Computer software). Encino, CA: Multivariate Software.
  • Yung, Y. F., McLeod, L. D., & Thissen, D. (1999). On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika, 64, 113–128.


Acknowledgments

Part of this research is supported by the Institute of Education Sciences (R305B080016 and R305D100039) and the National Institute on Drug Abuse (R01DA026943 and R01DA030466). The views expressed here belong to the author and do not reflect the views or policies of the funding agencies. The author is grateful to Dr. David Thissen and members of the UCLA psychometric lab (in particular Carl Falk, Jane Li, and Ji Seung Yang) for comments on an earlier draft.

Author information

Correspondence to Li Cai.


Cite this article

Cai, L. Lord–Wingersky Algorithm Version 2.0 for Hierarchical Item Factor Models with Applications in Test Scoring, Scale Alignment, and Model Fit Testing. Psychometrika 80, 535–559 (2015). https://doi.org/10.1007/s11336-014-9411-3

