Abstract
In some popular test designs (including computerized adaptive testing and multistage testing), many item pairs are not administered to any test takers, which complicates dimensionality analyses. In this paper, a modified DETECT index is proposed for performing dimensionality analyses of response data from such designs. It is proven that, under certain conditions, the modified DETECT can successfully find the dimensionality-based partition of items. Furthermore, the modified DETECT index is decomposed into two parts, which can serve as indices of the reliability of results from the DETECT procedure when response data are judged to be multidimensional. A simulation study shows that the modified DETECT can successfully recover the dimensional structure of response data under reasonable specifications. Finally, the modified DETECT procedure is applied to real response data from two-stage tests to demonstrate how to utilize these indices and interpret their values in dimensionality analyses.
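The modified index builds on the classical DETECT statistic of Zhang and Stout (1999), which scores a candidate partition of items by summing conditional covariances of item pairs, with a positive sign for pairs in the same cluster and a negative sign for pairs in different clusters. The sketch below illustrates that classical index (not the modification proposed in the paper) for complete binary data, conditioning on the "rest score" that excludes both items of a pair; the function names and the choice of conditioning score are illustrative, and the usual ×100 scaling from the literature is applied.

```python
import numpy as np

def conditional_covariances(X):
    """Estimate E[Cov(X_i, X_j | rest score)] for every item pair.

    X: (n_examinees, n_items) binary response matrix with no missing data.
    The conditioning variable for pair (i, j) is the examinee's score on
    the remaining items. Returns a symmetric (n_items, n_items) matrix.
    """
    n, J = X.shape
    ccov = np.zeros((J, J))
    total = X.sum(axis=1)
    for i in range(J):
        for j in range(i + 1, J):
            rest = total - X[:, i] - X[:, j]  # score on the other J-2 items
            acc = 0.0
            for s in np.unique(rest):
                grp = rest == s
                m = int(grp.sum())
                if m < 2:
                    continue  # covariance undefined in a one-examinee stratum
                # weight each stratum's sample covariance by its proportion
                acc += (m / n) * np.cov(X[grp, i], X[grp, j], bias=True)[0, 1]
            ccov[i, j] = ccov[j, i] = acc
    return ccov

def detect_index(X, labels):
    """DETECT value (scaled by 100, as in the literature) for a partition.

    labels[k] is the cluster label of item k; same-cluster pairs enter
    with sign +1, different-cluster pairs with sign -1.
    """
    ccov = conditional_covariances(X)
    J = X.shape[1]
    val = 0.0
    for i in range(J):
        for j in range(i + 1, J):
            sign = 1.0 if labels[i] == labels[j] else -1.0
            val += sign * ccov[i, j]
    return 100.0 * 2.0 * val / (J * (J - 1))
```

The dimensionality-based partition is the one maximizing this index; in designs where many item pairs are never jointly administered, the pairwise conditional covariances above are unavailable for those pairs, which is the complication the modified DETECT addresses.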
Acknowledgements
The author would like to thank Brenda Tay-Lim and Demin Iris Qu for providing the LAMP field response data and other information about the LAMP design, and Ting Lu and Sarah Zhang for their comments and suggestions.
Cite this article
Zhang, J. A Procedure for Dimensionality Analyses of Response Data from Various Test Designs. Psychometrika 78, 37–58 (2013). https://doi.org/10.1007/s11336-012-9287-z