Allen, N., Donoghue, J.R., & Schoeps, T.L. (2001). The NAEP 1998 technical report (NCES 2001-509). Washington, DC: Office of Educational Research and Improvement, US Department of Education.
Angoff, W.H. (1968). How we calibrate college board scores. College Board Review, 68, 11–14.
Bock, R.D., & Zimowski, M.F. (2003). Feasibility studies of two-stage testing in large-scale educational assessment: implications for NAEP. NAEP Validity Studies (NVS). Washington, DC: Office of Educational Research and Improvement, US Department of Education.
Budescu, D. (1985). Efficiency of linear equating as a function of the length of the anchor test. Journal of Educational Measurement
Camilli, G., Wang, M., & Fesq, J. (1995). The effects of dimensionality on equating the law school admission test. Journal of Educational Measurement
Dorans, N.J., Kubiak, A., & Melican, G.J. (1998). Guidelines for selection of embedded common items for score equating (ETS SR-98-02). Princeton: ETS.
Grigg, W.S., Daane, M.C., Jin, Y., & Campbell, J.R. (2003). The nation’s report card: reading 2002 (NCES 2003-521). Washington, DC: National Center for Education Statistics.
Hattie, J. (1984). An empirical study of various indices for determining unidimensionality. Multivariate Behavioral Research
Hattie, J. (1985). Methodological review: assessing unidimensionality of tests and items. Applied Psychological Measurement
Hays, W.L. (1973). Statistics for the social sciences. San Francisco: Holt, Rinehart & Winston.
Hetter, R., & Sympson, B. (1997). Item exposure control in CAT-ASVAB. In W. Sands, B. Waters, & J. McBride (Eds.), Computerized adaptive testing: from inquiry to operation (pp. 141–144). Washington, DC: American Psychological Association.
Hulin, C.L., Drasgow, F., & Parsons, C.K. (1983). Item response theory: application to psychological measurement. Homewood: Dow Jones-Irwin.
Kim, H.R. (1994). New techniques for the dimensionality assessment of standardized test data. Unpublished doctoral dissertation, Department of Statistics, University of Illinois at Urbana-Champaign.
Kolen, M.J., & Brennan, R.L. (2004). Test equating, scaling, and linking: methods and practices (2nd ed.). New York: Springer.
Lord, F.M. (1971). A theoretical study of two-stage testing. Psychometrika
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum Associates.
McDonald, R.P. (1981). The dimensionality of tests and items. British Journal of Mathematical and Statistical Psychology
McDonald, R.P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement
McDonald, R.P. (1994). Testing for approximate dimensionality. In D. Laveault, B. Zumbo, M. Gessaroli, & M. Boss (Eds.), Modern theories of measurement: problems and issues (pp. 63–85). Ottawa: University of Ottawa Press.
McDonald, R.P. (1997). Normal-ogive multidimensional model. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 258–269). New York: Springer.
Mislevy, R. (1986). Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics
Mislevy, R., & Bock, R.D. (1982). BILOG: item analysis and test scoring with binary logistic models [Computer software]. Mooresville: Scientific Software.
Muraki, E., & Bock, R.D. (1997). PARSCALE: IRT item analysis and test scoring for rating scale data [Computer software]. Chicago: Scientific Software.
National Assessment Governing Board (2005). Reading framework for the 2005 National Assessment of Educational Progress. Washington, DC: National Assessment Governing Board.
Oltman, P.K., Stricker, L.J., & Barrows, T.S. (1990). Analyzing test structure by multidimensional scaling. Journal of Applied Psychology
Reckase, M.D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement
Reckase, M.D., & McKinley, R.L. (1991). The discriminating power of items that measure more than one dimension. Applied Psychological Measurement
Roussos, L.A., & Ozbek, O. (2006). Formulation of the DETECT population parameter and evaluation of DETECT estimator bias. Journal of Educational Measurement
Roussos, L.A., Stout, W.F., & Marden, J. (1998). Using new proximity measures with hierarchical cluster analysis to detect multidimensionality. Journal of Educational Measurement
Sinharay, S., & Holland, P.W. (2007). Is it necessary to make anchor tests mini-versions of the tests being equated or can some restrictions be relaxed? Journal of Educational Measurement
Stout, W.F. (1987). A nonparametric approach for assessing latent trait dimensionality. Psychometrika
Stout, W.F. (1990). A new item response theory modeling approach with applications to unidimensional assessment and ability estimation. Psychometrika
Stout, W.F., Habing, B., Douglas, J., Kim, H.R., Roussos, L.A., & Zhang, J. (1996). Conditional covariance based nonparametric multidimensionality assessment. Applied Psychological Measurement
Sympson, J.B., & Hetter, R.D. (1985). Controlling item-exposure rates in computerized adaptive testing. In Proceedings of the 27th annual meeting of the military testing association (pp. 973–977). San Diego: Navy Personnel Research and Development Center.
UNESCO-UIS (2008). Literacy assessment and monitoring programme (LAMP): framework for the assessment of reading component skills. Montreal: UNESCO Institute for Statistics (UIS).
UNESCO-UIS (2009). The next generation of literacy statistics: implementing the literacy assessment and monitoring programme (LAMP). Montreal: UNESCO Institute for Statistics (UIS).
Van Abswoude, A.A.H., Van der Ark, L.A., & Sijtsma, K. (2004). A comparative study on test dimensionality assessment procedures under nonparametric IRT models. Applied Psychological Measurement
Zhang, J. (2007). Conditional covariance theory and DETECT for polytomous items. Psychometrika
Zhang, J., & Stout, W.F. (1999). The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika