Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT’s unidimensionality assumption

Cook, Karon F.; Kallen, Michael A.; Amtmann, Dagmar

doi:10.1007/s11136-009-9464-4

Published: 18 March 2009

Volume 18, pages 447–460, (2009)

Quality of Life Research

Karon F. Cook¹,²,
Michael A. Kallen³ &
Dagmar Amtmann¹

Abstract

Purpose

Confirmatory factor analysis fit criteria typically are used to evaluate the unidimensionality of item banks. This study explored the degree to which the values of these statistics are affected by two characteristics of item banks developed to measure health outcomes: large numbers of items and nonnormal data.

Methods

Analyses were conducted on simulated and observed data. Observed data were responses to the Patient-Reported Outcomes Measurement Information System (PROMIS) Pain Impact Item Bank. Simulated data fit the graded response model and either conformed to a normal distribution or mirrored the distribution of the observed data. Confirmatory factor analyses (CFA), parallel analysis, and bifactor analysis were conducted.
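As an illustration of the simulation step, the following is a minimal sketch of drawing item responses from a graded response model under a normal and a nonnormal trait distribution. It is not the authors' procedure: the function name `simulate_grm`, the item parameters, the sample size, and the gamma distribution used as a stand-in for skewed data are all assumptions chosen for the example.

```python
import numpy as np

def simulate_grm(theta, a, b, rng):
    """Draw graded-response-model item responses.

    theta: (n_persons,) latent trait values
    a:     (n_items,) discrimination parameters
    b:     (n_items, n_cats - 1) thresholds, increasing within each item
    Returns an (n_persons, n_items) integer array with values 0..n_cats-1.
    """
    theta = np.asarray(theta, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Cumulative curves P(X >= k | theta), shape (persons, items, thresholds).
    z = a[None, :, None] * (theta[:, None, None] - b[None, :, :])
    p_star = 1.0 / (1.0 + np.exp(-z))
    # One uniform draw per person-item pair; the response category is the
    # number of cumulative curves that lie above the draw.
    u = rng.uniform(size=(theta.size, a.size, 1))
    return (u < p_star).sum(axis=2)

rng = np.random.default_rng(0)
n_persons, n_items = 2000, 10

# Hypothetical item parameters (not taken from the PROMIS bank).
a = rng.uniform(1.0, 3.0, size=n_items)
b = np.sort(rng.normal(0.0, 1.0, size=(n_items, 4)), axis=1)

# Normal trait distribution.
theta_normal = rng.normal(size=n_persons)

# Skewed trait distribution: an arbitrary stand-in for data that mirror
# a nonnormal observed distribution, standardized to mean 0 and SD 1.
raw = rng.gamma(shape=2.0, scale=1.0, size=n_persons)
theta_skewed = (raw - raw.mean()) / raw.std()

resp_normal = simulate_grm(theta_normal, a, b, rng)
resp_skewed = simulate_grm(theta_skewed, a, b, rng)
```

Both response matrices share the same item parameters, so any difference in downstream fit statistics would reflect the trait distribution rather than the items.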

Results

CFA fit values were sensitive to both data distribution and number of items. In some instances, the impact of distribution and item number was quite large.

Conclusions

We concluded that using traditional cutoffs and standards for CFA fit statistics is not recommended for establishing unidimensionality of item banks. An investigative approach is favored over reliance on published criteria. We found bifactor analysis to be appealing in this regard because it allows evaluation of the relative impact of secondary dimensions. In addition to these methodological conclusions, we judged the items of the PROMIS Pain Impact bank to be sufficiently unidimensional for item response theory (IRT) modeling.
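One common way to quantify the relative impact of secondary dimensions in a bifactor solution is the explained common variance (ECV): the share of common variance attributable to the general factor. The sketch below assumes standardized loadings are already available; the loadings shown are hypothetical, and ECV is offered as an illustration rather than as the index the authors computed.

```python
import numpy as np

def explained_common_variance(general, group):
    """Share of common variance attributable to the general factor.

    general: (n_items,) standardized loadings on the general factor
    group:   (n_items, n_group_factors) standardized loadings on the
             group factors, with zeros where an item does not load
    """
    general = np.asarray(general, dtype=float)
    group = np.asarray(group, dtype=float)
    general_var = np.sum(general ** 2)
    group_var = np.sum(group ** 2)
    return general_var / (general_var + group_var)

# Hypothetical bifactor loadings for six items and two group factors.
general = [0.80, 0.70, 0.75, 0.80, 0.70, 0.65]
group = [
    [0.30, 0.00],
    [0.20, 0.00],
    [0.25, 0.00],
    [0.00, 0.30],
    [0.00, 0.35],
    [0.00, 0.20],
]

ecv = explained_common_variance(general, group)
print(f"ECV = {ecv:.3f}")
```

Values near 1 indicate that the group factors add little beyond the general factor, which is the kind of evidence an investigative approach can weigh in place of a fixed fit cutoff.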

Abbreviations

CAT: Computer adaptive testing
CFA: Confirmatory factor analyses
CFI: Comparative Fit Index
EAP: Expected a posteriori
EFA: Exploratory factor analyses
GED: General Educational Development
GRM: Graded response model
IRT: Item response theory
NIH: National Institutes of Health
NNFI: Nonnormed Fit Index
PROMIS: Patient-Reported Outcomes Measurement Information System
PROs: Patient-reported outcomes
RMSEA: Root-mean-square error of approximation
SD: Standard deviation
SRMR: Standardized root-mean-square residual
TLI: Tucker–Lewis index
WLSMV: Weighted least squares with mean and variance adjustment
WRMR: Weighted root-mean-square residual

Acknowledgements

The Patient-Reported Outcomes Measurement Information System (PROMIS) is a National Institutes of Health (NIH) roadmap initiative to develop a computerized system measuring patient-reported outcomes in respondents with a wide range of chronic diseases and demographic characteristics. PROMIS was funded by cooperative agreements to a Statistical Coordinating Center (Evanston Northwestern Healthcare, PI: David Cella, PhD, U01AR52177) and six Primary Research Sites (Duke University, PI: Kevin Weinfurt, PhD, U01AR52186; University of North Carolina, PI: Darren DeWalt, MD, MPH, U01AR52181; University of Pittsburgh, PI: Paul A. Pilkonis, PhD, U01AR52155; Stanford University, PI: James Fries, MD, U01AR52158; Stony Brook University, PI: Arthur Stone, PhD, U01AR52170; and University of Washington, PI: Dagmar Amtmann, PhD, U01AR52171). NIH Science Officers on this project are Deborah Ader, PhD, Susan Czajkowski, PhD, Lawrence Fine, MD, DrPH, Louis Quatrano, PhD, Bryce Reeve, PhD, William Riley, PhD, and Susana Serrate-Sztein, PhD. This manuscript was reviewed by the PROMIS Publications Subcommittee prior to external peer review. See the web site at www.nihpromis.org for additional information on the PROMIS cooperative group.

Author information

Authors and Affiliations

Department of Rehabilitation Medicine, University of Washington, Box 357920, Seattle, WA, 98195-7920, USA
Karon F. Cook & Dagmar Amtmann
801 Cortlandt St, Houston, TX, 77007, USA
Karon F. Cook
Department of General Internal Medicine, University of Texas M. D. Anderson Cancer Center, PO Box 301402, Houston, TX, 77230-1402, USA
Michael A. Kallen

Corresponding author

Correspondence to Karon F. Cook.

Appendix

See Table 5.

Table 5 Item content, response set, stem, and item parameters

About this article

Cite this article

Cook, K.F., Kallen, M.A. & Amtmann, D. Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT’s unidimensionality assumption. Qual Life Res 18, 447–460 (2009). https://doi.org/10.1007/s11136-009-9464-4

Received: 14 October 2008
Accepted: 04 March 2009
Published: 18 March 2009
Issue Date: May 2009
DOI: https://doi.org/10.1007/s11136-009-9464-4
