Quality of Life Research, Volume 23, Issue 2, pp 485–493

Is Rasch model analysis applicable in small sample size pilot studies for assessing item characteristics? An example using PROMIS pain behavior item bank data

  • Wen-Hung Chen
  • William Lenderking
  • Ying Jin
  • Kathleen W. Wyrwich
  • Heather Gelhorn
  • Dennis A. Revicki



Purpose

Large samples are generally considered necessary for the Rasch model to obtain robust item parameter estimates. Recently, small-sample Rasch analysis has been suggested as a preliminary assessment of items’ psychometric properties. The purpose of this study was to evaluate Rasch analysis results obtained from small samples.


Methods

Ten PROMIS pain behavior items were used. Random samples of 30, 50, 100, and 250 subjects, as well as a targeted sample of 30, were each drawn 10 times from a pool of 800 subjects. Rasch analysis was conducted on each of these samples and on the full sample.
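The repeated-sampling design can be sketched in Python. This is a minimal illustration under stated assumptions: the response matrix below is simulated (the PROMIS data themselves are not reproduced here), the targeted sample of 30 is omitted because it was selected on substantive rather than random grounds, and the authors fitted their models in RUMM2030, not Python.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Hypothetical stand-in for the full sample: 800 subjects x 10 pain
# behavior items, each scored in 6 ordered categories (0-5).
N_SUBJECTS, N_ITEMS, N_CATS = 800, 10, 6
full_sample = rng.integers(0, N_CATS, size=(N_SUBJECTS, N_ITEMS))

# Draw each random sample size 10 times, sampling subjects without
# replacement within each draw, mirroring the study design.
sample_sizes = [30, 50, 100, 250]
n_replications = 10

subsamples = {
    n: [full_sample[rng.choice(N_SUBJECTS, size=n, replace=False)]
        for _ in range(n_replications)]
    for n in sample_sizes
}
# Each subsample (and the full sample) would then be submitted to a
# Rasch analysis to compare item parameter estimates and fit.
```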


Results

In the full sample, there were 104 cases of extreme scores, no null categories, two items with incorrectly ordered parameters, and four misfitting items. For sample sizes of 250, 100, 50, 30, and the targeted 30, the average numbers of extreme scores were 42.2, 17.1, 9.6, 6.1, and 1.2; the average numbers of null categories were 1.0, 3.2, 8.7, 13.4, and 8.3; the average numbers of items with incorrectly ordered parameters were 0.1, 0.8, 2.9, 4.7, and 3.7; and the average numbers of items with fit residuals exceeding ±2.5 were 0.8, 0.3, 0.1, 0.2, and 0.3, respectively.
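Two of the tabulated quantities, extreme scores and null categories, are descriptive counts that can be computed from a response matrix before any model fitting. The sketch below shows one way to do so; the function names and the toy data are ours, not the authors’, and serve only to illustrate the definitions.

```python
import numpy as np

def count_extreme_scores(resp, max_cat):
    # Respondents with the minimum (0) or maximum possible total score
    # are "extreme"; they provide no information for Rasch estimation.
    totals = resp.sum(axis=1)
    max_total = resp.shape[1] * max_cat
    return int(np.sum((totals == 0) | (totals == max_total)))

def count_null_categories(resp, max_cat):
    # A "null" category is a response category never observed for an
    # item; small samples leave more categories unobserved.
    n_null = 0
    for item in resp.T:  # iterate over items (columns)
        observed = set(item.tolist())
        n_null += sum(1 for c in range(max_cat + 1) if c not in observed)
    return n_null

# Toy check: 4 subjects x 2 items, categories 0-2.
resp = np.array([[0, 0], [2, 2], [1, 0], [2, 0]])
count_extreme_scores(resp, 2)   # -> 2 (totals 0 and 4 are extreme)
count_null_categories(resp, 2)  # -> 1 (item 2 is never scored 2)
```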


Conclusions

Rasch analysis based on small samples (≤50) identified more items with incorrectly ordered parameters than analyses based on larger samples (≥100), yet it flagged fewer items as misfitting. Results from small samples thus led to conclusions opposite to those drawn from larger samples. Rasch analysis of small samples should be used only for exploratory purposes, and with extreme caution.


Keywords: Rasch model · PROMIS pain behavior item bank · Mixed methods · Patient-reported outcomes measure · Rasch model sample size



Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Wen-Hung Chen (1)
  • William Lenderking (1)
  • Ying Jin (2)
  • Kathleen W. Wyrwich (1)
  • Heather Gelhorn (1)
  • Dennis A. Revicki (1)

  1. Center for Health Outcomes Research, United BioSource Corporation, Bethesda, USA
  2. Association of American Medical Colleges, Washington, USA
