Quality of Life Research, Volume 12, Issue 8, pp 887–902

The feasibility of applying item response theory to measures of migraine impact: A re-analysis of three clinical studies

  • Jakob B. Bjorner
  • Mark Kosinski
  • John E. Ware Jr


Background: Item response theory (IRT) is a powerful framework for analyzing multi-item scales and is central to the implementation of computerized adaptive testing.

Objectives: To explain the use of IRT to examine measurement properties and to apply IRT to a questionnaire for measuring migraine impact – the Migraine Specific Questionnaire (MSQ).

Methods: Data from three clinical studies that employed the MSQ version 1 were analyzed by confirmatory factor analysis for categorical data and by IRT modeling.

Results: Confirmatory factor analyses showed very high correlations between the factors hypothesized by the original test construction. Further, high item loadings on one common factor suggest that migraine impact may be adequately assessed by a single score. IRT analyses of the MSQ were feasible and yielded several suggestions for improving the items and, in particular, the response choices. Of the 15 items, 13 showed adequate fit to the IRT model. In general, IRT scores were strongly associated with the scores proposed by the original test developers and with the total item sum score. Analysis of response consistency showed that more than 90% of patients answered consistently according to a unidimensional IRT model. For the remaining patients, scores on the dimension of emotional function were less strongly related to the overall IRT scores, which mainly reflected role limitations. Such response patterns can be detected easily using response consistency indices. Analysis of test precision across score levels revealed that the MSQ was most precise at one standard deviation worse than the mean impact level for migraine patients who are not in treatment. Thus, gains in test precision can be achieved by developing items aimed at less severe levels of migraine impact.

Conclusions: IRT proved useful for analyzing the MSQ. The approach warrants further testing in a more comprehensive item pool for headache impact that would enable computerized adaptive testing.
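The IRT analyses summarized above rest on a polytomous model such as the generalized partial credit model (GPCM), and the test-precision results follow from the model's information function (standard error of measurement = 1/sqrt(test information)). As a minimal illustration only (not the authors' implementation, and with hypothetical item parameters), the following Python sketch computes GPCM category probabilities and item information:

```python
import math

def gpcm_probs(theta, a, b):
    """Category probabilities P(X = k | theta), k = 0..m, under the
    generalized partial credit model with item slope `a` and step
    parameters `b` (one per step between the m+1 ordered categories)."""
    z = [0.0]  # the k = 0 cumulative term is 0 by convention
    for step in b:
        z.append(z[-1] + a * (theta - step))
    expz = [math.exp(v) for v in z]
    total = sum(expz)
    return [v / total for v in expz]

def gpcm_item_information(theta, a, b):
    """Item information a^2 * Var(K | theta). Summing over items gives
    test information; 1/sqrt(test information) is the standard error
    of the IRT score at theta."""
    p = gpcm_probs(theta, a, b)
    mean = sum(k * pk for k, pk in enumerate(p))
    return a * a * sum((k - mean) ** 2 * pk for k, pk in enumerate(p))

# Hypothetical 4-category item: slope 1.2, three step parameters
probs = gpcm_probs(theta=0.0, a=1.2, b=[-1.0, 0.0, 1.0])
info = gpcm_item_information(theta=0.0, a=1.2, b=[-1.0, 0.0, 1.0])
```

Because item information peaks near an item's step parameters, an instrument whose steps cluster at severe impact levels (as reported here for the MSQ) is least precise for mildly affected respondents, which is the rationale for adding items targeting less severe impact.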

Keywords: Headache · Health status · Item response theory · Migraine · Questionnaires





Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Jakob B. Bjorner (1, 2)
  • Mark Kosinski (1)
  • John E. Ware Jr (1, 3)

  1. QualityMetric Incorporated, Lincoln, USA
  2. National Institute of Occupational Health, Copenhagen, Denmark
  3. Health Assessment Lab, Waltham, USA
