Quality of Life Research, Volume 27, Issue 9, pp 2403–2413

Grooming a CAT: customizing CAT administration rules to increase response efficiency in specific research and clinical settings

  • Michael A. Kallen
  • Karon F. Cook
  • Dagmar Amtmann
  • Elizabeth Knowlton
  • Richard C. Gershon



Purpose

To evaluate the degree to which applying alternative stopping rules would reduce response burden while maintaining score precision in the context of computer adaptive testing (CAT).


Methods

Analyses were conducted on secondary data comprising CATs administered in a clinical setting at multiple time points (baseline and up to two follow-ups) to 417 study participants who had back pain (51.3%) and/or depression (47.0%). Participant mean age was 51.3 years (SD = 17.2; range 18–86). Participants tended to be white (84.7%), relatively well educated (77% with at least some college), female (63.9%), and married or living in a committed relationship (57.4%). The unit of analysis was individual assessment histories (i.e., CAT item response histories) from the parent study. Data were first aggregated across all individuals, domains, and time points in an omnibus dataset of assessment histories and then disaggregated by measure for domain-specific analyses. Finally, assessment histories within a “clinically relevant range” (score ≥ 1 SD from the mean in the direction of poorer health) were analyzed separately to explore score-level-specific findings.


Two different sets of CAT administration rules were compared. The original CAT (CATORIG) rules required that at least four and no more than 12 items be administered. If the score standard error (SE) reached a value < 3 points (T-score metric) before 12 items were administered, the CAT was stopped. We simulated applying alternative stopping rules (CATALT), removing the requirement that a minimum of four items be administered, and stopped a CAT if responses to the first two items were both associated with best health, if the SE was < 3, if the SE change between successive items was < 0.1 (T-score metric), or if 12 items were administered. We then compared score fidelity and response burden, defined as the number of items administered, between CATORIG and CATALT.
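The two rule sets described above can be sketched as simple stopping-rule functions. This is a minimal illustration, not the authors' implementation: the function names, the coding of `1` as the best-health response option, and the reading of "SE change" as the decrease in SE between successive items are all assumptions made here for concreteness.

```python
def stop_cat_orig(n_items, se_history, se_threshold=3.0,
                  min_items=4, max_items=12):
    """CATORIG rules: administer at least 4 items; stop once the score
    standard error (T-score metric) falls below 3, or at 12 items."""
    if n_items < min_items:
        return False
    return se_history[-1] < se_threshold or n_items >= max_items


def stop_cat_alt(responses, se_history, best_health=1,
                 se_threshold=3.0, se_change=0.1, max_items=12):
    """CATALT rules: no minimum-item requirement. Stop if the first two
    responses are both the best-health option, if SE < 3, if the SE
    decrease from the previous item is < 0.1, or at 12 items.
    (best_health=1 is an assumed response coding, not from the paper.)"""
    n = len(responses)
    # Rule 1: first two responses both at the best-health option.
    if n >= 2 and responses[0] == best_health and responses[1] == best_health:
        return True
    # Rule 2: standard error below 3 T-score points.
    if se_history[-1] < se_threshold:
        return True
    # Rule 3: SE decrease between successive items below 0.1.
    if n >= 2 and (se_history[-2] - se_history[-1]) < se_change:
        return True
    # Rule 4: hard cap of 12 items.
    return n >= max_items
```

Under this sketch, a respondent whose first two answers indicate best health exits immediately under CATALT, whereas CATORIG would still administer at least four items, which is the source of the response-burden savings reported below.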


Results

CATORIG and CATALT scores varied little, especially within the clinically relevant range, and response burden was substantially lower under CATALT (e.g., a 41.2% savings in the omnibus dataset).


Conclusions

Alternative stopping rules result in substantial reductions in response burden with minimal sacrifice in score precision.


Keywords: Computer adaptive testing · CAT stopping rules · Response burden · PROMIS®



Funding was provided by the U.S. Army.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Michael A. Kallen (1)
  • Karon F. Cook (1)
  • Dagmar Amtmann (2)
  • Elizabeth Knowlton (1)
  • Richard C. Gershon (1)

  1. Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University, Chicago, USA
  2. Department of Physical Medicine and Rehabilitation, School of Medicine, University of Washington, Seattle, USA
