Linking Scores with Patient-Reported Health Outcome Instruments: A Validation Study and Comparison of Three Linking Methods

Abstract

The psychometric process used to establish a relationship between the scores of two (or more) instruments is generically referred to as linking. When two instruments with the same content and statistical test specifications are linked, these instruments are said to be equated. Linking and equating procedures have long been used for practical benefit in educational testing. In recent years, health outcome researchers have increasingly applied linking techniques to patient-reported outcome (PRO) data. However, these applications serve distinct purposes and raise associated methodological questions. Purposes for linking health outcomes include the harmonization of data across studies or settings (enabling increased power in hypothesis testing), the aggregation of summed score data by means of score crosswalk tables, and score conversion in clinical settings where new instruments are introduced but an interpretable connection to historical data is needed. When two PRO instruments are linked, assumptions for equating are typically not met, and the extent to which those assumptions are violated becomes a decision point around how (and whether) to proceed with linking. We demonstrate multiple linking procedures (equipercentile, unidimensional IRT calibration, and calibrated projection) with the Patient-Reported Outcomes Measurement Information System (PROMIS) Depression bank and the Patient Health Questionnaire-9 (PHQ-9). We validate this link across two samples and simulate different instrument correlation levels to provide guidance around which linking method is preferred. Finally, we discuss some remaining issues and directions for psychometric research in linking PRO instruments.
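
As a rough illustration of the first procedure named in the abstract, the sketch below builds an unsmoothed equipercentile crosswalk for a single group that answered both instruments. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`percentile_ranks`, `equipercentile_crosswalk`) and the toy scores are hypothetical, scores are assumed to be integer summed scores, and no presmoothing or postsmoothing is applied.

```python
import numpy as np


def percentile_ranks(scores, support):
    """Mid-percentile rank (0-100) of every possible score in `support`.

    The rank of score s is the percentage of respondents below s plus
    half the percentage of respondents exactly at s.
    """
    scores = np.asarray(scores)
    counts = np.array([(scores == s).sum() for s in support], dtype=float)
    rel = counts / counts.sum()        # relative frequency of each score
    below = np.cumsum(rel) - rel       # proportion strictly below each score
    return 100.0 * (below + rel / 2.0)


def equipercentile_crosswalk(x_scores, y_scores, x_support, y_support):
    """Map each X score to the Y score that has the same percentile rank.

    Y's percentile-rank function is inverted by linear interpolation.
    This is an unsmoothed illustration (an assumption of this sketch).
    """
    pr_x = percentile_ranks(x_scores, x_support)
    pr_y = percentile_ranks(y_scores, y_support)
    linked = np.interp(pr_x, pr_y, np.asarray(y_support, dtype=float))
    return dict(zip(x_support, linked))


# Toy data (hypothetical, not from the study): Y is exactly twice X,
# so the link should recover the doubling up to floating-point error.
x = [0, 1, 1, 2, 2, 2, 3]
y = [2 * s for s in x]
crosswalk = equipercentile_crosswalk(x, y, range(4), range(0, 7, 2))
for x_score, y_equiv in crosswalk.items():
    print(x_score, round(float(y_equiv), 3))
```

With real data, the resulting dictionary plays the role of a score crosswalk table: each summed score on one instrument is paired with its percentile-matched equivalent on the other.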

Acknowledgements

We wish to clarify that Seung W. Choi served as the senior author on this manuscript.

Author information

Corresponding author

Correspondence to Benjamin D. Schalet.

About this article

Cite this article

Schalet, B.D., Lim, S., Cella, D. et al. Linking Scores with Patient-Reported Health Outcome Instruments: A Validation Study and Comparison of Three Linking Methods. Psychometrika 86, 717–746 (2021). https://doi.org/10.1007/s11336-021-09776-z

  • DOI: https://doi.org/10.1007/s11336-021-09776-z

Keywords

  • patient-reported outcomes
  • linking
  • scale alignment
  • depression
  • PROMIS
  • PHQ-9
  • calibrated projection