Modeling strategies to improve parameter estimates in prognostic factors analyses with patient-reported outcomes in oncology

Abstract

Purpose

The inclusion of patient-reported outcome (PRO) questionnaires in prognostic factor analyses in oncology has increased substantially in recent years. We performed a simulation study to compare the performance of four modeling strategies in estimating the prognostic impact of multiple collinear scales from PRO questionnaires.

Methods

We generated multiple scenarios describing survival data with different sample sizes, event rates and degrees of multicollinearity among five PRO scales. We used the Cox proportional hazards (PH) model to estimate hazard ratios (HRs) with automatic selection procedures based on either the likelihood-ratio test (Cox-PV) or the Akaike Information Criterion (Cox-AIC). We also fitted Cox PH models that included all variables and were either penalized using ridge regression (Cox-R) or estimated without penalization (Cox-Full). For each scenario, we simulated 1000 independent datasets and compared the average outcomes of all methods.
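The contrast between Cox-Full and Cox-R can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential baseline hazard, the equicorrelated normal covariates, the true coefficients, the censoring mechanism and the fixed penalty value are all assumptions made for illustration, and the function names `simulate_dataset` and `fit_cox_ridge` are hypothetical. The fit maximizes the partial log-likelihood minus (penalty/2) times the squared norm of the coefficients, using plain Newton-Raphson without step-halving.

```python
import numpy as np


def simulate_dataset(n, rho, beta, base_rate=0.1, censor_rate=0.05, seed=0):
    """Simulate right-censored survival data with equicorrelated covariates.

    All numeric defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    p = len(beta)
    # Equicorrelation matrix: 1 on the diagonal, rho everywhere else.
    corr = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    x = rng.multivariate_normal(np.zeros(p), corr, size=n)
    # Inverse-transform sampling under a constant (exponential) baseline hazard.
    u = rng.uniform(size=n)
    event_time = -np.log(u) / (base_rate * np.exp(x @ beta))
    censor_time = rng.exponential(1.0 / censor_rate, size=n)
    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(float)
    return x, time, event


def fit_cox_ridge(x, time, event, penalty=0.0, n_iter=50, tol=1e-8):
    """Newton-Raphson for the Cox partial likelihood with an optional ridge penalty.

    penalty=0.0 gives the unpenalized fit (the Cox-Full analogue);
    penalty>0 shrinks the coefficients towards zero (the Cox-R analogue).
    Assumes no tied event times, which holds for continuous simulated times.
    """
    order = np.argsort(-time)  # descending time: cumulative sums run over risk sets
    x, event = x[order], event[order]
    p = x.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        w = np.exp(x @ beta)
        s0 = np.cumsum(w)
        s1 = np.cumsum(w[:, None] * x, axis=0)
        s2 = np.cumsum(w[:, None, None] * x[:, :, None] * x[:, None, :], axis=0)
        mean = s1 / s0[:, None]  # risk-set weighted covariate means
        grad = (event[:, None] * (x - mean)).sum(axis=0) - penalty * beta
        var = s2 / s0[:, None, None] - mean[:, :, None] * mean[:, None, :]
        hess = (event[:, None, None] * var).sum(axis=0) + penalty * np.eye(p)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical scenario: five scales, three of them prognostic, rho = 0.8, n = 100.
true_beta = np.array([0.2, 0.2, 0.0, 0.0, 0.2])
x, time, event = simulate_dataset(n=100, rho=0.8, beta=true_beta, seed=1)
beta_full = fit_cox_ridge(x, time, event, penalty=0.0)
beta_ridge = fit_cox_ridge(x, time, event, penalty=5.0)
print("HR, unpenalized:", np.round(np.exp(beta_full), 3))
print("HR, ridge:      ", np.round(np.exp(beta_ridge), 3))
```

In practice the penalty would be tuned (for example by cross-validation) rather than fixed, and a full simulation would repeat the fit over many independent datasets per scenario.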

Results

The Cox-R performed as well as or better than the other methods, particularly in scenarios with medium–high multicollinearity (ρ = 0.4 to ρ = 0.8) and small sample sizes (n = 100). Overall, the Cox-PV and Cox-AIC performed worse; for example, in some scenarios they failed to select one or more prognostic collinear PRO scales. Compared with the Cox-Full, the Cox-R provided HR estimates with similar bias patterns but smaller root-mean-squared errors, particularly in scenarios with higher multicollinearity.
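For reference, the two performance measures can be written out as follows. This is a sketch under stated assumptions: the abstract does not say whether the measures were computed on the HR or the log-HR scale, so the notation below (coefficient β_j for scale j, estimate from the r-th of R = 1000 simulated datasets) is illustrative only.

```latex
\mathrm{Bias}(\hat\beta_j) = \frac{1}{R}\sum_{r=1}^{R}\hat\beta_j^{(r)} - \beta_j,
\qquad
\mathrm{RMSE}(\hat\beta_j) = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\bigl(\hat\beta_j^{(r)} - \beta_j\bigr)^{2}},
\qquad
\mathrm{RMSE}(\hat\beta_j)^{2} = \mathrm{Bias}(\hat\beta_j)^{2} + \widehat{\mathrm{Var}}(\hat\beta_j)
```

Here the variance term is the empirical variance of the R estimates (divisor R). Under this decomposition, a method with similar bias but smaller RMSE has less variable estimates across datasets.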

Conclusions

Our findings suggest that the Cox-R is the best approach when performing prognostic factor analyses with multiple and collinear PRO scales, particularly in situations of high multicollinearity, small sample sizes and low event rates.

Author information

Contributions

FC, FE: Conception and design, FC, ND, FE: Statistical analyses, all authors: Interpretation of results, all authors: Manuscript writing.

Corresponding author

Correspondence to Francesco Cottone.

Ethics declarations

Conflict of interest

No potential conflict of interest for this paper was reported by the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 1541 KB)

About this article

Cite this article

Cottone, F., Deliu, N., Collins, G.S. et al. Modeling strategies to improve parameter estimates in prognostic factors analyses with patient-reported outcomes in oncology. Qual Life Res 28, 1315–1325 (2019). https://doi.org/10.1007/s11136-018-02097-2

  • DOI: https://doi.org/10.1007/s11136-018-02097-2

Keywords

  • Health-related quality of life
  • Multicollinearity
  • Patient-reported outcomes
  • Prognostic factor analysis
  • Ridge regression