
Using big data to retrospectively validate the COMPASS-CAT risk assessment model: considerations on methodology


External validation is a prerequisite for introducing a prediction model into clinical practice. Nonetheless, methodologically sound external validation studies are scarce. Using big datasets can help overcome several causes of methodological failure. However, transparent reporting is needed to standardize the methods, assess the risk of bias, and synthesize multiple validation studies in order to infer model generalizability. We describe the methodological challenges faced when using multiple big datasets to perform the first retrospective external validation study of the Prospective Comparison of Methods for thromboembolic risk assessment with clinical Perceptions and AwareneSS in real life patients-Cancer Associated Thrombosis (COMPASS-CAT) Risk Assessment Model for predicting venous thromboembolism in patients with cancer. The challenges included choosing the starting point, defining time-sensitive variables that serve as both risk factors and outcome variables, and using non-research-oriented databases to form validated definitions from administrative codes. We also present the structured plan we used to overcome those obstacles and reduce bias, with the aim of producing an external validation study that complies with prediction model reporting guidelines.





This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Author information




The paper was conceived by ACS and SN and written by IN, ACS and SN. JBE, LA, MZ, MQ critically reviewed the manuscript at all stages of preparation and approved the final draft submitted.

Corresponding author

Correspondence to Alex C. Spyropoulos.

Ethics declarations

Conflict of interest

All authors report no relevant disclosures.

Additional information


David Rosenberg—deceased.


About this article


Cite this article

Nikolakopoulos, I., Nourabadi, S., Eldredge, J.B. et al. Using big data to retrospectively validate the COMPASS-CAT risk assessment model: considerations on methodology. J Thromb Thrombolysis 51, 12–16 (2021).



  • External validation
  • Risk models
  • Cancer
  • Venous thromboembolism