External validation is a prerequisite for introducing a prediction model into clinical practice. Nonetheless, methodologically sound external validation studies are scarce. Using big datasets can help overcome several causes of methodological failure. However, transparent reporting is needed to standardize methods, assess the risk of bias, and synthesize multiple validation studies in order to infer model generalizability. We describe the methodological challenges faced when using multiple big datasets to perform the first retrospective external validation study of the Prospective Comparison of Methods for thromboembolic risk assessment with clinical Perceptions and AwareneSS in real life patients-Cancer Associated Thrombosis (COMPASS-CAT) Risk Assessment Model for predicting venous thromboembolism in patients with cancer. The challenges included choosing the starting point, defining time-sensitive variables that serve as both risk factors and outcome variables, and using non-research-oriented databases to form validated definitions from administrative codes. We also present the structured plan we used to overcome these obstacles and reduce bias, with the aim of producing an external validation study that complies with prediction model reporting guidelines.
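The challenge of a time-sensitive variable that is both a risk factor and an outcome can be illustrated with a minimal Python sketch. It is not the authors' method: the ICD-10 prefixes, the 180-day follow-up window, the rule that a code dated on the index date counts as history, and the names `VTE_PREFIXES`, `is_vte` and `classify_vte` are all assumptions made for illustration. A real study would rely on a validated code definition and a pre-specified starting point.

```python
from datetime import date

# Assumed ICD-10 category prefixes for venous thromboembolism, for
# illustration only; a real study would need a validated code list.
VTE_PREFIXES = ("I26", "I80", "I82")


def is_vte(code):
    """Return True if the code starts with one of the assumed VTE prefixes."""
    return code.upper().startswith(VTE_PREFIXES)


def classify_vte(events, index_date, followup_days=180):
    """Split a patient's coded VTE events around the index date.

    events: iterable of (event_date, icd10_code) tuples.
    Returns (prior_vte, outcome_vte). A VTE code dated on or before the
    index date counts as history (a risk factor); a VTE code dated after
    it and within the follow-up window counts as the outcome. Both the
    window length and the handling of the index date are assumptions.
    """
    prior_vte = False
    outcome_vte = False
    for event_date, code in events:
        if not is_vte(code):
            continue
        days = (event_date - index_date).days
        if days <= 0:
            prior_vte = True
        elif days <= followup_days:
            outcome_vte = True
    return prior_vte, outcome_vte


# Hypothetical patient record: one VTE code before the index date,
# one non-VTE code on the index date, one VTE code during follow-up.
example_events = [
    (date(2015, 3, 1), "I26.99"),
    (date(2016, 1, 5), "C50.911"),
    (date(2016, 2, 10), "I82.401"),
]
example_result = classify_vte(example_events, date(2016, 1, 5))
```

In this sketch the same diagnosis code feeds two different variables depending only on its date relative to the index date, which is why the choice of starting point has to be fixed before either variable can be defined.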
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Conflict of interest
All authors report no relevant disclosures.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Nikolakopoulos, I., Nourabadi, S., Eldredge, J.B. et al. Using big data to retrospectively validate the COMPASS-CAT risk assessment model: considerations on methodology. J Thromb Thrombolysis 51, 12–16 (2021). https://doi.org/10.1007/s11239-020-02191-8
- External validation
- Risk models
- Venous thromboembolism