The aim was to apply machine learning principles to predict hip fractures and estimate predictor importance in men and women scanned with dual-energy X-ray absorptiometry (DXA). DXA data from two Danish regions between 1996 and 2006 were combined with national Danish patient data to comprise 4722 women and 717 men with 5 years of follow-up time (original cohort n = 6606 men and women). Twenty-four statistical models were built on 75% of data points through five-fold cross-validation (k = 5) repeated five times, and then validated on the remaining 25% of data points to calculate the area under the curve (AUC) and calibrate probability estimates. The best models were retrained with restricted predictor subsets to identify the best-performing subsets. For women, bootstrap aggregated flexible discriminant analysis (“bagFDA”) performed best with a test AUC of 0.92 [0.89; 0.94] and well-calibrated probabilities following Naïve Bayes adjustments. A “bagFDA” model limited to 11 predictors (among them bone mineral densities (BMD), biochemical glucose measurements, and general practitioner and dentist use) achieved a test AUC of 0.91 [0.88; 0.93]. For men, eXtreme Gradient Boosting (“xgbTree”) performed best with a test AUC of 0.89 [0.82; 0.95], but with poor calibration at higher probabilities. A ten-predictor subset (BMD, biochemical cholesterol and liver function tests, penicillin use and osteoarthritis diagnoses) achieved a test AUC of 0.86 [0.78; 0.94] using an “xgbTree” model. Using ensemble models, machine learning can improve hip fracture prediction beyond logistic regression. Compiling data from international cohorts with longer follow-up and performing similar machine learning procedures has the potential to further improve discrimination and calibration.
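The resampling scheme described in the abstract (a 75%/25% split, five-fold cross-validation repeated five times on the training portion, and AUC on the held-out portion) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names, the seeds, and the use of a simple unstratified random split are assumptions, and the scores passed to `auc` are made-up numbers.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count case/non-case pairs where the case scores higher; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def holdout_split(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and return (train, test) index lists.
    Assumption: a plain random split; the source does not say how it was drawn."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def repeated_kfold(indices, k=5, repeats=5, seed=0):
    """Yield (fit, validate) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # k near-equal folds
        for i in range(k):
            fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield fit, folds[i]

# 4722 is the number of women reported in the abstract.
train, test = holdout_split(4722)
splits = list(repeated_kfold(train))  # 5 folds x 5 repeats = 25 resamples

# Toy scores only, to show how a test AUC would be computed.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In such a scheme, each candidate model would be fitted on `fit` and scored on `validate` for every resample to choose tuning parameters, and only the chosen model would be scored once on `test`.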
We acknowledge Statistics Denmark for providing data and a server platform for data analysis. The Obel Family Foundation of Aalborg, Denmark, and the Department of Clinical Medicine at Aalborg University, Denmark, are acknowledged for funding the PhD fellowship of Dr. Christian Kruse. Grant Numbers are not applicable in Denmark.
Christian Kruse designed the study and performed data management, modelling, model validation, statistical analysis, graphical presentations and manuscript preparation of first draft of the paper. He is guarantor. Pia Eiken and Peter Vestergaard performed revisions and final approval of the manuscript draft. All authors revised the paper critically for intellectual content and approved the final version. All authors agree to be accountable for the work and to ensure that any questions relating to the accuracy and integrity of the paper are investigated and properly resolved.
Conflict of interest
CK has received travel grants from Eli Lilly and Otsuka Pharmaceutical and is a speaker for Novartis and Otsuka Pharmaceutical. PE is an advisory board member for Amgen, MSD and Eli Lilly, is on the speakers bureau for Amgen and Eli Lilly, and holds stock in Novo Nordisk A/S. PV has received unrestricted grants from MSD and Servier, and travel grants from Amgen, Eli Lilly, Novartis, Sanofi-Aventis and Servier.
Human and Animal Rights and Informed Consent
The procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008.
Electronic supplementary material
Below is the link to the electronic supplementary material.
MOESM2: Supplementary material 2: Calibration plot of binned probability intervals versus actual observed percentages. A line close to the diagonal line indicates good calibration. Female subjects, bootstrap aggregated flexible discriminant analysis with Naïve Bayes calibration.
MOESM3: Supplementary material 3: Calibration plot of binned probability intervals versus actual observed percentages. A line close to the diagonal line indicates good calibration. Male subjects, eXtreme Gradient Boosting with Naïve Bayes calibration.
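The calibration plots described in these captions compare binned predicted probabilities with observed event percentages. A minimal sketch of that binning step is given below, assuming equal-width bins; the function name `calibration_bins`, the bin count, and the example values are illustrative and are not taken from the article.

```python
def calibration_bins(probs, outcomes, n_bins=10):
    """Group predictions into equal-width probability bins.

    Returns one (mean predicted probability, observed event rate, count)
    tuple per non-empty bin, ordered from lowest to highest bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[b].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            rows.append((mean_p, rate, len(members)))
    return rows

# Made-up predictions and outcomes (1 = hip fracture, 0 = no fracture).
rows = calibration_bins(
    [0.05, 0.08, 0.52, 0.55, 0.95, 1.0],
    [0, 0, 1, 0, 1, 1],
)
```

Plotting the first element of each tuple against the second gives the calibration curve; points near the diagonal indicate that predicted and observed risks agree.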
Cite this article
Kruse, C., Eiken, P. & Vestergaard, P. Machine Learning Principles Can Improve Hip Fracture Prediction. Calcif Tissue Int 100, 348–360 (2017). https://doi.org/10.1007/s00223-017-0238-7
- Machine learning