Stabilizing Sparse Cox Model Using Statistic and Semantic Structures in Electronic Medical Records

  • Shivapratap Gopakumar
  • Tu Dinh Nguyen
  • Truyen Tran
  • Dinh Phung
  • Svetha Venkatesh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9078)


Stability in clinical prediction models is crucial for transferability between studies, yet it has received little attention. The problem is paramount in high-dimensional data, which invites sparse models with feature selection capability. We introduce an effective method to stabilize a sparse Cox model of time-to-event data using statistical and semantic structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using three feature graphs built from (i) Jaccard similarity among features, (ii) an aggregation of the Jaccard similarity graph and a recently introduced semantic EMR graph, and (iii) Jaccard similarity among features transferred from a related cohort. Our experiments are conducted on two real-world hospital datasets: a heart failure cohort and a diabetes cohort. On two stability measures, the Consistency index and the signal-to-noise ratio (SNR), our proposed methods significantly increased feature stability compared with the baselines.
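
The abstract does not give formulas, so the following is only a minimal sketch of two ingredients it names: a feature graph weighted by Jaccard similarity, and a consistency index between two selected feature subsets. It makes several assumptions that are not taken from the paper. Features are assumed to be binary indicator columns. The penalty is assumed to be a Laplacian-style sum of weighted squared differences between coefficients, a common choice in network-regularized models. The Consistency index is assumed to have the form usually attributed to Kuncheva for two equal-sized subsets. All function names are hypothetical.

```python
from itertools import combinations


def jaccard(col_a, col_b):
    """Jaccard similarity between two binary feature columns (lists of 0/1)."""
    both = sum(1 for a, b in zip(col_a, col_b) if a and b)
    either = sum(1 for a, b in zip(col_a, col_b) if a or b)
    return both / either if either else 0.0


def jaccard_graph(columns, threshold=0.0):
    """Return {(i, j): weight} for feature pairs whose similarity exceeds threshold."""
    edges = {}
    for i, j in combinations(range(len(columns)), 2):
        w = jaccard(columns[i], columns[j])
        if w > threshold:
            edges[(i, j)] = w
    return edges


def graph_penalty(beta, edges):
    """Assumed penalty form: sum over edges of w_ij * (beta_i - beta_j)^2.

    Adding this term to a sparse Cox loss would pull the coefficients of
    similar features towards each other.
    """
    return sum(w * (beta[i] - beta[j]) ** 2 for (i, j), w in edges.items())


def consistency_index(subset_a, subset_b, n_features):
    """Assumed form (r*d - k^2) / (k*(d - k)) for two subsets of equal size k,
    where r is the overlap and d is the total number of features."""
    k = len(subset_a)
    if k != len(subset_b) or not 0 < k < n_features:
        raise ValueError("subsets must share a size k with 0 < k < n_features")
    r = len(set(subset_a) & set(subset_b))
    return (r * n_features - k * k) / (k * (n_features - k))


# Toy data: three binary features observed on four patients (hypothetical).
columns = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
edges = jaccard_graph(columns)
penalty = graph_penalty([1.0, 0.0, 2.0], edges)
same = consistency_index([0, 1], [0, 1], 10)
disjoint = consistency_index([0, 1], [2, 3], 10)
```

In this toy example only features 0 and 1 co-occur, so the graph has a single edge. Identical subsets give the maximum index value, while disjoint subsets give a negative value.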


Keywords: Electronic Medical Record, Consistency Index, Transfer Learning, Jaccard Index, Semantic Structure





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Shivapratap Gopakumar (1, email author)
  • Tu Dinh Nguyen (1)
  • Truyen Tran (1)
  • Dinh Phung (1)
  • Svetha Venkatesh (1)

  1. Center for Pattern Recognition and Data Analytics, Deakin University, Melbourne, Australia
