Knowledge and Information Systems, Volume 43, Issue 3, pp 555–582

Stabilized sparse ordinal regression for medical risk stratification

  • Truyen Tran
  • Dinh Phung
  • Wei Luo
  • Svetha Venkatesh
Regular Paper


The recent wide adoption of electronic medical records (EMRs) presents great opportunities and challenges for data mining. EMR data are largely temporal, often noisy, irregular and high-dimensional. This paper constructs a novel ordinal regression framework for medical risk stratification from EMRs. First, EMR data are viewed conceptually as a temporal image, from which a diverse set of features is extracted. Second, ordinal modeling is applied to predict cumulative or progressive risk. The challenge is to build a transparent predictive model that works with a large number of weakly predictive features and, at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and the Random Walk priors). We apply the framework to a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform to mental health knowledge, and produce models with enhanced stability.
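To make the modeling choices above more concrete, here is a minimal Python/NumPy sketch of a penalized objective that such a framework could minimize. It assumes a cumulative-logit (proportional-odds) ordinal model, an L1 (lasso) sparsity penalty, and a symmetric, non-negative feature adjacency matrix `A` from which two Gaussian-prior precision matrices are built: the graph Laplacian `D - A`, and, as our own assumption about the form of the Random Walk prior, `(I - P)^T (I - P)` with `P = D^{-1} A`, which shrinks each weight toward the average of its neighbors' weights. All function names are hypothetical, the exact formulation in the paper may differ, and the optimizer is omitted.

```python
import numpy as np


def laplacian_precision(adj):
    """Graph Laplacian L = D - A of a symmetric, non-negative feature adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def random_walk_precision(adj):
    """Precision (I - P)^T (I - P) with P = D^{-1} A (assumed random-walk prior).

    Under this prior each weight is shrunk toward the degree-normalized
    average of its neighbors' weights. Isolated features (degree 0) get a
    zero row in P, so they are simply shrunk toward zero.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    safe_deg = np.where(deg > 0, deg, 1.0)  # avoid division by zero
    P = adj / safe_deg[:, None]
    M = np.eye(adj.shape[0]) - P
    return M.T @ M


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cumulative_logit_nll(w, thresholds, X, y):
    """Negative log-likelihood of a cumulative-logit (proportional-odds) model.

    P(y <= k | x) = sigmoid(theta_k - w.x) for k = 0..K-2, with increasing
    thresholds theta and P(y <= K-1) = 1. Labels y are integers in 0..K-1.
    """
    n = X.shape[0]
    scores = X @ w
    cdf = sigmoid(thresholds[None, :] - scores[:, None])
    # Pad with P(y <= -1) = 0 on the left and P(y <= K-1) = 1 on the right.
    cdf = np.hstack([np.zeros((n, 1)), cdf, np.ones((n, 1))])
    # P(y = k) = P(y <= k) - P(y <= k-1)
    probs = cdf[np.arange(n), y + 1] - cdf[np.arange(n), y]
    return -np.sum(np.log(np.clip(probs, 1e-12, None)))


def penalized_objective(w, thresholds, X, y, Q, lam1, lam2):
    """NLL + lam1 * ||w||_1 + (lam2 / 2) * w^T Q w: lasso plus network-based Gaussian prior."""
    return (cumulative_logit_nll(w, thresholds, X, y)
            + lam1 * np.abs(w).sum()
            + 0.5 * lam2 * w @ Q @ w)


if __name__ == "__main__":
    # Hypothetical toy feature graph: a chain 0 - 1 - 2.
    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    L = laplacian_precision(A)
    Q = random_walk_precision(A)
    w = np.ones(3)
    # Both quadratic penalties are 0 for constant weights on a connected graph.
    print(w @ L @ w, w @ Q @ w)
```

Both precision matrices leave weights that are constant across a connected group of features unpenalized, so strongly linked EMR features tend to be kept or dropped together rather than one of them being picked arbitrarily on each resample. That is the intuition behind pairing a network prior with the lasso to improve stability.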


Keywords: Medical risk stratification · Sparse ordinal regression · Stability · Feature graph · Electronic medical record



We thank Ross Arblaster and Ann Larkins for helping with data collection, Paul Cohen for providing management support for the project, Richard Harvey for risk stratification, Michael Berk and Richard Kennedy for valuable opinions, and the anonymous reviewers for helpful comments.

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Truyen Tran (1, 2; corresponding author)
  • Dinh Phung (1)
  • Wei Luo (1)
  • Svetha Venkatesh (1)
  1. Center for Pattern Recognition and Data Analytics, School of IT, Deakin University, Waurn Ponds, Australia
  2. Department of Computing, Curtin University, Bentley, Australia
