Joint Prediction of Chronic Conditions Onset: Comparing Multivariate Probits with Multiclass Support Vector Machines
We consider the problem of building accurate models that can predict, in the short term (2–3 years), the onset of one or more chronic conditions at the individual level. Five chronic conditions are considered: heart disease, stroke, diabetes, hypertension and cancer. Covariates for the models include standard demographic and socio-economic variables, risk factors and the presence of the chronic conditions at baseline. We compare two predictive models. The first is the multivariate probit (MVP), chosen because it allows correlated outcome variables to be modelled jointly. The second is the Multiclass Support Vector Machine (MSVM), a leading predictive method in machine learning. We use Australian data from the Social, Economic, and Environmental Factor (SEEF) study, a follow-up to the 45 and Up Study survey, which contains two repeated observations of 60,000 individuals in NSW over the age of 45. We find that MSVM predictions have specificity rates similar to those of MVPs, but sensitivity rates that are on average 12 percentage points higher, which translates into a large average relative improvement in sensitivity of 30 %.
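The distinction between a gain in percentage points and a relative gain in sensitivity can be illustrated with a minimal Python sketch. The confusion-matrix counts below are hypothetical and are not taken from the SEEF data; the function names are likewise illustrative. The abstract reports averages across five conditions, so a single pair of numbers only shows how the two measures relate.

```python
def sensitivity(tp, fn):
    """Share of true onsets that the model flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of non-onsets that the model correctly clears: TN / (TN + FP)."""
    return tn / (tn + fp)


def improvement(baseline, candidate):
    """Return (percentage-point gain, relative gain in %) of candidate over baseline."""
    points = (candidate - baseline) * 100
    relative = (candidate - baseline) / baseline * 100
    return points, relative


# Hypothetical counts for one condition (assumed for illustration only).
mvp_sens = sensitivity(tp=40, fn=60)    # 0.40
msvm_sens = sensitivity(tp=52, fn=48)   # 0.52

points, relative = improvement(mvp_sens, msvm_sens)
print(f"{points:.0f} percentage points, {relative:.0f} % relative")
```

With these assumed counts, a 12 percentage point gain on a baseline sensitivity of 40 % corresponds to a relative gain of 30 %. The same absolute gain would be a smaller relative gain on a higher baseline.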
Keywords: Chronic Condition; Deep Learn; Latent Variable Model; Multiclass Support Vector Machine; Multivariate Probit Model
This research was completed using data collected through the 45 and Up Study (http://www.saxinstitute.org.au). The 45 and Up Study is managed by the Sax Institute in collaboration with major partner Cancer Council NSW, and partners the National Heart Foundation of Australia (NSW Division), NSW Ministry of Health, beyondblue, NSW Government Family & Community Services Carers, Ageing and Disability Inclusion, and the Australian Red Cross Blood Service. We thank the many thousands of people participating in the 45 and Up Study. We also thank Capital Markets CRC, which sponsored this research.