Early-Stage Event Prediction for Longitudinal Data

  • Mahtab J. Fard
  • Sanjay Chawla
  • Chandan K. Reddy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9651)


Predicting event occurrence at an early stage of a longitudinal study is an important problem with high practical value. Unlike standard classification and regression problems, where a domain expert can label the data in a reasonably short period of time, training data in longitudinal studies can be obtained only by waiting until a sufficient number of events have occurred. The main objective of this work is to use the data collected in the initial stages of a longitudinal study to predict whether a particular subject will experience the event in the future. In this paper, we propose a novel Early Stage Prediction (ESP) framework for building event prediction models that are trained at early stages of longitudinal studies. More specifically, we develop two probabilistic algorithms for early stage event prediction, based on Naive Bayes and Tree-Augmented Naive Bayes (TAN) and called ESP-NB and ESP-TAN, respectively. Both modify the posterior probability of event occurrence using extrapolations based on the Weibull and Lognormal distributions. The proposed framework is evaluated on a wide range of synthetic and real-world benchmark datasets. Our extensive experiments show that, using only a limited amount of training data, the proposed ESP framework predicts future event occurrences more accurately than alternative approaches.
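The abstract does not spell out the estimator, so the following is only a minimal sketch of the ESP-NB idea under stated assumptions. We assume that ESP-NB keeps the Naive Bayes likelihoods learned from the early-stage data (subjects whose event was observed by a cutoff time `t_c` are labelled 1) and replaces the class prior P(y = 1) with the probability of an event by a later horizon `t_f`, extrapolated from a Weibull or Lognormal CDF. Fitting that CDF by least squares on a probability-plot scale to a Kaplan-Meier estimate is our own choice; the paper may use maximum likelihood or another procedure. The function names (`kaplan_meier_cdf`, `fit_cdf`, `esp_nb_posterior`), the smoothing parameter `alpha`, and the restriction to categorical features are hypothetical and for illustration only, not the authors' code.

```python
import math
from statistics import NormalDist


def kaplan_meier_cdf(times, events):
    """Kaplan-Meier estimate of F(t) = 1 - S(t) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, points, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]  # event indicator: 1 = event, 0 = censored
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            points.append((t, 1.0 - surv))
        at_risk -= removed
    # Keep only points usable on the linearised scales (t > 0, 0 < F < 1).
    return [(t, f) for t, f in points if t > 0 and 0.0 < f < 1.0]


def _linfit(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_cdf(points, dist="weibull"):
    """Fit a Weibull or Lognormal CDF to Kaplan-Meier points by least squares
    on a linearised (probability-plot) scale. Needs >= 2 distinct event times."""
    xs = [math.log(t) for t, _ in points]
    if dist == "weibull":
        # Weibull: ln(-ln(1 - F)) = k * ln(t) - k * ln(lam)
        ys = [math.log(-math.log(1.0 - f)) for _, f in points]
        k, b = _linfit(xs, ys)
        lam = math.exp(-b / k)
        return lambda t: 1.0 - math.exp(-((t / lam) ** k))
    if dist == "lognormal":
        # Lognormal: Phi^{-1}(F) = (ln(t) - mu) / sigma
        nd = NormalDist()
        ys = [nd.inv_cdf(f) for _, f in points]
        a, b = _linfit(xs, ys)
        sigma = 1.0 / a
        mu = -b * sigma
        return lambda t: nd.cdf((math.log(t) - mu) / sigma)
    raise ValueError("dist must be 'weibull' or 'lognormal'")


def esp_nb_posterior(X, times, events, t_c, t_f, x_new, dist="weibull", alpha=1.0):
    """Hypothetical ESP-NB-style estimate of P(event by t_f | x_new),
    using only data observed up to the cutoff time t_c."""
    cdf = fit_cdf(kaplan_meier_cdf(times, events), dist)
    # Extrapolated class prior P(y = 1) at the future horizon t_f, clamped
    # so that both log(prior) and log(1 - prior) stay defined.
    prior1 = min(max(cdf(t_f), 1e-9), 1.0 - 1e-9)
    # Early-stage labels: 1 if the event was observed by t_c, else 0.
    y = [1 if e and t <= t_c else 0 for t, e in zip(times, events)]
    log_post = {}
    for c, prior in ((1, prior1), (0, 1.0 - prior1)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        lp = math.log(prior)
        for j, v in enumerate(x_new):
            n_vals = len({x[j] for x in X} | {v})
            count = sum(1 for x in rows if x[j] == v)
            # Laplace-smoothed categorical likelihood P(x_j = v | y = c)
            lp += math.log((count + alpha) / (len(rows) + alpha * n_vals))
        log_post[c] = lp
    # Normalise in log space to avoid underflow with many features.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return math.exp(log_post[1] - m) / z


if __name__ == "__main__":
    X = [[1], [1], [1], [1], [0], [0], [0], [0]]  # one binary feature
    times = [1, 2, 3, 5, 4, 5, 5, 5]              # follow-up ends at t_c = 5
    events = [1, 1, 1, 0, 1, 0, 0, 0]
    for d in ("weibull", "lognormal"):
        print(d, esp_nb_posterior(X, times, events, t_c=5, t_f=10, x_new=[1], dist=d))
```

Because the CDF is increasing in time, the extrapolated prior at `t_f` is larger than the event fraction seen by `t_c`, which is the intuition behind correcting an early-stage model for events that have not happened yet. A TAN variant would change only the likelihood term, conditioning each feature on one parent feature as well as the class.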


Keywords: Prediction · Regression · Longitudinal data · Survival analysis

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Mahtab J. Fard (1)
  • Sanjay Chawla (2, 3)
  • Chandan K. Reddy (1)

  1. Computer Science Department, Wayne State University, Detroit, USA
  2. Qatar Computing Research Institute, HBKU, Ar-Rayyan, Qatar
  3. University of Sydney, Sydney, Australia