
Time to CARE: a collaborative engine for practical disease prediction


The monumental cost of health care, especially for chronic disease treatment, is quickly becoming unmanageable. This crisis has motivated the drive toward preventive medicine, where the primary concern is recognizing disease risk and acting on the earliest signs. However, universal testing is neither time- nor cost-efficient. We propose CARE, a Collaborative Assessment and Recommendation Engine, which relies only on a patient's medical history, recorded as ICD-9-CM codes, to predict future disease risks. CARE uses collaborative filtering to predict each patient's greatest disease risks from their own medical history and the histories of similar patients. We also describe an iterative version, ICARE, which incorporates ensemble concepts for improved performance. Finally, we introduce time-sensitive modifications that make the CARE framework practical for realistic long-term use. These systems require no specialized information and provide predictions for medical conditions of all kinds in a single run. We present experimental results on a large Medicare dataset demonstrating that CARE and ICARE perform well at capturing future disease risks.
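To make the collaborative-filtering idea concrete, here is a minimal sketch, assuming each patient's history is reduced to a set of ICD-9-CM codes (a binary patient-by-code vector) and that patient similarity is plain cosine similarity. It is a generic user-based scheme: every code a similar patient has, but the target patient does not, receives a vote weighted by that patient's similarity. This is not necessarily the exact CARE formulation. The function names, the optional neighbourhood size `k`, and the normalization are hypothetical choices, and the iterative (ICARE) and time-sensitive refinements are not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two code sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def predict_risks(target, others, k=None):
    """Rank codes absent from `target` by similarity-weighted votes.

    target: set of ICD-9-CM codes for the patient of interest.
    others: list of code sets, one per other patient.
    k: optional number of most similar patients to keep (None = all).
    Returns (code, score) pairs sorted by descending score; each score is
    the share of total neighbour similarity held by patients with that code.
    """
    # Score every other patient and keep only those sharing at least one code.
    sims = [(cosine_similarity(target, h), h) for h in others]
    sims = [(s, h) for s, h in sims if s > 0]
    sims.sort(key=lambda pair: pair[0], reverse=True)
    if k is not None:
        sims = sims[:k]

    total = sum(s for s, _ in sims)
    if total == 0:
        return []

    # Each neighbour votes, with weight = similarity, for its unseen codes.
    scores = {}
    for s, history in sims:
        for code in history - target:
            scores[code] = scores.get(code, 0.0) + s

    ranked = [(code, score / total) for code, score in scores.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical toy histories, not drawn from the Medicare data.
    target = {"250.00", "401.9"}
    others = [
        {"250.00", "401.9", "428.0"},
        {"401.9", "272.4"},
        {"715.90"},
    ]
    for code, score in predict_risks(target, others):
        print(f"{code}: {score:.3f}")
    # 428.0: 0.620
    # 272.4: 0.380
```

The patient sharing two codes with the target outweighs the one sharing a single code, so its unseen code ranks first. Restricting `k` trades coverage for locality: with `k=1` only the closest patient votes.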



  1. Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 2(2): 121–167

    Article  Google Scholar 

  2. Barabasi A-L (2007) Network medicine—from obesity to the dieaseasome. N Engl J Med 357: 404–407

    Article  Google Scholar 

  3. Breese JS, Heckerman D, Kadie C (1998) Empirical analysis of predictive algorithms for collaborative filtering. Technical Report MSR-TR-98-12, Microsoft Research, May

  4. Cherry DK, Burt CW, Woodwell D (2001) A national ambulatory medical care survey: 2001 summary. Adv Data 337: 1–16

    Google Scholar 

  5. Christakis NA, Allison PD (2006) Mortality after the hospitalization of a spouse. N Engl J Med 354(7): 719–730

    Article  Google Scholar 

  6. Cordn O, Herrera F, de la Montab́1a J, Sd́fnchez A, Villar P (2002) A prediction system for cardiovascularity diseases using genetic fuzzy rule-based systems. In: Proceedings of the 8th Ibero–American Conference on AI, pp 381–391. Springer, Berlin

  7. Coyle P, Hartung H-P (2002) Use of interferon beta in multiple sclerosis: rationale for early treatment and evidence of dose- and frequency-dependent effects on clinical response. Multiple Scler 8(1): 2–9

    Article  Google Scholar 

  8. Davis D, Chawla NV, Blumm N, Christakis N, Barabasi A-L (2008a) Care for your future: prospective disease prediction using collaborative filtering. In: Proceedings of the KDD 2008 workshop on mining medical data

  9. Davis D, Chawla NV, Blumm N, Christakis N, Barabasi A-L (2008b) Predicting individual disease risk based on medical history. In: Proceedings of the ACM conference on information and knowledge management

  10. Dietterich TG (2000) Ensemble methods in machine learning. In: Proceedings of the first international workshop on multiple classifier systems, pp 1–15, June

  11. Edelman D et al (2006) A multidimensional integrative medicine intervention to improve cardiovascular risk. J Gen Intern Med 21(7): 728–734

    Article  Google Scholar 

  12. Glasgow RE et al (2001) Does the chronic care model serve also as a template for improving prevention?. Milbank Q 79(4): 579–612

    Article  Google Scholar 

  13. Goldberg K, Roeder T, Gupta D, Perkins C (2000) Eigentaste: a constant time collaborative filtering algorithm. Technical report, University of California, Berkley, August

  14. Grcar M, Fortuna B, Mladenic D (2005) Knn versus svm in the collaborative filtering framework. In: WebKDD August

  15. Heckerman D, Chickering DM, Meek C, Rounthwaite R, Kadie C (2001) Dependency networks for inference, collaborative filtering, and data visualization. Technical Report MSR-TR-2000-16, Microsoft Research, February

  16. Herlocker JL, Konstan JA, Terveen LG, Riedl JT (2004) Evaluating collaborative filtering recommender systems. ACM Trans Inf Sys 22: 5–53

    Article  Google Scholar 

  17. Hofmann T (2004) Latent semantic models for collaborative filtering. ACM Trans Inf Sys 22: 89–115

    Article  Google Scholar 

  18. Hofmann T, Puzicha J (1999) Latent class models for collaborative filtering. In: Proceedings of the 16th international joint conference on artificial intelligence, pp 688–693

  19. Hunt J, Kristal A, White E, Lynch J, Fries E (1995) Physician recommendations for dietary change: their prevalence and impact in a population-based sample. Am J Public Health 85: 722–726

    Article  Google Scholar 

  20. Kahn CE Jr (2005) Collaborative filtering to improve navigation of large radiology knowledge resources. J Digit Imaging 18(2): 131–137

    Article  Google Scholar 

  21. Kannel WB, Dawber TR, Kagan A, Revotskie N, Stokes J III (1961) Factors of risk in the development of coronary heart disease: six-year follow-up experience: The Framingham Study. Ann Intern Med 55: 33–50

    Google Scholar 

  22. Koertge J et al (2003) Improvement in medical risk factors and quality of life in women and men with coronary heart disease in the Multicenter Lifestyle Demonstration Project. Am J Cardiol 91(11): 1316–1322

    Article  Google Scholar 

  23. Konstan JA, Miller BN, Maltz D, Herlocker JL, Gordon LR, Riedl J (1997) Grouplens: applying collaborative filtering to usenet news. Commun ACM 40: 77–87

    Article  Google Scholar 

  24. Lauderdale DS, Furner SE, Miles TP, Goldberg J (1993) Epidemiologic uses of medicare data. Epidemiol Rev 15: 319–327

    Google Scholar 

  25. Liu Y, Teverovskiy L, Lopez O, Aizenstein H, Meltzer C, Becker J (2007) Discovery of biomarkers for alzheimer’s disease prediction from structural mr images. In: 2007 IEEE international symposium on biomedical imaging, April

  26. Loscalzo J (2007) Association studies in an era of too much information - clinical analysis of new biomarker andgenetic data. Circulation 116(17): 1866–1870

    Article  Google Scholar 

  27. Loscalzo J, Kohane I, Barabasi A-L (2007) Human disease classification in the postgenomic era. Mol Syst Biol

  28. Mitchell JB, Bubolz T, Paul JE, Pashos CI, Escarce JJ, Muhlbaier LH, Wiesman JM, Young WW, Epstein RS, Javitt JC (1994) Using medicare claims for outcomes research. Medical Care 32: 38–51

    Article  Google Scholar 

  29. Mould R (2003) Prediction of long-term survival rates of cancer patients. Lancet 361: 262

    Article  Google Scholar 

  30. NC for Health Statistics (2007) International Classification of Diseases, 9th Revision, Clinical modification (icd-9-cm).

  31. NC Institute (2007) Cancer trends progress report—2007 update

  32. Paterek A (2007) Improving regularized singular value decomposition for collaborative filtering. In: KDDCup August

  33. Pennock DM, Horvitz E (1999) Collaborative filtering by personality diagnosis: a hybrid memory- and model-based approach. In: Proceedings of the IJCAI workshop on machine learning for information filtering

  34. Resnick P, Iancovou N, Sushak M, Bergstrom P, Riedl J (1994) Grouplens: an open architecture for collaborative filtering of netnews. In: Proceedings of the ACM conference on computer supported cooperative, pp 175–186

  35. Salton G, McGill M (1983) Introduction to modern information retrieval. McGraw-Hill, New York

    MATH  Google Scholar 

  36. Shardanand U, Maes P (1995) Social information filtering: algorithms for automating “word of mouth”. In: Proceedings of the computer human interaction, pp 210–217, May

  37. Si L, Jin R (2003) Flexible mixture model for collaborative filtering. In: Proceedings of ICML

  38. Snyderman R, Williams RS (2003) Prospective medicine: the next health care transformation. Future Med

  39. Starfield B, Lemke KW, Bernhardt T, Foldes SS, Forrest CB, Weiner JP (2003) Comorbidity: implications for the the importance of primary care in case management. Ann Fam Med 1: 8–14

    Article  Google Scholar 

  40. van den Akker M, Buntinx F, Metsemakers JF, Roos S, Knottnerus JA (1998) Multimorbidity in general practice: prevalence, incidence, and determinants of co-occuring chronic and recurrent diseases. J Clin Epidemiol 51: 367–375

    Article  Google Scholar 

  41. Weston AD, Hood L (2004) Systems biology, proteomics, and the future of health care: toward predictive, preventative, and personalized medicine. J Proteome Res 3(2): 179–196

    Article  Google Scholar 

  42. Wong DT, Knaus WA (1991) Predicting outcome in critical care: the current status of the apache prognostic scoring system. Can J Anesth 38: 374–383

    Article  Google Scholar 

  43. WTC Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661–678

    Google Scholar 

Download references

Author information



Corresponding author

Correspondence to Nitesh V. Chawla.

Additional information

Responsible editor: R. Bharat Rao and Romer Rosales.


About this article

Cite this article

Davis, D.A., Chawla, N.V., Christakis, N.A. et al. Time to CARE: a collaborative engine for practical disease prediction. Data Min Knowl Disc 20, 388–415 (2010).



Keywords

  • Collaborative filtering
  • Prospective medicine
  • Disease prediction
  • Electronic healthcare record