Data Mining and Knowledge Discovery, Volume 20, Issue 3, pp 388–415

Time to CARE: a collaborative engine for practical disease prediction

  • Darcy A. Davis
  • Nitesh V. Chawla
  • Nicholas A. Christakis
  • Albert-László Barabási


The monumental cost of health care, especially for chronic disease treatment, is quickly becoming unmanageable. This crisis has motivated the drive towards preventative medicine, where the primary concern is recognizing disease risk and taking action at the earliest signs. However, universal testing is neither time- nor cost-efficient. We propose CARE, a Collaborative Assessment and Recommendation Engine, which relies only on a patient’s medical history, encoded as ICD-9-CM codes, to predict future disease risks. CARE uses collaborative filtering methods to predict each patient’s greatest disease risks based on their own medical history and that of similar patients. We also describe an iterative version, ICARE, which incorporates ensemble concepts for improved performance. In addition, we apply time-sensitive modifications that make the CARE framework practical for realistic long-term use. These novel systems require no specialized information and provide predictions for medical conditions of all kinds in a single run. We present experimental results on a large Medicare dataset, demonstrating that CARE and ICARE perform well at capturing future disease risks.
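
The idea of ranking a patient's disease risks from the histories of similar patients can be illustrated with a minimal sketch of generic memory-based collaborative filtering. This is not the CARE algorithm itself, whose details the abstract does not give. The cosine similarity measure, the function names `cosine` and `predict_risks`, and the toy patient records are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sets of diagnosis codes,
    treating each set as a binary vector over all codes."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def predict_risks(history, others, top_n=3):
    """Score codes the patient does not yet have by similarity-weighted
    votes from other patients, normalised by the total similarity."""
    scores = {}
    total = 0.0
    for other in others:
        w = cosine(history, other)
        if w == 0.0:
            continue  # patients with no shared codes contribute nothing
        total += w
        for code in other - history:
            scores[code] = scores.get(code, 0.0) + w
    if total == 0.0:
        return []
    ranked = sorted(((c, s / total) for c, s in scores.items()),
                    key=lambda cs: (-cs[1], cs[0]))
    return ranked[:top_n]

# Hypothetical records: ICD-9-CM-style three-digit codes, invented data.
patients = [
    {"250", "401", "272"},
    {"250", "401", "585"},
    {"401", "272", "414"},
    {"493"},
]
new_patient = {"250", "401"}
risks = predict_risks(new_patient, patients)
for code, score in risks:
    print(code, round(score, 2))
```

In this toy run the code shared by the two most similar patients ranks first. A practical system would also need to handle code frequency, since very common diagnoses would otherwise dominate every ranking.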


  • Collaborative filtering
  • Prospective medicine
  • Disease prediction
  • Electronic healthcare record





Copyright information

© The Author(s) 2009

Authors and Affiliations

  • Darcy A. Davis (1)
  • Nitesh V. Chawla (1)
  • Nicholas A. Christakis (2)
  • Albert-László Barabási (3)

  1. Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, USA
  2. Harvard Medical School, Boston, USA
  3. Northeastern University, Boston, USA
