Abstract
An approach for disease prediction that combines clustering, Markov models and association analysis techniques is proposed. Patient medical records are first clustered, and then a Markov model is generated for each cluster to perform predictions about illnesses a patient could likely be affected in the future. However, when the probability of the most likely state in the Markov models is not sufficiently high, the framework resorts to the association analysis. High confidence rules generated by recurring to sequential disease patterns are considered, and items induced by these rules are predicted. Experimental results show that the combination of different mining models gives good predictive accuracy and it is a feasible way to diagnose diseases.
Similar content being viewed by others
References
Hidalgo CA, Blumm N, Barabási A-L, Christakis NA (2009) A dynamic network approach for the study of human phenotypes. PLoS Comput Biol 5(4):e1000353. doi:10.1371/journal.pcbi.1000353
Starfield B et al (2003) Comorbidity: implications for the importance of primary care in ‘case’ managment. Ann Family Med 1(1):8–14
Lowensteyn I et al (1998) Can computerized risk profiles help patients improve their coronary risk? The results of the coronary health assessment study. Prev Med 27(5):730–737
Wilson PWF et al (1998) Prediction of coronary heart disease using risk factor categories. Circulation 97:1837–1847
Tan P, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson International Edition, San Francisco
Davis DA, Chawla NV, Blumm N, Christakis NA, Barabási A-L (2008) Predicting individual disease risk based on medical history. In: Proceedings of the ACM international conference on information and knowledge management (CIKM’08), pp 769–778
Davis DA, Chawla NV, Christakis NA, Barabási AL (2010) Time to CARE: a collaborative engine for practical disease prediction. Data Mining Knowl Discov 20:388–415
Shardanand U, Maes P (1995) Social information filtering: algorithms for automating word of mouth. In: Proceedings of ACM conference on human factors in computing systems (CHI’95), pp 210–217
Steinhaeuser K, Chawla NV (2009) A network-based approach to understanding and predicting diseases. In: Social computing and behavioral modeling
Strehk RMA, Ghosh J (2000) Impact of similarity measures on web-page clustering. In: Proceedings of AAAI workshop on AI for web search, pp 58–64
Jaccard P (1912) The distribution of the flora of the alpine zone. New Phytol 11:37–50
Giannotti F, Gozzi C, Manco G (2002) Clustering transactional data. In: Proceedings of principles of data mining and knowledge discovery (PKDD’02), pp 175–187
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th berkeley symposium, vol 1, pp 281–297
Deshpande M, Karypis G (2004) Selective Markov models for predicting web page accesses. ACM Trans Internet Techn 4(2):163–184
Mongomery D, Runger G (2004) Applied statistics and probability for engineers. Wiley, London
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Folino, F., Pizzuti, C. A recommendation engine for disease prediction. Inf Syst E-Bus Manage 13, 609–628 (2015). https://doi.org/10.1007/s10257-014-0242-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10257-014-0242-7