Mining health knowledge graph for health risk prediction

World Wide Web

Abstract

Classification models are now widely adopted in healthcare to support practitioners in disease diagnosis and to reduce human error. The challenge lies in finding effective methods for mining real-world data in the medical domain, as the many models proposed so far have produced varying results. A large number of researchers focus on the diversity problem of real-time data sets in classification models. Some previous works developed methods that use homogeneous graphs for knowledge representation and subsequent knowledge discovery. However, such approaches are weak at discovering the different relationships among elements. In this paper, we propose an innovative classification model for knowledge discovery from patients’ personal health repositories. The model discovers medical domain knowledge from the massive data in the National Health and Nutrition Examination Survey (NHANES). The knowledge is conceptualised in a heterogeneous knowledge graph. On the basis of the model, an innovative method is developed to help uncover potential diseases suffered by people and, furthermore, to classify patients’ health risk. The proposed model is evaluated against a baseline model, also built on the NHANES data set, in an empirical experiment. The performance of the proposed model is promising. The paper contributes to the advancement of knowledge in data mining with an innovative classification model specifically crafted for domain-based data. In addition, by accessing the patterns of various observations, the research contributes to the work of practitioners by providing a multifaceted understanding of individual and public health.

Notes

  1. https://www.cdc.gov/nchs/nhanes/index.htm

Acknowledgements

The work presented in this manuscript is a substantially extended version of a preliminary paper published under the title “Mining Heterogeneous Information Graph for Health Status Classification” in the Proceedings of the 6th International Conference on Behavioral, Economic, and Socio-Cultural Computing, Kaohsiung, Taiwan, 12–14 November 2018 (DOI: 10.1109/BESC.2018.8697292). The work was conducted with consent from the Human Research Ethics Committee of the University of Southern Queensland, Australia (Approval ID: H18REA049).

The work is partially supported by the National Natural Science Foundation of China (NSFC 71801217). The authors would also like to acknowledge the use of the National Health and Nutrition Examination Survey (NHANES) in the study and, specifically, thank the Centers for Disease Control and Prevention of the Department of Health and Human Services, United States, for making the data set publicly available for research purposes.

Author information

Corresponding author

Correspondence to Xiaohui Tao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tao, X., Pham, T., Zhang, J. et al. Mining health knowledge graph for health risk prediction. World Wide Web 23, 2341–2362 (2020). https://doi.org/10.1007/s11280-020-00810-1
