Patient Centric Data Integration for Improved Diagnosis and Risk Prediction

  • Conference paper
Heterogeneous Data Management, Polystores, and Analytics for Healthcare (DMAH 2019, Poly 2019)

Abstract

A typical biological study analyzes heterogeneous biological data sources, e.g., genomics, proteomics, metabolomics, and microarray gene expression. These datasets correlate at the patient level: for example, a decrease in the workload of one group of genes in body cells increases the workload of another group and raises the number of its products. Joint analysis of correlated patient-level data sources improves the final diagnosis. State-of-the-art biological methods, such as differential expression analysis, do not support the integration and analysis of heterogeneous data sources. Recently, scientists in different computational fields have made significant improvements to classical data integration algorithms, enabling the investigation of different data types at the same level. Applying these methods to biological data gives more insight into associating diseases with heterogeneous groups of patients. In this paper, we improve upon our previous study and propose combining a data reduction technique with similarity network fusion (SNF) as a scalable mechanism for integrating new biological data types. We demonstrate our approach by analyzing the risk factors of Acute Myeloid Leukemia (AML) patients when multiple data sources are available, and we uncover new correlations between patients and patient survival time.
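
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes PCA as the data reduction technique (the abstract does not name one), uses a simplified form of the SNF update from Wang et al. (2014), and runs on synthetic data rather than AML patient data. The function names (`pca_reduce`, `affinity`, `fuse`), the view names, and the parameter values `k`, `mu`, and `iters` are all hypothetical choices made for illustration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project patients (rows) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def affinity(X, k=5, mu=0.5):
    """Scaled exponential similarity kernel between patients."""
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(D, 0.0)
    # Mean distance to the k nearest neighbours (column 0 is the patient itself).
    knn_mean = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + D) / 3.0
    return np.exp(-(D ** 2) / (mu * eps + 1e-12))

def row_normalize(W):
    return W / W.sum(axis=1, keepdims=True)

def knn_kernel(W, k):
    """Keep each row's k strongest similarities, then row-normalize."""
    S = np.zeros_like(W)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S[rows, idx] = W[rows, idx]
    return row_normalize(S)

def fuse(affinities, k=5, iters=10):
    """Simplified SNF: diffuse each view's network through the other views."""
    P = [row_normalize(W) for W in affinities]
    S = [knn_kernel(W, k) for W in affinities]
    m = len(P)
    for _ in range(iters):
        updated = []
        for v in range(m):
            others = sum(P[u] for u in range(m) if u != v) / (m - 1)
            Pv = S[v] @ others @ S[v].T
            updated.append(row_normalize((Pv + Pv.T) / 2.0))
        P = updated
    fused = sum(P) / m
    return (fused + fused.T) / 2.0

# Synthetic stand-in for two patient-level data types (assumption, not real data):
# 20 patients in two groups, measured in two views of different dimensionality.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
shift = 3.0 * labels[:, None]
expression = rng.normal(size=(20, 50)) + shift
methylation = rng.normal(size=(20, 30)) + shift

views = [pca_reduce(v, 5) for v in (expression, methylation)]
fused = fuse([affinity(v) for v in views])

same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(20, dtype=bool)
print("mean within-group similarity:", fused[same & off_diag].mean())
print("mean between-group similarity:", fused[~same].mean())
```

Reducing each view before building its patient network keeps the distance computation cheap as new data types are added, and the fused matrix can then be passed to a clustering method such as spectral clustering to group patients.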

Author information

Corresponding author

Correspondence to Hanie Samimi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Samimi, H., Tešić, J., Ngu, A.H.H. (2019). Patient Centric Data Integration for Improved Diagnosis and Risk Prediction. In: Gadepally, V., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2019, Poly 2019. Lecture Notes in Computer Science, vol 11721. Springer, Cham. https://doi.org/10.1007/978-3-030-33752-0_13

  • DOI: https://doi.org/10.1007/978-3-030-33752-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33751-3

  • Online ISBN: 978-3-030-33752-0

  • eBook Packages: Computer Science, Computer Science (R0)
