Patient Centric Data Integration for Improved Diagnosis and Risk Prediction

Samimi, Hanie; Tešić, Jelena; Ngu, Anne Hee Hiong

doi:10.1007/978-3-030-33752-0_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11721)

Abstract

A typical biological study analyzes heterogeneous biological databases, e.g., genomics, proteomics, metabolomics, and microarray gene expression data. These datasets are correlated at the patient level: for example, reduced activity of one group of genes in a patient's cells can increase the activity of another group and raise the abundance of its products. Joint analysis of correlated patient-level data sources improves the final diagnosis. State-of-the-art biological methods, such as differential expression analysis, do not support the integration and analysis of heterogeneous data sources. Recently, researchers in several computational fields have substantially improved classical data integration algorithms so that different data types can be investigated at the same level. Applying these methods to biological data gives more insight into associating diseases with heterogeneous groups of patients. In this paper, we improve upon our previous study and propose combining a data reduction technique with similarity network fusion (SNF) as a scalable mechanism for integrating new biological data types. We demonstrate our approach by analyzing the risk factors of Acute Myeloid Leukemia (AML) patients when multiple data sources are available, and we uncover new correlations between patients and patient survival time.
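The abstract names two ingredients, a data reduction step and similarity network fusion (SNF, introduced by Wang et al., 2014), without giving details. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' pipeline or the SNFtool R package: PCA stands in for the unnamed reduction technique; each data type is a patients × features matrix whose rows follow the same patient order; and the affinity kernel, neighbourhood size `k`, scaling `mu`, and iteration count `t` follow the general description of SNF rather than any tuned settings. All function names, and the synthetic `expr`/`meth` matrices, are hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project samples (rows) onto their top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def affinity(X, k, mu=0.5):
    """Scaled exponential similarity kernel between patients (rows of X)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    d = np.sqrt(d2)
    # Mean distance to the k nearest neighbours; column 0 after sorting is the patient itself.
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3.0
    W = np.exp(-d2 / (mu * eps + 1e-12))
    return (W + W.T) / 2.0


def full_norm(W):
    """Scale off-diagonal row mass to 1/2 and set the diagonal to 1/2."""
    P = W.copy()
    np.fill_diagonal(P, 0.0)
    row = P.sum(axis=1, keepdims=True)
    row[row == 0] = 1.0  # avoid division by zero for isolated patients
    P = P / (2.0 * row)
    np.fill_diagonal(P, 0.5)
    return P


def knn_kernel(W, k):
    """Keep each patient's k strongest similarities (itself included) and row-normalise."""
    S = np.zeros_like(W)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S[rows, idx] = W[rows, idx]
    return S / S.sum(axis=1, keepdims=True)


def snf(views, k=20, t=20, mu=0.5):
    """Fuse one patient-similarity network per data type by cross-diffusion."""
    m = len(views)
    if m < 2:
        raise ValueError("SNF needs at least two data types")
    Ws = [affinity(X, k, mu) for X in views]
    Ps = [full_norm(W) for W in Ws]   # dense "status" matrices
    Ss = [knn_kernel(W, k) for W in Ws]  # sparse local kernels
    for _ in range(t):
        updated = []
        for v in range(m):
            # Diffuse the average of the other views through this view's local kernel.
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            P = full_norm(Ss[v] @ others @ Ss[v].T)
            updated.append((P + P.T) / 2.0)
        Ps = updated  # all views are updated simultaneously
    fused = sum(Ps) / m
    return (fused + fused.T) / 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(3), 10)  # 30 synthetic patients in 3 groups
    # Hypothetical stand-ins for two data types measured on the same patients.
    expr = rng.normal(0, 5, (3, 50))[labels] + rng.normal(0, 1, (30, 50))
    meth = rng.normal(0, 5, (3, 80))[labels] + rng.normal(0, 1, (30, 80))
    fused = snf([pca_reduce(expr, 5), pca_reduce(meth, 5)], k=5, t=10)
    print(fused.shape)
```

Reducing each data type first keeps the pairwise-distance computation cheap and less dominated by noisy features; the cross-diffusion step then lets similarities that are consistent across data types reinforce each other while weak ones shrink. A fused matrix like this could subsequently be clustered (for example with spectral clustering) to define patient groups whose survival is compared, but that step is not shown here.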

Author information

Authors and Affiliations

Department of Computer Science, Texas State University, San Marcos, TX, 78666, USA
Hanie Samimi, Jelena Tešić & Anne Hee Hiong Ngu

Corresponding author

Correspondence to Hanie Samimi.

Editor information

Editors and Affiliations

Massachusetts Institute of Technology, Lexington, MA, USA
Vijay Gadepally
Intel Corporation, Portland, OR, USA
Timothy Mattson
Massachusetts Institute of Technology, Cambridge, MA, USA
Michael Stonebraker
Stony Brook University, Stony Brook, NY, USA
Fusheng Wang
University of Washington, Seattle, WA, USA
Gang Luo
Google, Mountain View, CA, USA
Yanhui Laing
Lucerne University of Applied Sciences, Rotkreuz, Switzerland
Alevtina Dubovitskaya

About this paper

Cite this paper

Samimi, H., Tešić, J., Ngu, A.H.H. (2019). Patient Centric Data Integration for Improved Diagnosis and Risk Prediction. In: Gadepally, V., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH Poly 2019. Lecture Notes in Computer Science, vol 11721. Springer, Cham. https://doi.org/10.1007/978-3-030-33752-0_13

DOI: https://doi.org/10.1007/978-3-030-33752-0_13
Published: 23 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33751-3
Online ISBN: 978-3-030-33752-0
eBook Packages: Computer Science, Computer Science (R0)