Abstract
Precision medicine and omics technologies are major drivers of the development and application of biostatistics and bioinformatics approaches. Investment in data mining algorithms and their application to health data is increasing rapidly. By evaluating omic data jointly, biostatistical and bioinformatic methods can reveal relationships between omic levels, identify biomarkers for the diagnosis and treatment of diseases, clarify disease pathophysiology in detail, and support personalized treatment options. However, identifying relationships in big data poses many challenges, such as high dimensionality, heterogeneity, and highly correlated genomic features. Numerous methods based on statistics, machine learning, and artificial intelligence are currently being developed and investigated to address these issues in omic data analysis.
References
Abu-Asab MS, Chaouchi M, Alesci S, Galli S, Laassri M, Cheema AK, Atouf F, VanMeter J, Amri H (2011) Biomarkers in the age of omics: time for a systems biology approach. OMICS 15(3):105–112. https://doi.org/10.1089/omi.2010.0023
Aebersold R, Mann M (2003) Mass spectrometry-based proteomics. Nature 422(6928):198–207. https://doi.org/10.1038/nature01511
Aggio R, Villas-Boas SG, Ruggiero K (2011) Metab: an R package for high-throughput analysis of metabolomics data generated by GC-MS. Bioinformatics 27(16):2316–2318. https://doi.org/10.1093/bioinformatics/btr379
Ahmed Z (2020) Practicing precision medicine with intelligently integrative clinical and multi-omics data analysis. Hum Genom 14(1):35. https://doi.org/10.1186/s40246-020-00287-z
Ahsan MM, Luna SA, Siddique Z (2022) Machine-learning-based disease diagnosis: a comprehensive review. Healthcare 10(3):541. https://doi.org/10.3390/healthcare10030541
Aibar S, Fontanillo C, Droste C, Roson-Burgo B, Campos-Laborie FJ, Hernandez-Rivas JM, De Las Rivas J (2015) Analyse multiple disease subtypes and build associated gene networks using genome-wide expression profiles. BMC Genomics 16(S5):S3. https://doi.org/10.1186/1471-2164-16-s5-s3
Alakwaa FM, Chaudhary K, Garmire LX (2018) Deep learning accurately predicts estrogen receptor status in breast cancer metabolomics data. J Proteome Res 17(1):337–347. https://doi.org/10.1021/acs.jproteome.7b00595
Ali SM, Hoemann MZ, Aubé J, Georg GI, Mitscher LA, Jayasinghe LR (1997) Butitaxel analogues: synthesis and structure–activity relationships. J Med Chem 40(2):236–241. https://doi.org/10.1021/jm960505t
Alonso A, Marsal S, Julià A (2015) Analytical methods in untargeted metabolomics: state of the art in 2015. Front Bioeng Biotechnol 3:23. https://doi.org/10.3389/fbioe.2015.00023
Alyass A, Turcotte M, Meyre D (2015) From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med Genom 8:33
Amiour N, Merlino M, Leroy P, Branlard G (2002) Proteomic analysis of amphiphilic proteins of hexaploid wheat kernels. Proteomics 2(6):632–641. https://doi.org/10.1002/1615-9861(200206)2:6<632::AID-PROT632>3.0.CO;2-M
Anaissi A, Goyal M, Catchpoole DR, Braytee A, Kennedy PJ (2016) Ensemble feature learning of genomic data using support vector machine. PLoS One 11(6):e0157330. https://doi.org/10.1371/journal.pone.0157330
Anders S, Huber W (2012) Differential expression of RNA-Seq data at the gene level–the DESeq package. European Molecular Biology Laboratory (EMBL), Heidelberg
Anders S, Pyl PT, Huber W (2015) HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31(2):166–169
Andrews S (2010) FastQC: a quality control tool for high throughput sequence data
Armitage EG, Ciborowski M (2017) Applications of metabolomics in cancer studies. Adv Exp Med Biol 965:209–234. https://doi.org/10.1007/978-3-319-47656-8_9
Aydın M, Kryvoruchko IS, Şakiroğlu M (2019) widgetcon: a website and program for quick conversion among common population genetic data formats. Mol Ecol Resour 19(5):1374–1377
Azad RK, Shulaev V (2019) Metabolomics technology and bioinformatics for precision medicine. Brief Bioinform 20(6):1957–1971. https://doi.org/10.1093/bib/bbx170
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19(5):455–477
Baraldi E, Carraro S, Giordano G, Reniero F, Perilongo G, Zacchello F (2009) Metabolomics: moving towards personalized medicine. Ital J Pediatr 35(1):30. https://doi.org/10.1186/1824-7288-35-30
Beale DJ, Karpe AV, Ahmed W (2016) Beyond metabolomics: a review of multi-omics-based approaches. In: Beale D, Kouremenos K, Palombo E (eds) Microbial metabolomics. Springer, Cham, pp 289–312
Beale DJ, Pinu FR, Kouremenos KA, Poojary MM, Narayana VK, Boughton BA, Kanojia K, Dayalan S, Jones OAH, Dias DA (2018) Review of recent developments in GC–MS approaches to metabolomics-based research. Metabolomics 14(11):152. https://doi.org/10.1007/s11306-018-1449-2
Bekri S (2016) The role of metabolomics in precision medicine. Exp Rev Prec Med Drug Dev. https://doi.org/10.1080/23808993.2016.1273067
Bellew M, Coram M, Fitzgibbon M, Igra M, Randolph T, Wang P, May D, Eng J, Fang R, Lin C, Chen J, Goodlett D, Whiteaker J, Paulovich A, McIntosh M (2006) A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics 22(15):1902–1909
Beranova-Giorgianni S (2003) Proteome analysis by two-dimensional gel electrophoresis and mass spectrometry: strengths and limitations. TrAC Trends Anal Chem 22(5):273–281. https://doi.org/10.1016/s0165-9936(03)00508-9
Berman HM (2000) The protein data bank. Nucleic Acids Res 28(1):235–242. https://doi.org/10.1093/nar/28.1.235
Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, Farnham PJ, Hirst M, Lander ES, Mikkelsen TS, Thomson JA (2010) The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol 28(10):1045–1048. https://doi.org/10.1038/nbt1010-1045
Bielow C, Mastrobuoni G, Kempa S (2016) Proteomics quality control: quality control software for MaxQuant results. J Proteome Res 15(3):777–787
Bird A (2007) Perceptions of epigenetics. Nature 447(7143):396–398. https://doi.org/10.1038/nature05913
Bischl B, Lang M, Kotthoff L, Schiffner J, Richter J, Studerus E, Casalicchio G, Jones ZM (2016) mlr: Machine learning in R. J Mach Learn Res 17:1–5
Bøvelstad HM, Nygård S, Borgan Ø (2009) Survival prediction from clinico-genomic models - a comparative study. BMC Bioinform 10(1):413. https://doi.org/10.1186/1471-2105-10-413
Bravo-Merodio L, Williams JA, Gkoutos GV, Acharjee A (2019) Omics biomarker identification pipeline for translational medicine. J Transl Med 17(1):155. https://doi.org/10.1186/s12967-019-1912-5
Bray NL, Pimentel H, Melsted P, Pachter L (2016) Erratum: near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 34(8):888. https://doi.org/10.1038/nbt0816-888d. Erratum for: Nat Biotechnol. 34(5):525-527. doi:10.1038/nbt0816-888d
Brown KR, Jurisica I (2005) Online predicted human interaction database. Bioinformatics 21(9):2076–2082. https://doi.org/10.1093/bioinformatics/bti273
Brown GR, Hem V, Katz KS, Ovetsky M, Wallin C, Ermolaeva O, Tolstoy I, Tatusova T, Pruitt KD, Maglott DR, Murphy TD (2015) Gene: a gene-centered information resource at NCBI. Nucleic Acids Res 43(Database Issue):D36–D42. https://doi.org/10.1093/nar/gku1055
Budak ŞÖ, Dönmez S (2012) Novel omics technologies in food science. J Food 37(3):173–179
Buenrostro JD, Wu B, Chang HY, Greenleaf WJ (2015) ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr Protoc Mol Biol 109:21.29.1–21.29.9. https://doi.org/10.1002/0471142727.mb2129s109
Cai Z, Xu D, Zhang Q, Zhang J, Ngai S-M, Shao J (2015) Classification of lung cancer using ensemble-based feature selection and machine learning methods. Mol BioSyst 11(3):791–800. https://doi.org/10.1039/c4mb00659c
Campbell MP, Peterson R, Mariethoz J, Gasteiger E, Akune Y, Aoki-Kinoshita KF, Lisacek F, Packer NH (2013) UniCarbKB: building a knowledge platform for glycoproteomics. Nucleic Acids Res 42(D1):D215–D221. https://doi.org/10.1093/nar/gkt1128
Cao Y, Charisi A, Cheng LC, Jiang T, Girke T (2008) ChemmineR: a compound mining framework for R. Bioinformatics 24(15):1733–1734. https://doi.org/10.1093/bioinformatics/btn307
Cao DS, Xiao N, Xu QS, Chen AF (2014) Rcpi: R/Bioconductor package to generate various descriptors of proteins, compounds and their interactions. Bioinformatics 31(2):279–281. https://doi.org/10.1093/bioinformatics/btu624
Carbonaro M (2004) Proteomics: present and future in food quality evaluation. Trends Food Sci Technol 15(3–4):209–216. https://doi.org/10.1016/j.tifs.2003.09.020
Chadwick LH, Sawa A, Yang IV, Baccarelli A, Breakefield XO, Deng H-W, Dolinoy DC, Fallin MD, Holland NT, Houseman EA, Lomvardas S, Rao M, Satterlee JS, Tyson FL, Vijayanand P, Greally JM (2015) New insights and updated guidelines for epigenome-wide association studies. Neuroepigenetics 1:14–19. https://doi.org/10.1016/j.nepig.2014.10.004
Chakraborty S, Hosen MI, Ahmed M, Shekhar HU (2018) Onco-multi-OMICS approach: a new frontier in cancer research. Biomed Res Int 2018:1–14. https://doi.org/10.1155/2018/9836256
Charlab R, Zhang L (2013) Pharmacogenomics: historical perspective and current status. Methods Mol Biol 1015:3–22. https://doi.org/10.1007/978-1-62703-435-7_1
Chatr-aryamontri A, Breitkreutz B-J, Oughtred R, Boucher L, Heinicke S, Chen D, Stark C, Breitkreutz A, Kolas N, O’Donnell L, Reguly T, Nixon J, Ramage L, Winter A, Sellam A, Chang C, Hirschman J, Theesfeld C, Rust J, Livstone MS (2015) The BioGRID interaction database: 2015 update. Nucleic Acids Res 43(D1):D470–D478. https://doi.org/10.1093/nar/gku1204
Chaudhary K, Poirion OB, Lu L, Garmire LX (2018) Deep learning–based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res 24(6):1248–1259. https://doi.org/10.1158/1078-0432.ccr-17-0853
Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuźmin VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A (2014) QSAR modeling: where have you been? Where are you going to? J Med Chem 57(12):4977–5010. https://doi.org/10.1021/jm4004285
Ching T, Zhu X, Garmire LX (2018) Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput Biol 14(4):e1006076. https://doi.org/10.1371/journal.pcbi.1006076
Cho G, Yim J, Choi Y, Ko J, Lee SH (2019) Review of machine learning algorithms for diagnosing mental illness. Psychiatry Investig 16(4):262–269. https://doi.org/10.30773/pi.2018.12.21.2
Costello JC, Heiser LM, Georgii E, Gönen M, Menden MP, Wang NJ, Bansal M, Ammad-ud-din M, Hintsanen P, Khan SA, Mpindi J-P, Kallioniemi O, Honkela A, Aittokallio T, Wennerberg K, Collins JJ, Gallahan D, Singer D, Saez-Rodriguez J, Kaski S (2014) A community effort to assess and improve drug sensitivity prediction algorithms. Nat Biotechnol 32(12):1202–1212. https://doi.org/10.1038/nbt.2877
Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26(12):1367–1372. https://doi.org/10.1038/nbt.1511
Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, Caudy M, Garapati P, Gillespie M, Kamdar MR, Jassal B, Jupe S, Matthews L, May B, Palatnik S, Rothfels K, Shamovsky V, Song H, Williams M, Birney E (2014) The Reactome pathway knowledgebase. Nucleic Acids Res 42(D1):D472–D477. https://doi.org/10.1093/nar/gkt1102
Cun Y, Frohlich H (2014) netClass: an R-package for network based, integrative biomarker signature discovery. Bioinformatics 30(9):1325–1326. https://doi.org/10.1093/bioinformatics/btu025
van Dam S, Craig T, de Magalhães JP (2015) GeneFriends: a human RNA-seq-based gene and transcript co-expression database. Nucleic Acids Res 43(D1):D1124–D1132. https://doi.org/10.1093/nar/gku1042
Dara S, Dhamercherla S, Jadav SS, Babu CM, Ahsan MJ (2022) Machine learning in drug discovery: a review. Artif Intell Rev 55:1947–1999. https://doi.org/10.1007/s10462-021-10058-4
Dash S, Shakyawar SK, Sharma M, Kaushik S (2019) Big data in healthcare: management, analysis and future prospects. J Big Data 6(1):1. https://doi.org/10.1186/s40537-019-0217-0
Davis AP, Wiegers TC, Johnson RJ, Sciaky D, Wiegers J, Mattingly C (2022) Comparative Toxicogenomics Database (CTD): update 2023. Nucleic Acids Res 51:D1257. https://doi.org/10.1093/nar/gkac833
De Souto MCP, Costa IG, de Araujo DSA, Ludermir TB, Schliep A (2008) Clustering cancer gene expression data: a comparative study. BMC Bioinform 9:497. https://doi.org/10.1186/1471-2105-9-497
De Souza FSH, Hojo-Souza NS, Dos Santos EB, Da Silva CM, Guidoni DL (2021) Predicting the disease outcome in COVID-19 positive patients through machine learning: a retrospective cohort study with Brazilian data. Front Artif Intell 4:579931. https://doi.org/10.3389/frai.2021.579931
Dimmer EC, Huntley RP, Alam-Faruque Y, Sawford T, O’Donovan C, Martin MJ, Bely B, Browne P, Mun Chan W, Eberhardt R, Gardner M, Laiho K, Legge D, Magrane M, Pichler K, Poggioli D, Sehra H, Auchincloss A, Axelsen K, Blatter MC, Boutet E, Braconi-Quintaje S, Breuza L, Bridge A, Coudert E, Estreicher A, Famiglietti L, Ferro-Rojas S, Feuermann M, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, James J, Jimenez S, Jungo F, Keller G, Lemercier P, Lieberherr D, Masson P, Moinat M, Pedruzzi I, Poux S, Rivoire C, Roechert B, Schneider M, Stutz A, Sundaram S, Tognolli M, Bougueleret L, Argoud-Puy G, Cusin I, Duek-Roggli P, Xenarios I, Apweiler R (2011) The UniProt-GO annotation database in 2011. Nucleic Acids Res 40(D1):D565–D570. https://doi.org/10.1093/nar/gkr1048
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21. https://doi.org/10.1093/bioinformatics/bts635
Dohmen E, Kremer LP, Bornberg-Bauer E, Kemena C (2016) DOGMA: domain-based transcriptome and proteome quality assessment. Bioinformatics 32(17):2577–2581
Duarte T, Spencer C (2016) Personalized proteomics: the future of precision medicine. Proteomes 4(4):29. https://doi.org/10.3390/proteomes4040029
Durmuşçelebi A (2019) Novel statistical approaches in clustering RNA-sequencing data, Erciyes University School of Medicine, Kayseri, Turkey. https://tez.yok.gov.tr/UlusalTezMerkezi/tezDetay.jsp?id=2pZmRB_VOVad_nKYWE9hbA&no=y2v9vd_e5TfI_1JzX78ang
Edgar R (2002) Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30(1):207–210. https://doi.org/10.1093/nar/30.1.207
El Amrani K (2022) sampleClassifier: Sample Classifier. R package version 1.22.0
El Bouhaddani S, Uh H-W, Jongbloed G, Hayward C, Klarić L, Kiełbasa SM, Houwing-Duistermaat J (2018) Integrating omics datasets with the OmicsPLS package. BMC Bioinform 19(1):371. https://doi.org/10.1186/s12859-018-2371-3
ENCODE Project Consortium (2004) The encode (encyclopedia of DNA elements) project. Science 306:636–640
Eraslan G, Avsec Ž, Gagneur J, Theis FJ (2019) Deep learning: new computational modelling techniques for genomics. Nat Rev Genet 20(7):389–403. https://doi.org/10.1038/s41576-019-0122-6
Fan Y, Zhang S, Ma S (2022) Survival analysis with high-dimensional omics data using a threshold gradient descent regularization-based neural network approach. Genes 13(9):1674. https://doi.org/10.3390/genes13091674
Fatima M, Pasha M (2017) Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl 9(1):1–16. https://doi.org/10.4236/jilsa.2017.91001
Feng X, Grossman R, Stein L (2011) PeakRanger: a cloud-enabled peak caller for ChIP-seq data. BMC Bioinform 12(1):139. https://doi.org/10.1186/1471-2105-12-139
Fondi M, Liò P (2015) Multi -omics and metabolic modelling pipelines: challenges and tools for systems microbiology. Microbiol Res 171:52–64. https://doi.org/10.1016/j.micres.2015.01.003
García-Alcalde F, García-López F, Dopazo J, Conesa A (2011) Paintomics: a web based tool for the joint visualization of transcriptomics and metabolomics data. Bioinformatics 27(1):137–139. https://doi.org/10.1093/bioinformatics/btq594
Gaudet P, Michel P-A, Zahn-Zabal M, Cusin I, Duek PD, Evalet O, Gateau A, Gleizes A, Pereira M, Teixeira D, Zhang Y, Lane L, Bairoch A (2015) The neXtProt knowledgebase on human proteins: current status. Nucleic Acids Res 43(D1):D764–D770. https://doi.org/10.1093/nar/gku1178
Gligorijević V, Malod-Dognin N, Pržulj N (2016) Integrative methods for analyzing big data in precision medicine. Proteomics 16(5):741–758. https://doi.org/10.1002/pmic.201500396
Goksuluk D, Zararsiz G, Korkmaz S, Eldem V, Zararsiz GE, Ozcetin E, Ozturk A, Karaagaoglu AE (2019) MLSeq: machine learning interface for RNA-sequencing data. Comput Methods Prog Biomed 175:223–231. https://doi.org/10.1016/j.cmpb.2019.04.007
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537. https://doi.org/10.1126/science.286.5439.531
van Gool AJ, Bietrix F, Caldenhoven E, Zatloukal K, Scherer A, Litton J-E, Meijer G, Blomberg N, Smith A, Mons B, Heringa J, Koot W-J, Smit MJ, Hajduch M, Rijnders T, Ussi A (2017) Bridging the translational innovation gap through good biomarker practice. Nat Rev Drug Discov 16(9):587–588. https://doi.org/10.1038/nrd.2017.72
Graves PR, Haystead TAJ (2002) Molecular biologist’s guide to proteomics. Microbiol Mol Biol Rev 66(1):39–63. https://doi.org/10.1128/mmbr.66.1.39-63.2002
Grimplet J, Cramer GR, Dickerson JA, Mathiason K, Van Hemert J, Fennell AY (2009) VitisNet: “Omics” integration through grapevine molecular networks. PLoS One 4(12):e8365. https://doi.org/10.1371/journal.pone.0008365
Groeneveld CS, Chagas VS, Jones SJM, Robertson AG, Ponder BAJ, Meyer KB, Castro MAA (2019) RTNsurvival: an R/Bioconductor package for regulatory network survival analysis. Bioinformatics 35(21):4488–4489. https://doi.org/10.1093/bioinformatics/btz229
GuhaThakurta D, Sheikh NA, Meagher TC, Letarte S, Trager JB (2013) Applications of systems biology in cancer immunotherapy: from target discovery to biomarkers of clinical outcome. Expert Rev Clin Pharmacol 6(4):387–401. https://doi.org/10.1586/17512433.2013.811814
Günther OP, Shin H, Ng RT, McMaster WR, McManus BM, Keown PA, Tebbutt SJ, Lê Cao K-A (2014) Novel multivariate methods for integration of genomics and proteomics data: applications in a kidney transplant rejection study. Omics: J Integr Biol 18(11):682–695
Guo Y, Mahony S, Gifford DK (2012) High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Comput Biol 8(8):e1002638. https://doi.org/10.1371/journal.pcbi.1002638
Gurevich A, Saveliev V, Vyahhi N, Tesler G (2013) QUAST: quality assessment tool for genome assemblies. Bioinformatics 29(8):1072–1075
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, MacManes MD, Ott M, Orvis J, Pochet N, Strozzi F, Weeks N, Westerman R, William T, Dewey CN, Henschel R, LeDuc RD, Friedman N, Regev A (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 8(8):1494–1512
Haberman Y, Tickle TL, Dexheimer PJ, Kim M-O, Tang D, Karns R, Baldassano RN, Noe JD, Rosh J, Markowitz J, Heyman MB, Griffiths AM, Crandall WV, Mack DR, Baker SS, Huttenhower C, Keljo DJ, Hyams JS, Kugathasan S, Walters TD (2014) Pediatric Crohn disease patients exhibit specific ileal transcriptome and microbiome signature. J Clin Invest 124(8):3617–3633. https://doi.org/10.1172/JCI75436
Hale EJ (2003) Application of proteomics for discovery of protein biomarkers. Brief Funct Genom Proteom 2(3):185–193. https://doi.org/10.1093/bfgp/2.3.185
Hamzeh O, Rueda L (2019) A gene-disease-based machine learning approach to identify prostate cancer biomarkers. In: Proceedings of the 10th ACM international conference on bioinformatics, computational biology and health informatics. Association for Computing Machinery, New York, NY, pp 633–638
Hannon Lab (2015) http://hannonlab.cshl.edu/fastx_toolkit/index.html. Accessed 3 Jun 2015
Harmanci A, Rozowsky J, Gerstein M (2014) MUSIC: identification of enriched regions in ChIP-Seq experiments using a mappability-corrected multiscale signal processing framework. Genome Biol 15(10):474. https://doi.org/10.1186/s13059-014-0474-3
Hartl D, de Luca V, Kostikova A, Laramie J, Kennedy S, Ferrero E, Siegel R, Fink M, Ahmed S, Millholland J, Schuhmacher A, Hinder M, Piali L, Roth A (2021) Translational precision medicine: an industry perspective. J Transl Med 19(1):245. https://doi.org/10.1186/s12967-021-02910-6
Hasanzad M, Sarhangi N, Ehsani Chimeh S, Ayati N, Afzali M, Khatami F, Nikfar S, Aghaei Meybodi HR (2021) Precision medicine journey through omics approach. J Diab Metab Disord 21(1):881–888. https://doi.org/10.1007/s40200-021-00913-0
Hashimoto K, Goto S, Kawano S, Aoki-Kinoshita KF, Ueda N, Hamajima M, Kawasaki T, Kanehisa M (2006) KEGG as a glycome informatics resource. Glycobiology 16(5):63R–70R. https://doi.org/10.1093/glycob/cwj010
Hasin Y, Seldin M, Lusis A (2017) Multi-omics approaches to disease. Genome Biol 18(1):83. https://doi.org/10.1186/s13059-017-1215-1
Higdon R, Haynes W, Stanberry L, Stewart E, Yandl G, Howard C, Broomall W, Kolker N, Kolker E (2013) Unraveling the complexities of life sciences data. Big Data 1(1):42–50. https://doi.org/10.1089/big.2012.1505. Epub 2012 Nov 7. PMID: 27447037
Ho DSW, Schierding W, Wake M, Saffery R, O’Sullivan J (2019) Machine learning SNP based prediction for precision medicine. Front Genet 10:267. https://doi.org/10.3389/fgene.2019.00267
Hockings JK, Pasternak AL, Erwin AL, Mason NT, Eng C, Hicks JK (2020) Pharmacogenomics: an evolving clinical tool for precision medicine. Cleve Clin J Med 87(2):91–99. https://doi.org/10.3949/ccjm.87a.19073
Huang S, Cai N, Pacheco PP, Narrandes S, Wang Y, Xu W (2018) Applications of support vector machine (SVM) learning in cancer genomics. Cancer Genomics Proteomics 15(1):41–51. https://doi.org/10.21873/cgp.20063. PMID: 29275361; PMCID: PMC5822181
Ihlenfeldt WD, Takahashi Y, Abe H, Sasaki S (1994) Computation and management of chemical properties in CACTVS: an extensible networked approach toward modularity and compatibility. J Chem Inf Model 34(1):109–116. https://doi.org/10.1021/ci00017a013
Jain KK (2004) Role of oncoproteomics in the personalized management of cancer. Exp Rev Proteom 1(1):49–55. https://doi.org/10.1586/14789450.1.1.49
Jaitly N, Mayampurath A, Littlefield K, Adkins JN, Anderson GA, Smith RD (2009) Decon2LS: an open-source software package for automated processing and visualization of high resolution mass spectrometry data. BMC Bioinform 10(1):1–15
Jewison T, Su Y, Disfany FM, Liang Y, Knox C, Maciejewski A, Poelzer J, Huynh J, Zhou Y, Arndt D, Djoumbou Y, Liu Y, Deng L, Guo AC, Han B, Pon A, Wilson M, Rafatnia S, Liu P, Wishart DS (2014) SMPDB 2.0: big improvements to the small molecule pathway database. Nucleic Acids Res 42(D1):D478–D484. https://doi.org/10.1093/nar/gkt1067
Jia Z, Liu Y, Guan N, Bo X, Luo Z, Barnes MR (2016) Cogena, a novel tool for co-expressed gene-set enrichment analysis, applied to drug repositioning and drug mode of action discovery. BMC Genomics 17(1):414. https://doi.org/10.1186/s12864-016-2737-8
Joyce AR, Palsson BØ (2006) The model organism as a system: integrating ‘omics’ data sets. Nat Rev Mol Cell Biol 7(3):198–210. https://doi.org/10.1038/nrm1857
Jung D (2022) DeepPINCS: protein Interactions and Networks with Compounds based on Sequences using Deep Learning. R package version 1.6.0
Kaissis G, Ziegelmayer S, Lohöfer F, Steiger K, Algül H, Muckenhuber A, Yen H-Y, Rummeny E, Friess H, Schmid R, Weichert W, Siveke JT, Braren R (2019) A machine learning algorithm predicts molecular subtypes in pancreatic ductal adenocarcinoma with differential response to gemcitabine-based versus FOLFIRINOX chemotherapy. PLoS One 14(10):e0218642. https://doi.org/10.1371/journal.pone.0218642
Kalinin AA, Higgins GA, Reamaroon N, Soroushmehr S, Allyn-Feuer A, Dinov ID, Najarian K, Athey BD (2018) Deep learning in pharmacogenomics: from gene regulation to patient stratification. Pharmacogenomics 19(7):629–650. https://doi.org/10.2217/pgs-2018-0008
Karnovsky A, Weymouth T, Hull T, Tarcea VG, Scardoni G, Laudanna C, Sartor MA, Stringer KA, Jagadish HV, Burant C, Athey B, Omenn GS (2012) Metscape 2 bioinformatics tool for the analysis and visualization of metabolomics and gene expression data. Bioinformatics 28(3):373–380. https://doi.org/10.1093/bioinformatics/btr661
Kaur P, Singh A, Chana I (2021) Computational techniques and tools for omics data analysis: state-of-the-art, challenges, and future directions. Archiv Comput Methods Eng 28:4595–4631. https://doi.org/10.1007/s11831-021-09547-0
Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, Jandrasits C, Jimenez RC, Khadake J, Mahadevan U, Masson P, Pedruzzi I, Pfeiffenberger E, Porras P, Raghunath A, Roechert B (2011) The IntAct molecular interaction database in 2012. Nucleic Acids Res 40(D1):D841–D846. https://doi.org/10.1093/nar/gkr1088
Keshavarzi Arshadi A, Webb J, Salem M, Cruz E, Calad-Thomson S, Ghadirian N, Collins J, Diez-Cecilia E, Kelly B, Goodarzi H, Yuan JS (2020) Artificial intelligence for COVID-19 drug discovery and vaccine development. Front Artif Intell 3:65. https://doi.org/10.3389/frai.2020.00065
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14(4):1–13
Kim D, Langmead B, Salzberg SL (2015a) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12(4):357–360. https://doi.org/10.1038/nmeth.3317
Kim S, Herazo-Maya JD, Kang DD, Juan-Guardela BM, Tedrow J, Martinez FJ, Sciurba FC, Tseng GC, Kaminski N (2015b) Integrative phenotyping framework (iPF): integrative clustering of multiple omics data identifies novel lung disease subphenotypes. BMC Genomics 16(1):924. https://doi.org/10.1186/s12864-015-2170-4
Kim T, Tang O, Vernon ST, Kott KA, Koay YC, Park J, James DE, Grieve SM, Speed TP, Yang P, Figtree GA, O’Sullivan JF, Yang JYH (2020) hRUV: hierarchical approach to removal of unwanted variation for large-scale metabolomics data. bioRxiv
Koçhan N, Tutuncu GY, Smyth GK, Gandolfo LC, Giner G (2019) qtQDA: quantile transformed quadratic discriminant analysis for high-dimensional RNA-seq data. PeerJ 7:e8260. https://doi.org/10.7717/peerj.8260
Kochan N, Tütüncü GY, Giner G (2021) A new local covariance matrix estimation for the classification of gene expression profiles in high dimensional RNA-Seq data. Expert Syst Appl 167:114200. https://doi.org/10.1016/j.eswa.2020.114200
König IR, Fuchs O, Hansen G, von Mutius E, Kopp MV (2017) What is precision medicine? Eur Respir J 50(4):1700391. https://doi.org/10.1183/13993003.00391
Kouřil Š, de Sousa J, Václavík J, Friedecký D, Adam T (2020) CROP: correlation-based reduction of feature multiplicities in untargeted metabolomic data. Bioinformatics 36(9):2941–2942
Kraus VB (2018) Biomarkers as drug development tools: discovery, validation, qualification and use. Nat Rev Rheumatol 14(6):354–362. https://doi.org/10.1038/s41584-018-0005-9
Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28:1–26
Kuhn M, Letunic I, Jensen LJ, Bork P (2015) The SIDER database of drugs and side effects. Nucleic Acids Res 44(D1):D1075–D1079. https://doi.org/10.1093/nar/gkv1075
Kumar V, Muratani M, Rayan NA, Kraus P, Lufkin T, Ng HH, Prabhakar S (2013) Uniform, optimal signal processing of mapped deep-sequencing data. Nat Biotechnol 31(7):615–622. https://doi.org/10.1038/nbt.2596
Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, Amin V, Whitaker JW, Schultz MD, Ward LD, Sarkar A, Quon G, Sandstrom RS, Eaton ML, Wu Y-C, Pfenning AR (2015) Integrative analysis of 111 reference human epigenomes. Nature 518(7539):317–330. https://doi.org/10.1038/nature14248
Kutmon M, van Iersel MP, Bohler A, Kelder T, Nunes N, Pico AR, Evelo CT (2015) PathVisio 3: an extendable pathway analysis toolbox. PLoS Comput Biol 11(2):e1004085. https://doi.org/10.1371/journal.pcbi.1004085
Kwon MS, Kim Y, Lee S, Namkung J, Yun T, Yi SG, Han S, Kang M, Kim SW, Jang JY, Park T (2015) Integrative analysis of multi-omics data for identifying multi-markers for diagnosing pancreatic cancer. BMC Genomics 16(9):1–10
Lamb J (2006) The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313(5795):1929–1935. https://doi.org/10.1126/science.1132939
Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinform 9(1):559. https://doi.org/10.1186/1471-2105-9-559
Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359. https://doi.org/10.1038/nmeth.1923
Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25. https://doi.org/10.1186/gb-2009-10-3-r25
Lederberg J, Mccray AT (2001) Ome sweet ‘omics--a genealogical treasury of word. Scientist 15(7):8
Lee S, Lim H (2019) Review of statistical methods for survival analysis using genomic data. Genom Inform 17(4):e41. https://doi.org/10.5808/GI.2019.17.4.e41
Leung MKK, Delong A, Alipanahi B, Frey BJ (2016) Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE 104(1):176–197. https://doi.org/10.1109/jproc.2015.2494198
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760. https://doi.org/10.1093/bioinformatics/btp324
Li B, Shin H, Gulbekyan G, Pustovalova O, Nikolsky Y, Hope A, Trepicchio WL (2015) Develop a drug-response modelling framework to identify cell line-derived translational biomarkers that can predict treatment outcomes to erlotinib or sorafenib. PLoS One 10(6):e0130700
Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30(7):923–930
Lin E, Lane H-Y (2017) Machine learning and systems genomics approaches for multi-omics data. Biomark Res 5(1):2. https://doi.org/10.1186/s40364-017-0082-y
Lischer HE, Excoffier L (2012) PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics 28(2):298–299
Lo YC, Rensi SE, Torng W, Altman RB (2018) Machine learning in chemoinformatics and drug discovery. Drug Discov Today 23(8):1538–1546. https://doi.org/10.1016/j.drudis.2018.05.010
Long N, Park S, Anh N, Nghi T, Yoon S, Park J, Lim J, Kwon S (2019) High-throughput omics and statistical learning integration for the discovery and validation of novel diagnostic signatures in colorectal cancer. Int J Mol Sci 20(2):296. https://doi.org/10.3390/ijms20020296
Low SK, Zembutsu H, Nakamura Y (2017) Breast cancer: the translation of big genomic data to cancer precision medicine. Cancer Sci 109:497–506
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu SM, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam TW, Wang J (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1(1):2047–217X
MacEachern SJ, Forkert ND (2021) Machine learning for precision medicine. Genome 64:416–425. https://doi.org/10.1139/gen-2020-0131
Maksimovic J, Phipson B, Oshlack A (2016) A cross-package Bioconductor workflow for analysing methylation array data. F1000Research 5:1281. https://doi.org/10.12688/f1000research.8839.1
Malgerud L, Lindberg J, Wirta V, Gustafsson-Liljefors M, Karimi M, Moro CF, Stecker K, Picker A, Huelsewig C, Stein M, Bohnert R, Del Chiaro M, Haas SL, Heuchel RL, Permert J, Maeurer MJ, Brock S, Verbeke CS, Engstrand L, Jackson DB (2017) Bioinformatory-assisted analysis of next-generation sequencing data for precision medicine in pancreatic cancer. Mol Oncol 11(10):1413–1429. https://doi.org/10.1002/1878-0261.12108
Mallavarapu T, Hao J, Kim Y, Oh JH, Kang M (2019) Pathway-based deep clustering for molecular subtyping of cancer. Methods 173:24–31. https://doi.org/10.1016/j.ymeth.2019.06.017
Manchanda N, Portwood JL, Woodhouse MR, Seetharam AS, Lawrence-Dill CJ, Andorf CM, Hufford MB (2020) GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations. BMC Genomics 21(1):1–9
Mancinelli L, Cronin M, Sadée W (2000) Pharmacogenomics: the promise of personalized medicine. AAPS Pharm Sci 2(1):29–41. https://doi.org/10.1208/ps020104
Manoukis NC (2007) FORMATOMATIC: a program for converting diploid allelic data between common formats for population genetic analysis. Mol Ecol Notes 7(4):592–593
Mar J, Gentleman R, Carey V (2008) MLInterfaces: uniform interfaces to R machine learning procedures for data in Bioconductor containers. R package version. 1.24.0. http://www.bioconductor.org
Marco-Ramell A, Palau-Rodriguez M, Alay A, Tulipani S, Urpi-Sarda M, Sanchez-Pla A, Andres-Lacueva C (2018) Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data. BMC Bioinform 19(1):1. https://doi.org/10.1186/s12859-017-2006-0
Mayo Clinic (2018) Precision medicine and pharmacogenomics. https://www.mayoclinic.org/healthy-lifestyle/consumer-health/indepth/personalized-medicine/art-20044300. Accessed 5 Oct 2022
McGuire AL, Gabriel S, Tishkoff SA, Wonkam A, Chakravarti A, Furlong EEM, Treutlein B, Meissner A, Chang HY, López-Bigas N, Segal E, Kim J-S (2020) The road ahead in genetics and genomics. Nat Rev Genet 21(10):581–596. https://doi.org/10.1038/s41576-020-0272-6
McLean C, Kujawinski EB (2020) AutoTuner: high fidelity and robust parameter selection for metabolomics data processing. Anal Chem 92(8):5724–5732
Mele M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M, Young TR, Goldmann JM, Pervouchine DD, Sullivan TJ, Johnson R, Segre AV, Djebali S, Niarchou A, Consortium TG, Wright FA, Lappalainen T, Calvo M, Getz G, Dermitzakis ET (2015) The human transcriptome across tissues and individuals. Science 348(6235):660–665. https://doi.org/10.1126/science.aaa0355
Mensaert K, Denil S, Trooskens G, Van Criekinge W, Thas O, De Meyer T (2013) Next-generation technologies and data analytical approaches for epigenomics. Environ Mol Mutagen 55(3):155–170. https://doi.org/10.1002/em.21841
Mo Q, Shen R (2023) iClusterPlus: integrative clustering of multi-type genomic data. R package version 1.34.3
Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52:91–118
Morgan M, Anders S, Lawrence M, Aboyoun P, Pages H, Gentleman R (2009) ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data. Bioinformatics 25(19):2607–2608
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7):621–628. https://doi.org/10.1038/nmeth.1226
Naithani N, Sinha S, Misra P, Vasudevan B, Sahu R (2021) Precision medicine: concept and tools. Med J Arm Forc India 77(3):249–257. https://doi.org/10.1016/j.mjafi.2021.06.021
Nakatani K, Nobori T (2013) Pharmacogenomics. Rinsho Byori 61(11):1018–1025
Neumann JM, Freitag H, Hartmann JS, Niehaus K, Galanis M, Griesshammer M, Kellner U, Bednarz H (2022) Subtyping non-small cell lung cancer by histology-guided spatial metabolomics. J Cancer Res Clin Oncol 148(2):351–360. https://doi.org/10.1007/s00432-021-03834-w
Nguyen T, Tagett R, Diaz D, Draghici S (2017) A novel approach for data integration and disease subtyping. Genome Res 27(12):2025–2039. https://doi.org/10.1101/gr.215129.116
Nguyen H, Shrestha S, Draghici S, Nguyen T (2018) PINSPlus: a tool for tumor subtype discovery in integrated genomic data. Bioinformatics 35(16):2843–2846. https://doi.org/10.1093/bioinformatics/bty1049
Nicholson JK, Lindon JC, Holmes E (1999) ‘Metabonomics’: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 29(11):1181–1189. https://doi.org/10.1080/004982599238047
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M (1999) KEGG: Kyoto Encyclopedia of genes and genomes. Nucleic Acids Res 27(1):29–34. https://doi.org/10.1093/nar/27.1.29
Okamura Y, Aoki Y, Obayashi T, Tadaka S, Ito S, Narise T, Kinoshita K (2014) COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems. Nucleic Acids Res 43(D1):D82–D86. https://doi.org/10.1093/nar/gku1163
Olah M, Rad R, Ostopovici L et al (2007) WOMBAT and WOMBATPK: bioactivity databases for lead and drug discovery. Chem Biol Small Mol Syst Biol Drug Des 1:760–786
Oliver S (1998) Systematic functional analysis of the yeast genome. Trends Biotechnol 16(9):373–378. https://doi.org/10.1016/s0167-7799(98)01214-1
Pang Z, Chong J, Zhou G, de Lima Morais DA, Chang L, Barrette M, Gauthier C, Jacques P-É, Li S, Xia J (2021) MetaboAnalyst 5.0: narrowing the gap between raw spectra and functional insights. Nucleic Acids Res 49:W388. https://doi.org/10.1093/nar/gkab382
Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C (2017) Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14(4):417–419. https://doi.org/10.1038/nmeth.4197
Peng J, Jury EC, Dönnes P, Ciurtin C (2021) Machine learning techniques for personalised medicine approaches in immune-mediated chronic inflammatory diseases: applications and challenges. Front Pharmacol 12:720694. https://doi.org/10.3389/fphar.2021.720694
Perou C, Sørlie T, Eisen M et al (2000) Molecular portraits of human breast tumours. Nature 406:747–752. https://doi.org/10.1038/35021093
Petryszak R, Burdett T, Fiorelli B, Fonseca NA, Gonzalez-Porta M, Hastings E, Huber W, Jupp S, Keays M, Kryvych N, McMurry J, Marioni JC, Malone J, Megy K, Rustici G, Tang AY, Taubert J, Williams E, Mannion O, Parkinson HE (2014) Expression Atlas update—a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments. Nucleic Acids Res 42(D1):D926–D932. https://doi.org/10.1093/nar/gkt1270
Pettini F, Visibelli A, Cicaloni V, Iovinelli D, Spiga O (2021) Multi-omics model applied to cancer genetics. Int J Mol Sci 22(11):5751. https://doi.org/10.3390/ijms22115751
Planey CR, Gevaert O (2016) CoINcIDE: a framework for discovery of patient subtypes across multiple datasets. Genome Med 8(1). https://doi.org/10.1186/s13073-016-0281-4
Pluskal T, Castillo S, Villar-Briones A, Orešič M (2010) MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinform 11(1):1–11
Pinu FR, Beale DJ, Paten AM, Kouremenos K, Swarup S, Schirra HJ, Wishart D (2019) Systems biology and multi-omics integration: viewpoints from the metabolomics research community. Metabolites 9(4):76. https://doi.org/10.3390/metabo9040076
Priya S, Kumar A, Singh DB, Jain P, Tripathi G (2022) Machine learning approaches and their applications in drug discovery and design. Chem Biol Drug Des 100:136–153. https://doi.org/10.1111/cbdd.14057
Pruitt KD, Tatusova T, Brown GR, Maglott DR (2011) NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 40(D1):D130–D135. https://doi.org/10.1093/nar/gkr1079
Puchades-Carrasco L, Pineda-Lucena A (2017) Metabolomics applications in precision medicine: an oncological perspective. Curr Top Med Chem 17(24):2740. https://doi.org/10.2174/1568026617666170707120034
Qiu YL, Zheng H, Devos A, Selby H, Gevaert O (2020) A meta-learning approach for genomic survival analysis. Nat Commun 11(1):6350. https://doi.org/10.1038/s41467-020-20167-3
Quazi S (2022) Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol 39(8):120. https://doi.org/10.1007/s12032-022-01711-1
Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841–842
Rintala TJ, Federico A, Latonen L, Greco D, Fortino V (2021) A systematic comparison of data- and knowledge-driven approaches to disease subtype discovery. Brief Bioinform 22(6):bbab314. https://doi.org/10.1093/bib/bbab314
Riquelme G, Zabalegui N, Marchi P, Jones CM, Monge ME (2020) A python-based pipeline for preprocessing LC–MS data for untargeted metabolomics workflows. Metabolites 10(10):416
Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D (2015) Methods of integrating data to uncover genotype–phenotype interactions. Nat Rev Genet 16(2):85–97. https://doi.org/10.1038/nrg3868
Robinson MD, Smyth GK (2008) Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9(2):321–332
Roden DM, McLeod HL, Relling MV, Williams MS, Mensah GA, Peterson JF, Van Driest SL (2019) Pharmacogenomics. Lancet 394(10197):521–532. https://doi.org/10.1016/s0140-6736(19)31276-0
Romagnoni A, Jégou S, Van Steen K, Wainrib G, Hugot J-P, International Inflammatory Bowel Disease Genetics Consortium (IIBDGC) (2019) Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data. Sci Rep 9(1):10351. https://doi.org/10.1038/s41598-019-46649-z
Sadakierska-Chudy A, Filip M (2015) A comprehensive view of the epigenetic landscape. Part II: Histone post-translational modification, nucleosome level, and chromatin regulation by ncRNAs. Neurotox Res 27(2):172–197. https://doi.org/10.1007/s12640-014-9508-6
Santos SS, Torres M, Galeano D, Sánchez MDM, Cernuzzi L, Paccanaro A (2022) Machine learning and network medicine approaches for drug repositioning for COVID-19. Patterns 3(1):100396. https://doi.org/10.1016/j.patter.2021.100396
Saria S, Goldenberg A (2015) Subtyping: what it is and its role in precision medicine. IEEE Intell Syst 30(4):70–75. https://doi.org/10.1109/mis.2015.60
Savitzky A, Golay MJ (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36(8):1627–1639
Schmieder R, Edwards R (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics 27(6):863–864
Schulz MH, Zerbino DR, Vingron M, Birney E (2012) Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28(8):1086–1092
Seiler M, Huang CC, Szalma S, Bhanot G (2010) ConsensusCluster: a software tool for unsupervised cluster discovery in numerical data. OMICS: J Integr Biol 14(1):109–113. https://doi.org/10.1089/omi.2009.0083
Shakhsheer B, Anderson M, Khatib K, Tadoori L, Joshi L, Lisacek F, Hirschman L, Mullen E (2013) SugarBind database (SugarBindDB): a resource of pathogen lectins and corresponding glycan targets. J Mol Recognit 26(9):426–431. https://doi.org/10.1002/jmr.2285
Shapiro JA (2009) Revisiting the Central Dogma in the 21st century. Ann N Y Acad Sci 1178(1):6–28. https://doi.org/10.1111/j.1749-6632.2009.04990.x
Shen R, Mo Q, Schultz N, Seshan VE, Olshen AB, Huse J, Ladanyi M, Sander C (2012) Integrative subtype discovery in glioblastoma using iCluster. PLoS One 7(4):e35236. https://doi.org/10.1371/journal.pone.0035236
Shrestha RK, Lubinsky B, Bansode VB, Moinz MB, McCormack GP, Travers SA (2014) QTrim: a novel tool for the quality trimming of sequence reads generated using the Roche/454 sequencing platform. BMC Bioinform 15(1):1–6
Sigin VO, Kalinkin AI, Kuznetsova EB, Simonova OA, Chesnokova GG, Litviakov NV, Slonimskaya EM, Tsyganov MM, Ibragimova MK, Volodin IV, Vinogradov II, Vinogradov MI, Vinogradov IY, Kutsev SI, Strelnikov VV, Zaletaev DV, Tanas AS (2020) DNA methylation markers panel can improve prediction of response to neoadjuvant chemotherapy in luminal B breast cancer. Sci Rep 10(1):9239. https://doi.org/10.1038/s41598-020-66197-1
Sinkala M, Mulder N, Martin D (2020) Machine learning and network analyses reveal disease subtypes of pancreatic cancer and their molecular characteristics. Sci Rep 10(1):1. https://doi.org/10.1038/s41598-020-58290-2
Smirnov P, Safikhani Z, El-Hachem N, Wang D, She A, Olsen C, Freeman M, Selby H, Gendoo DMA, Grossmann P, Beck AH, Aerts HJWL, Lupien M, Goldenberg A, Haibe-Kains B (2015) PharmacoGx: an R package for analysis of large pharmacogenomic datasets. Bioinformatics 32(8):1244–1246. https://doi.org/10.1093/bioinformatics/btv723
Sonabend R, Király FJ, Bender A, Bischl B, Lang M (2021) mlr3proba: an R package for machine learning in survival analysis. Bioinformatics 37(17):2789–2791. https://doi.org/10.1093/bioinformatics/btab039
Spicker JS, Brunak S, Frederiksen KS, Toft H (2008) Integration of clinical chemistry, expression, and metabolite data leads to better toxicological class separation. Toxicol Sci 102(2):444–454. https://doi.org/10.1093/toxsci/kfn001
Stanfill BA, Nakayasu ES, Bramer LM, Thompson AM, Ansong CK, Clauss TR, Gritsenko MA, Monroe ME, Moore RJ, Orton DJ, Piehowski PD, Schepmoes AA, Smith RD, Webb-Robertson BM, Metz TO (2018) Quality control analysis in real-time (QC-ART): a tool for real-time quality control assessment of mass spectrometry-based proteomics data. Mol Cell Proteomics 17(9):1824–1836
Stanstrup J, Broeckling CD, Helmus R, Hoffmann N, Mathé E, Naake T, Nicolotti L, Peters K, Rainer J, Salek RM, Schulze T, Schymanski EL, Stravs MA, Thévenot EA, Treutler H, Weber RJM, Willighagen E, Witting M, Neumann S (2019) The metaRbolomics toolbox in bioconductor and beyond. Metabolites 9(10):200. https://doi.org/10.3390/metabo9100200
Stephenson N, Shane E, Chase J, Rowland J, Ries D, Justice N, Zhang J, Chan L, Cao R (2019) Survey of machine learning techniques in drug discovery. Curr Drug Metab 20(3):185–193. https://doi.org/10.2174/1389200219666180820112457
Strbenac D, Mann GJ, Ormerod JT, Yang JYH (2015) ClassifyR: an R package for performance assessment of classification with applications to transcriptomics. Bioinformatics 31:1851–1853
Sturm M, Bertsch A, Gröpl C, Hildebrandt A, Hussong R, Lange E, Pfeifer N, Schulz-Trieglaff O, Zerck A, Reinert K, Kohlbacher O (2008) OpenMS–an open-source software framework for mass spectrometry. BMC Bioinform 9(1):1–11
Sung J, Wang Y, Chandrasekaran S, Witten DM, Price ND (2012) Molecular signatures from omics data: from chaos to consensus. Biotechnol J 7(8):946–957. https://doi.org/10.1002/biot.201100305
Swan AL, Stekel DJ, Hodgman C, Allaway D, Alqahtani MH, Mobasheri A, Bacardit J (2015) A machine learning heuristic to identify biologically relevant and minimal biomarker panels from omics data. BMC Genomics 16(S1):S2. https://doi.org/10.1186/1471-2164-16-s1-s2
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, Kuhn M, Bork P, Jensen LJ, von Mering C (2014) STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43(Database Issue):D447–D452. https://doi.org/10.1093/nar/gku1003
Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, Boutselakis H, Cole CG, Creatore C, Dawson E, Fish P, Harsha B, Hathaway C, Jupe SC, Kok CY, Noble K, Ponting L, Ramshaw CC, Rye CE, Speedy HE (2019) COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res 47(D1):D941–D947. https://doi.org/10.1093/nar/gky1015
Taylor RM, Dance J, Taylor RJ, Prince JT (2013) Metriculator: quality assessment for mass spectrometry-based proteomics. Bioinformatics 29(22):2948–2949
Tebani A, Afonso C, Marret S, Bekri S (2016) Omics-based strategies in precision medicine: toward a paradigm shift in inborn errors of metabolism investigations. Int J Mol Sci 17(9):1555. https://doi.org/10.3390/ijms17091555
Teng L, He B, Wang J, Tan K (2015) 4DGenome: a comprehensive database of chromatin interactions. Bioinformatics 32(17):2727–2727. https://doi.org/10.1093/bioinformatics/btw375
Thakur R, Singh PK (2021) Molecular subtypes of pancreatic cancer: a proteomics approach. Clin Cancer Res 27(12):3272–3274. https://doi.org/10.1158/1078-0432.ccr-21-0640
The Cancer Genome Atlas Network (2011) Integrated genomic analyses of ovarian carcinoma. Nature 474:609–615
The Cancer Genome Atlas Network (2012a) Comprehensive molecular characterization of human colon and rectal cancer. Nature 487(7407):330–337. https://doi.org/10.1038/nature11252
The Cancer Genome Atlas Network (2012b) Comprehensive molecular portraits of human breast tumours. Nature 490:61–70
The UniProt Consortium (2014) UniProt: a hub for protein information. Nucleic Acids Res 43(D1):D204–D212. https://doi.org/10.1093/nar/gku989
Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci 99(10):6567–6572. https://doi.org/10.1073/pnas.082099299
Tijms BM, Gobom J, Reus L, Jansen I, Hong S, Dobricic V, Kilpert F, Ten Kate M, Barkhof F, Tsolaki M, Verhey FRJ, Popp J, Martinez-Lage P, Vandenberghe R, Lleó A, Molinuevo JL, Engelborghs S, Bertram L, Lovestone S, Streffer J, Vos S, Bos I, Alzheimer’s Disease Neuroimaging Initiative (ADNI), Blennow K, Scheltens P, Teunissen CE, Zetterberg H, Visser PJ (2020) Pathophysiological subtypes of Alzheimer’s disease based on cerebrospinal fluid proteomics. Brain 143(12):3776–3792. https://doi.org/10.1093/brain/awaa325
Tong L, Wu H, Wang MD (2021) Integrating multi-omics data by learning modality invariant representations for improved prediction of overall survival of cancer. Methods 189:74–85
Tsai TH, Wang M, Ressom HW (2016) Preprocessing and analysis of LC-MS-based proteomic data. In: Statistical Analysis in Proteomics. Humana Press, New York, NY, pp 63–76
Turewicz M, Ahrens M, May C, Marcus K, Eisenacher M (2016) PAA: an R/bioconductor package for biomarker discovery with protein microarrays. Bioinformatics 32(10):1577–1579. https://doi.org/10.1093/bioinformatics/btw037
Tyanova S, Temu T, Sinitcyn P, Carlson A, Hein MY, Geiger T, Mann M, Cox J (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 13(9):731–740. https://doi.org/10.1038/nmeth.3901
Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A, Olsson I, Edlund K, Lundberg E, Navani S, Szigyarto CA-K, Odeberg J, Djureinovic D, Takanen JO, Hober S, Alm T (2015) Tissue-based map of the human proteome. Science 347(6220):1260419. https://doi.org/10.1126/science.1260419
Van Houtven J, Agten A, Boonen K, Baggerman G, Hooyberghs J, Laukens K, Valkenborg D (2019) Qcquan: a web tool for the automated assessment of protein expression and data quality of labeled mass spectrometry experiments. J Proteome Res 18(5):2221–2227
Vandenbogaert M, Li-Thiao-Té S, Kaltenbach HM, Zhang R, Aittokallio T, Schwikowski B (2008) Alignment of LC-MS images, with applications to biomarker discovery and protein identification. Proteomics 8(4):650–672
Vaske CJ, Benz SC, Sanborn JZ, Earl D, Szeto C, Zhu J, Haussler D, Stuart JM (2010) Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 26(12):i237–i245. https://doi.org/10.1093/bioinformatics/btq182
Vogenberg FR, Carol IB, Michael P (2010) Personalized medicine: Part 1: Evolution and development into theranostics. Pharm Therapeut 35(10):560–576
Wajid B, Iqbal H, Jamil M, Rafique H, Anwar F (2020) MetumpX—a metabolomics support package for untargeted mass spectrometry. Bioinformatics 36(5):1647–1648
Wang S, Gribskov M (2017) Comprehensive evaluation of de novo transcriptome assembly programs and their effects on differential gene expression analysis. Bioinformatics 33(3):327–333
Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, He X, Mieczkowski P, Grimm SA, Perou CM, JN ML, Chiang DY, Prins JF, Liu J (2010) MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res 38(18):e178
Wang L, Li F, Sheng J, Wong ST (2015) A computational method for clinically relevant cancer stratification and driver mutation module discovery using personal genomics profiles. BMC Genomics 16(Suppl 7):S6. https://doi.org/10.1186/1471-2164-16-s7-s6
Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, Haibe-Kains B, Goldenberg A (2022) SNFtool:Similarity Network Fusion. R package version 2.3.1
Wheelock ÅM, Wheelock CE (2013) Trials and tribulations of ‘omics data analysis: assessing quality of SIMCA-based multivariate models using examples from pulmonary medicine. Mol BioSyst 9(11):2589–2596. https://doi.org/10.1039/c3mb70194h
Whirl-Carrillo M, McDonagh EM, Hebert JM, Gong L, Sangkuhl K, Thorn CF, Altman RB, Klein TE (2012) Pharmacogenomics knowledge for personalized medicine. Clin Pharmacol Ther 92(4):414–417. https://doi.org/10.1038/clpt.2012.96
Wilkerson MD, Neil Hayes D (2010) ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26(12):1572–1573. https://doi.org/10.1093/bioinformatics/btq170
Wishart DS, Tzur D, Knox C, Eisner R, Guo AC, Young N, Cheng D, Jewell K, Arndt D, Sawhney S, Fung C, Nikolai L, Lewis M, Coutouly MA, Forsythe I, Tang P, Shrivastava S, Jeroncic K, Stothard P, Amegbey G et al (2007) HMDB: the human metabolome database. Nucleic Acids Res 35:521–526. https://doi.org/10.1093/nar/gkl923
Witten DM (2011) Classification and clustering of sequencing data using a Poisson model. Ann Appl Stat 5(4):2493–2518. https://doi.org/10.1214/11-aoas493
Wu CT, Wang Y, Wang Y, Ebbels T, Karaman I, Graça G, Pinto R, Herrington DM, Wang Y, Yu G (2020) Targeted realignment of LC-MS profiles by neighbor-wise compound-specific graphical time warping with misalignment detection. Bioinformatics 36(9):2862–2871
Xie Y, Wu G, Tang J, Luo R, Patterson J, Liu S, Huang W, He G, Gu S, Li S, Zhou X, Lam TW, Li Y, Xu X, Wong GK, Wang J (2014) SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30(12):1660–1666
Xin Y, Chanrion B, O’Donnell AH, Milekic M, Costa R, Ge Y, Haghighi FG (2012) MethylomeDB: a database of DNA methylation profiles of the brain. Nucleic Acids Res 40(D1):D1245–D1249
Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith JA, Thompson IR, Ramaswamy S, Futreal PA, Haber DA, Stratton MR, Benes C, McDermott U, Garnett MJ (2013) Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res 41:955–961. https://doi.org/10.1093/nar/gks1111
Yang LA, Chang YJ, Chen SH, Lin CY, Ho JM (2019) SQUAT: a sequencing quality assessment tool for data quality assessments of genome assemblies. BMC Genomics 19(9):1–12
Yaragatti M, Basilico C, Dailey L (2008) Identification of active transcriptional regulatory modules by the functional assay of DNA from nucleosome-free regions. Genome Res 18(6):930–938. https://doi.org/10.1101/gr.073460.107
Yizhak K, Benyamini T, Liebermeister W, Ruppin E, Shlomi T (2010) Integrating quantitative proteomics and metabolomics with a genome-scale metabolic network model. Bioinformatics 26(12):i255–i260. https://doi.org/10.1093/bioinformatics/btq183
Zaharia M, Bolosky WJ, Curtis K, Fox A, Patterson D, Shenker S, Stoica I, Karp RM, Sittler T (2011) Faster and more accurate sequence alignment with SNAP. arXiv:arXiv:1111.5572
Zang C, Schones DE, Zeng C, Cui K, Zhao K, Peng W (2009) A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. Bioinformatics 25(15):1952–1958. https://doi.org/10.1093/bioinformatics/btp340
Zararsız G (2015) Development and application of novel machine learning approaches for RNA-seq data classification. Hacettepe University School of Medicine, Hacettepe, Turkey. https://tez.yok.gov.tr/UlusalTezMerkezi/tezDetay.jsp?id=FUuBLcFdKB0WqIKVAVL-vA&no=o75jI9oxbTyeMIQZYeveyQ
Zararsiz G, Goksuluk D, Klaus B, Korkmaz S, Eldem V, Karabulut E, Ozturk A (2017) voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data. PeerJ 5:e3890. https://doi.org/10.7717/peerj.3890
Zeggini E, Gloyn AL, Barton AC, Wain LV (2019) Translational genomics and precision medicine: moving from the lab to the clinic. Science 365(6460):1409–1413. https://doi.org/10.1126/science.aax4588
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nussbaum C, Myers RM, Brown M, Li W, Liu XS (2008) Model-based Analysis of ChIP-Seq (MACS). Genome Biol 9(9):R137. https://doi.org/10.1186/gb-2008-9-9-r137
Zhang Y, Lin Y-H, Johnson TD, Rozek LS, Sartor MA (2014) PePr: a peak-calling prioritization pipeline to identify consistent or differential peaks from replicated ChIP-Seq data. Bioinformatics 30(18):2568–2575. https://doi.org/10.1093/bioinformatics/btu372
Zhang Y, Wong G, Mann G, Muller S, Yang JYH (2022) SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data. GigaScience 11:giac071. https://doi.org/10.1093/gigascience/giac071
Zheng H, Ji J, Zhao L, Chen M, Shi A, Pan L, Huang Y, Zhang H, Dong B, Gao H (2016) Prediction and diagnosis of renal cell carcinoma using nuclear magnetic resonance-based serum metabolomics and self-organizing maps. Oncotarget 7(37):59189–59198. https://doi.org/10.18632/oncotarget.10830. PMID: 27463020; PMCID: PMC5312304
Zhu B, Song N, Shen R, Arora A, Machiela MJ, Song L, Landi MT, Ghosh D, Chatterjee N, Baladandayuthapani V, Zhao H (2017) Integrating clinical and multiple omics data for prognostic assessment across human cancers. Sci Rep 7(1):1. https://doi.org/10.1038/s41598-017-17031-8
Zuo Y, Cui Y, Di Poto C, Varghese RS, Yu G, Li R, Ressom HW (2016) INDEED: integrated differential expression and differential network analysis of omic data for biomarker discovery. Methods 111:12–20. https://doi.org/10.1016/j.ymeth.2016.08.015
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Cephe, A. et al. (2023). Bioinformatics and Biostatistics in Precision Medicine. In: Tuli, H.S., Yerer Aycan, M.B. (eds) Oncology: Genomics, Precision Medicine and Therapeutic Targets. Springer, Singapore. https://doi.org/10.1007/978-981-99-1529-3_8
DOI: https://doi.org/10.1007/978-981-99-1529-3_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1528-6
Online ISBN: 978-981-99-1529-3
eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)