Abstract
Big data has been reported to be revolutionizing many areas of life, including science. The term describes data that is unprecedentedly large, rapidly generated, heterogeneous, and hard to interpret accurately. The availability of such data has also brought new challenges: How should data be annotated to make it searchable? What legal and ethical hurdles arise when sharing data? How can data be stored securely, preventing loss and corruption? The life sciences are not the only disciplines that must adapt to the requirements of big data to keep up with the latest developments; the Large Hadron Collider, for instance, generates research data at a pace beyond that of any current biomedical research center. Three recent, coinciding developments explain the emergence of big data in research: the technological revolution in data generation, the development of tools for data analysis, and a conceptual shift towards open science and open data. The true potential of big data lies in discovering patterns in large datasets and in formulating new models and hypotheses. The confirmation of the existence of the Higgs boson, for instance, is one of the most recent triumphs of big data analysis in physics. Digital representations of biological systems have also become more comprehensive, and this, combined with advances in machine learning, creates exciting new research possibilities. In this paper, we review the state of big data in bioanalytical research and provide an overview of guidelines for its proper use.
Funding
This work was carried out with the support of the German Research Foundation (DFG) within project INF, SFB/TR 209 “Liver Cancer.”
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Research involving human participants and/or animals
Not applicable.
Informed consent
Not applicable.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Fillinger, S., de la Garza, L., Peltzer, A. et al. Challenges of big data integration in the life sciences. Anal Bioanal Chem 411, 6791–6800 (2019). https://doi.org/10.1007/s00216-019-02074-9
DOI: https://doi.org/10.1007/s00216-019-02074-9