Challenges of big data integration in the life sciences


Big data is reported to be revolutionizing many areas of life, including science. The term describes data that are unprecedentedly large, rapidly generated, heterogeneous, and hard to interpret accurately. The availability of such data has also brought new challenges: How should data be annotated to make them searchable? What are the legal and ethical hurdles to sharing data? How can data be stored securely, preventing loss and corruption? The life sciences are not the only disciplines that must align themselves with big data requirements to keep up with the latest developments. The Large Hadron Collider, for instance, generates research data at a pace beyond that of any current biomedical research center. Three recent, coinciding developments explain the emergence of big data in research: the technological revolution in data generation, the development of tools for data analysis, and a conceptual shift towards open science and open data. The true potential of big data lies in discovering patterns in large datasets and in formulating new models and hypotheses. The confirmation of the existence of the Higgs boson, for instance, is one of the most recent triumphs of big data analysis in physics. Digital representations of biological systems have also become more comprehensive, which, in combination with advances in machine learning, creates exciting new research possibilities. In this paper, we review the state of big data in bioanalytical research and provide an overview of guidelines for its proper usage.






This work was carried out with the support of the German Research Foundation (DFG) within project INF, SFB/TR 209 “Liver Cancer.”

Author information



Corresponding author

Correspondence to Sven Nahnsen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Research involving human participants and/or animals

Not applicable.

Informed consent

Not applicable.



Cite this article

Fillinger, S., de la Garza, L., Peltzer, A. et al. Challenges of big data integration in the life sciences. Anal Bioanal Chem 411, 6791–6800 (2019).



  • Big data
  • Bioanalytics
  • Data integration
  • Bioinformatics
  • Scalability