Biological big-data sources, problems of storage, computational issues, and applications: a comprehensive review

  • Review
  • Knowledge and Information Systems

Abstract

Biological big data are the massive datasets generated by multi-omics experiments, including genomics, transcriptomics, proteomics, metabolomics, phenomics, glycomics, epigenomics, and other omics. These data are used to study biological processes and to gain insight into how living systems work. They can also be used to develop new treatments for diseases and to understand the causes of certain conditions. Storing and analyzing these data is challenging because of their sheer size and complexity: efficient storage requires large amounts of storage space and processing power. Furthermore, the complexity of multi-omics data limits the kinds of insights that can be gained from them. Despite these challenges, biological big data offer great potential for advancing our understanding of biology and developing new treatments for diseases. Big-data research is a rapidly growing field with numerous applications. As the amount of data continues to increase, it is important to understand its storage, utility, limitations, and challenges. This review discusses the various sources of biological big data and their storage capacities, limitations, and challenges, and reports the factors affecting data quality and accuracy. It is intended to help researchers understand the big data available in biology so that they can use and integrate them in novel discoveries.


References

  1. Abouelmehdi K, Beni-Hessane A, Khaloufi H (2018) Big healthcare data: preserving security and privacy. J Big Data 5:1. https://doi.org/10.1186/s40537-017-0110-7

    Article  Google Scholar 

  2. Abriata LA (2017) Structural database resources for biological macromolecules. Brief Bioinform 18:659–669. https://doi.org/10.1093/bib/bbw049

    Article  Google Scholar 

  3. Agapito G, Pastrello C, Guzzi PH, Jurisica I, Cannataro M (2020) BioPAX-Parser: parsing and enrichment analysis of BioPAX pathways. Bioinformatics 36:4377–4378. https://doi.org/10.1093/bioinformatics/btaa529

    Article  Google Scholar 

  4. Alpert AJ (1990) Hydrophilic-interaction chromatography for the separation of peptides, nucleic-acids and other polar compounds. J Chromatogr 499:177–196

    Article  Google Scholar 

  5. Amaral PP, Clark MB, Gascoigne DK, Dinger ME, Mattick JS (2011) lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res 39:D146–D151. https://doi.org/10.1093/nar/gkq1138

    Article  Google Scholar 

  6. Angly F, Rodriguez-Brito B, Bangor D, McNairnie P, Breitbart M, Salamon P, Felts B, Nulton J, Mahaffy J, Rohwer F (2005) PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information. BMC Bioinform 6:41. https://doi.org/10.1186/1471-2105-6-41

    Article  Google Scholar 

  7. Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S et al (2004) UniProt: the universal protein knowledgebase. Nucleic Acids Res 32:D115-119. https://doi.org/10.1093/nar/gkh131

    Article  Google Scholar 

  8. Arend D, Junker A, Scholz U, Schüler D, Wylie J, Lange M (2016) PGP repository: a plant phenomics and genomics data publication infrastructure. Database 2016:baw033

    Article  Google Scholar 

  9. Arumugam M, Harrington ED, Foerstner KU, Raes J, Bork P (2010) SmashCommunity: a metagenomic annotation and analysis tool. Bioinformatics 26:2977–2978. https://doi.org/10.1093/bioinformatics/btq536

    Article  Google Scholar 

  10. Atas E, Singer A, Meller A (2012) DNA sequencing and bar-coding using solid-state nanopores. Electrophoresis 33:3437–3447. https://doi.org/10.1002/elps.201200266

    Article  Google Scholar 

  11. Avner BS, Fialho AM, Chakrabarty AM (2012) Overcoming drug resistance in multi-drug resistant cancers and microorganisms: a conceptual framework. Bioengineered 3:262. https://doi.org/10.4161/bioe.21130

    Article  Google Scholar 

  12. Axtell MJ, Jan C, Rajagopalan R, Bartel DP (2006) A two-hit trigger for siRNA biogenesis in plants. Cell 127:565–577

    Article  Google Scholar 

  13. Bai JPF, Alekseyenko AV, Statnikov A, Wang I-M, Wong PH (2013) Strategic applications of gene expression: from drug discovery/development to bedside. AAPS J 15:427–437. https://doi.org/10.1208/s12248-012-9447-1

    Article  Google Scholar 

  14. Bai W, Yang W, Wang W, Wang Y, Liu C, Jiang Q, Hua J, Liao M (2017) GED: a manually curated comprehensive resource for epigenetic modification of gametogenesis. Brief Bioinform 18:98–104. https://doi.org/10.1093/bib/bbw007

    Article  Google Scholar 

  15. Bainbridge MN et al (2006) Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genom 7:246

    Article  Google Scholar 

  16. Baldock RA (2007) The Edinburgh mouse atlas project: data mapping and spatial organisation. FASEB J 21:A201–A201. https://doi.org/10.1096/fasebj.21.5.A201-b

    Article  Google Scholar 

  17. Baqader NO, Radulovic M, Crawford M, Stoeber K, Godovac-Zimmermann J (2014) Nuclear cytoplasmic trafficking of proteins is a major response of human fibroblasts to oxidative stress. J Proteome Res 13:4398–4423

    Article  Google Scholar 

  18. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M et al (2013) NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res 41:D991–D995. https://doi.org/10.1093/nar/gks1193

    Article  Google Scholar 

  19. Batth TS, Francavilla C, Olsen JV (2014) Off-line high pH reversed-phase fractionation for in depth phosphoproteomics. J Proteome Res 13:6176–6186

    Article  Google Scholar 

  20. Bennett S (2004) Solexa Ltd. Pharmacogenomics 5:433–438. https://doi.org/10.1517/14622416.5.4.433

    Article  Google Scholar 

  21. Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J et al (2013) GenBank. Nucleic Acids Res 41:D36-42. https://doi.org/10.1093/nar/gks1195

    Article  Google Scholar 

  22. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IMJ, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DMD, Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara E, Catenazzi M, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW, Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Scott Furey W, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ, James T, Huw Jones TA, Kang G-D, Kerelska TH, Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM, Martin JW, McCauley PG, McNitt P, Mehta P, Moon KW, Mullens JW, Newington T, Ning Z, Ling Ng B, Novo SM, O’Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Chris Pinkard D, Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Chiva Rodriguez A, Roe PM, Rogers J, Rogert Bacigalupo MC, Romanov N, Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA, Sohna ES, Spence J, Stevens EJ, Sutton K, Szajkowski N, Tregidgo L, 
Turcatti CL, vandeVondele G, Verhovsky S, Virk Y, Wakelin SM, Walcott S, Wang GC, Worsley J, Yan GJ, Yau J, Zuerlein L, Rogers M, Jane Mullikin JC, Hurles ME, McCooke NJ, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R, Smith AJ (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456:53–59. https://doi.org/10.1038/nature07517

    Article  Google Scholar 

  23. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H et al (2000) The protein data bank. Nucleic Acids Res 28:235–242. https://doi.org/10.1093/nar/28.1.235

    Article  Google Scholar 

  24. Bhattacharya A, Ziebarth JD, Cui Y (2014) PolymiRTS database 3.0: linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways. Nucleic Acids Res 42:D86-91. https://doi.org/10.1093/nar/gkt1028

    Article  Google Scholar 

  25. Bird SS, Marur VR, Sniatynski MJ et al (2011) Serum lipidomics profiling using LC-MS and high-energy collisional dissociation fragmentation: focus on triglyceride detection and characterization. Anal Chem 83:6648–6657

    Article  Google Scholar 

  26. Birney E, Andrews TD, Bevan P, Caccamo M, Chen Y, Clarke L et al (2004) An overview of ensembl. Genome Res 14:925–928. https://doi.org/10.1101/gr.1860604

    Article  Google Scholar 

  27. Blake VC, Birkett C, Matthews DE, Hane DL, Bradbury P, Jannink J-L (2016) The triticeae toolbox: combining phenotype and genotype data to advance small-grains breeding. Plant Genome. https://doi.org/10.3835/plantgenome2014.12.0099

    Article  Google Scholar 

  28. Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, Hugenholtz P (2007) CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinform 8:209. https://doi.org/10.1186/1471-2105-8-209

    Article  Google Scholar 

  29. Boersema PJ, Raijmakers R, Lemeer S, Mohammed S, Heck AJR (2009) Multiplex peptide stable isotope dimethyl labeling for quantitative proteomics. Nat Protoc 4:484–494

    Article  Google Scholar 

  30. Bono H (2020) All of gene expression (AOE): An integrated index for public gene expression databases. PLoS ONE 15:e0227076. https://doi.org/10.1371/journal.pone.0227076

    Article  Google Scholar 

  31. Bowers J, Mitchell J, Beer E, Buzby PR, Causey M, Efcavitch JW, Jarosz M, Krzymanska-Olejnik E, Kung L, Lipson D, Lowman GM, Marappan S, McInerney P, Platt A, Roy A, Siddiqi SM, Steinmann K, Thompson JF (2009) Virtual Terminator nucleotides for next generation DNA sequencing. Nat Methods 6:593–595. https://doi.org/10.1038/nmeth.1354

    Article  Google Scholar 

  32. Brady A, Salzberg SL (2009) Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods 6:673–676. https://doi.org/10.1038/nmeth.1358

    Article  Google Scholar 

  33. Breker M, Schuldiner M (2014) The emergence of proteome-wide technologies: systematic analysis of proteins comes of age. Nat Rev Mol Cell Biol 15:453–464

    Article  Google Scholar 

  34. Buermans HPJ, den Dunnen JT (2014) Next generation sequencing technology: advances and applications. Biochimica et Biophysica Acta (BBA) Mol Basis Disease From Genome Funct 1842:1932–1941. https://doi.org/10.1016/j.bbadis.2014.06.015

    Article  Google Scholar 

  35. Burger A, Baldock R, Yang Y, Waterhouse A, Houghton D, Burton N, Davidson D (2002) The Edinburgh mouse atlas and gene-expression database: a spatio-temporal database for biological research. In: proceedings 14th international conference on scientific and statistical database management. Presented at the proceedings 14th international conference on scientific and statistical database management, pp 239. https://doi.org/10.1109/SSDM.2002.1029726

  36. Burley SK, Bhikadiya C, Bi C, Bittrich S, Chao H, Chen L et al (2022) RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res 51:D488-508. https://doi.org/10.1093/nar/gkac1077

    Article  Google Scholar 

  37. Cases I, Pisano DG, Andres E, Carro A, Fernandez JM, Gomez-Lopez G et al (2007) CARGO: a web portal to integrate customized biological information. Nucleic Acids Res 35:W16-20

    Article  Google Scholar 

  38. Castro-Mondragon JA, Riudavets-Puig R, Rauluseviciute I, Berhanu Lemma R, Turchi L, Blanc-Mathieu R et al (2022) JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res 50:D165–D173. https://doi.org/10.1093/nar/gkab1113

    Article  Google Scholar 

  39. Chaisson MJ et al (2009) Resolving the complexity of the human genome using single-molecule sequencing. Nature 517:265–270

    Google Scholar 

  40. Champagne A, Boutry M (2013) Proteomics of nonmodel plant species. Proteomics 13:663–673

    Article  Google Scholar 

  41. Chan C-KK, Hsu AL, Halgamuge SK, Tang S-L (2008) Binning sequences using very sparse labels within a metagenome. BMC Bioinform 9:215. https://doi.org/10.1186/1471-2105-9-215

    Article  Google Scholar 

  42. Chapin N, Sen R (2023) Chapter 12—COVID-19 phenomics. In: Barh D, Azevedo V (eds) Omics approaches and technologies in COVID-19. Academic Press, New York, pp 191–218. https://doi.org/10.1016/B978-0-323-91794-0.00014-7

    Chapter  Google Scholar 

  43. Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L et al (2007) MINT: the molecular INTeraction database. Nucleic Acids Res 35:D572. https://doi.org/10.1093/nar/gkl950

    Article  Google Scholar 

  44. Chen G, Ning B, Shi T (2019) Single-cell RNA-Seq technologies and related computational data analysis. Front Genet 10

  45. Cheng L, Wang P, Tian R, Wang S, Guo Q, Luo M et al (2019) LncRNA2Target v2.0: a comprehensive database for target genes of lncRNAs in human and mouse. Nucleic Acids Res 47:D140–D144. https://doi.org/10.1093/nar/gky1051

    Article  Google Scholar 

  46. Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, Hester ET et al (1998) SGD: saccharomyces genome database. Nucleic Acids Res 26:73–79

    Article  Google Scholar 

  47. Cheung F, Haas BJ, Goldberg SM, May GD, Xiao Y, Town CD (2006) Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genom 7:272

    Article  Google Scholar 

  48. Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Müller WEG, Wetter T, Suhai S (2004) Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res 14:1147–1159. https://doi.org/10.1101/gr.1917404

    Article  Google Scholar 

  49. Choi M, Carver J, Chiva C, Tzouros M, Huang T, Tsai T-H, Pullman B, Bernhardt OM, Hüttenhain R, Teo GC, Perez-Riverol Y, Muntel J, Müller M, Goetze S, Pavlou M, Verschueren E, Wollscheid B, Nesvizhskii AI, Reiter L, Dunkley T, Sabidó E, Bandeira N, Vitek O (2020) MassIVE.quant: a community resource of quantitative mass spectrometry-based proteomics datasets. Nat Methods 17:981–984. https://doi.org/10.1038/s41592-020-0955-0

    Article  Google Scholar 

  50. Choksi NY, Jahnke GD, St Hilaire C, Shelby M (2003) Role of thyroid hormones in human and laboratory animal reproductive health. Birth Defects Res B Dev Reprod Toxicol 68:479–491

    Article  Google Scholar 

  51. Choubey J, Choudhari JK, Sahariah BP, Verma MK, Banerjee A (2021) Chapter 25—molecular tools: advance approaches to analyze diversity of microbial community. In: Shah MP, Sarkar A, Mandal S (eds) Wastewater treatment. Elsevier, pp 507–520. https://doi.org/10.1016/B978-0-12-821881-5.00025-8

  52. Choubey J, Choudhari JK, Verma MK, Chatterjee T, Sahariah BP (2022) Metagenomics and metatranscriptomic analysis of wastewater. In: Microbial community studies in industrial wastewater treatment. CRC Press

  53. Choudhari JK, Chatterjee T, Gupta S, Garcia-Garcia JG, Vera-González J (2021) Network biology approaches in ophthalmological diseases: a case study of glaucoma. In: Wolkenhauer O (ed) Systems medicine. Academic Press, Oxford, pp 190–202. https://doi.org/10.1016/B978-0-12-801238-3.11586-7

    Chapter  Google Scholar 

  54. Choudhari JK, Choubey J, Verma MK, Chatterjee T, Sahariah BP (2022) Chapter 10—metagenomics: the boon for microbial world knowledge and current challenges. In: Singh DB, Pathak RK (eds) Bioinformatics. Academic Press, New York, pp 159–175. https://doi.org/10.1016/B978-0-323-89775-4.00022-5

    Chapter  Google Scholar 

  55. Chuh KN, Pratt MR (2015) Chemical methods for the proteome-wide identification of posttranslationally modified proteins. Curr Opin Chem 24:27–37

    Article  Google Scholar 

  56. Churbanov A, Ryan R, Hasan N, Bailey D, Chen H, Milligan B, Houde P (2012) HighSSR: high-throughput SSR characterization and locus development from next-gen sequencing data. Bioinformatics 28:2797–2803. https://doi.org/10.1093/bioinformatics/bts524

    Article  Google Scholar 

  57. Cirillo D, Valencia A (2019) Big data analytics for personalized medicine. Current opinion in biotechnology, systems biology. NanoBiotechnology 58:161–167. https://doi.org/10.1016/j.copbio.2019.03.004

    Article  Google Scholar 

  58. Clark TA, Murray IA, Morgan RD, Kislyuk AO, Spittle KE, Boitano M, Fomenkov A, Roberts RJ, Korlach J (2012) Characterization of DNA methyltransferase specificities using single-molecule, real-time DNA sequencing. Nucl Acids Res 40:e29. https://doi.org/10.1093/nar/gkr1146

    Article  Google Scholar 

  59. Clarke J, Wu H-C, Jayasinghe L, Patel A, Reid S, Bayley H (2009) Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotechnol 4:265–270

    Article  Google Scholar 

  60. Conlon MA, Bird AR (2014) The impact of diet and lifestyle on gut microbiota and human health. Nutrients 7:17–44. https://doi.org/10.3390/nu7010017

    Article  Google Scholar 

  61. Cook KB, Kazan H, Zuberi K, Morris Q, Hughes TR (2011) RBPDB: a database of RNA-binding specificities. Nucleic Acids Res 39:D301–D308. https://doi.org/10.1093/nar/gkq1069

    Article  Google Scholar 

  62. Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M (2014) Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteom 13:2513–2526

    Article  Google Scholar 

  63. Cui L, Lee YH, Kumar Y et al (2013) Serum metabolome and lipidome changes in adult patients with primary dengue infection. PLoS Negl Trop Dis 7:8

    Article  Google Scholar 

  64. Dash S, Shakyawar SK, Sharma M, Kaushik S (2019) Big data in healthcare: management, analysis and future prospects. Journal of Big Data 6:1–25

    Article  Google Scholar 

  65. Davani-Davari D, Negahdaripour M, Karimzadeh I, Seifan M, Mohkam M, Masoumi SJ, Berenjian A, Ghasemi Y (2019) Prebiotics: definition, types, sources, mechanisms, and clinical applications. Foods 8:92. https://doi.org/10.3390/foods8030092

    Article  Google Scholar 

  66. Davis S, Meltzer PS (2007) GEOquery: a bridge between the gene expression omnibus (GEO) and BioConductor. Bioinformatics 23:1846–1847. https://doi.org/10.1093/bioinformatics/btm254

    Article  Google Scholar 

  67. Deamer D, Akeson M, Branton D (2016) Three decades of nanopore sequencing. Nat Biotechnol 34:518–524. https://doi.org/10.1038/nbt.3423

    Article  Google Scholar 

  68. Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW (2009) TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinform 10:56. https://doi.org/10.1186/1471-2105-10-56

    Article  Google Scholar 

  69. Dick GJ, Andersson AF, Baker BJ, Simmons SL, Thomas BC, Yelton AP, Banfield JF (2009) Community-wide analysis of microbial genome sequence signatures. Genome Biol 10:R85. https://doi.org/10.1186/gb-2009-10-8-r85

    Article  Google Scholar 

  70. Drmanac R et al (2010) Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327:78–81

    Article  Google Scholar 

  71. Eid J et al (2009) Real-time DNA sequencing from single polymerase molecules. Science 323:133–138

    Article  Google Scholar 

  72. ElSayed IA, ElDahshan K, Hefny H, ElSayed EK (2021) Big data and its future in computational biology: a literature review. J Comput Sci 17:1222–1228. https://doi.org/10.3844/jcssp.2021.1222.1228

    Article  Google Scholar 

  73. Fabre J, Dauzat M, Nègre V, Wuyts N, Tireau A, Gennari E, Neveu P, Tisné S, Massonnet C, Hummel I (2011) PHENOPSIS DB: an Information System for Arabidopsis thalianaphenotypic data in an environmental context. BMC Plant Biol 11:1–7

    Article  Google Scholar 

  74. Fabregat A, Sidiropoulos K, Viteri G, Forner O, Marin-Garcia P, Arnau V, D’Eustachio P, Stein L, Hermjakob H (2017) Reactome pathway analysis: a high-performance in-memory approach. BMC Bioinform 18:142. https://doi.org/10.1186/s12859-017-1559-2

    Article  Google Scholar 

  75. Fan J, Han F, Liu H (2014) Challenges of big data analysis. Natl Sci Rev 1:293–314. https://doi.org/10.1093/nsr/nwt032

    Article  Google Scholar 

  76. Farag MA, Porzel A, Schmidt J (2011) Profiling and fingerprinting of commercial cultivars of Humulus lupulus L. (hop): a comparison of MS and NMR methods in metabolomics. Metabolomics 8:492–507

    Article  Google Scholar 

  77. Fehlmann T, Reinheimer S, Geng C, Su X, Drmanac S, Alexeev A, Zhang C, Backes C, Ludwig N, Hart M, An D, Zhu Z, Xu C, Chen A, Ni M, Liu J, Li Y, Poulter M, Li Y, Stähler C, Drmanac R, Xu X, Meese E, Keller A (2016) cPAS-based sequencing on the BGISEQ-500 to explore small non-coding RNAs. Clin Epigenetics 8:123. https://doi.org/10.1186/s13148-016-0287-1

    Article  Google Scholar 

  78. Feng X, Liu X, Luo QBFL (2008) Mass spectrometry in systems biology: an overview. Mass Spectrom Rev 27:635–660

    Article  Google Scholar 

  79. Fernández-Torras A, Duran-Frigola M, Bertoni M, Locatelli M, Aloy P (2022) Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque. Nat Commun 13:5304. https://doi.org/10.1038/s41467-022-33026-0

    Article  Google Scholar 

  80. Fiehn O (2012) Metabolomics–the link between genotypes and phenotypes. Plant Mol Biol 2002:801–807

    Google Scholar 

  81. Fiehn O, Robertson D, Griffin J, van der Werf M, Nikolau B, Morrison N, Sumner LW, Goodacre R, Hardy NW, Taylor C, Fostel J, Kristal B, Kaddurah-Daouk R, Mendes P, van Ommen B, Lindon JC, Sansone S-A (2007) The metabolomics standards initiative (MSI). Metabolomics 3:175–178. https://doi.org/10.1007/s11306-007-0070-6

    Article  Google Scholar 

  82. Filipowicz W, Bhattacharyya SN, Sonenberg N (2008) Mechanisms of posttranscriptional regulation by microRNAs: are the answers in sight. Nat 9:102–114

    Google Scholar 

  83. Finger JH, Smith CM, Hayamizu TF, McCright IJ, Eppig JT, Kadin JA, Richardson JE, Ringwald M (2011) The mouse gene expression database (GXD): 2011 update. Nucl Acids Res 39:D835–D841. https://doi.org/10.1093/nar/gkq1132

    Article  Google Scholar 

  84. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer ELL, Tate J, Punta M (2014) Pfam: the protein families database. Nucl Acids Res 42:D222-230. https://doi.org/10.1093/nar/gkt1223

    Article  Google Scholar 

  85. Floegel A, Stefan N, Yu Z et al (2013) Identification of serum metabolites associated with risk of type 2 diabetes using a targeted metabolomic approach. Diabetes 62:639–648

    Article  Google Scholar 

  86. Flood PJ, Kruijer W, Schnabel SK, van der Schoor R, Jalink H, Snel JFH, Harbinson J, Aarts MGM (2016) Phenomics for photosynthesis, growth and reflectance in Arabidopsis thaliana reveals circadian and long-term fluctuations in heritability. Plant Methods 12:14. https://doi.org/10.1186/s13007-016-0113-y

    Article  Google Scholar 

  87. Froebel LK, Jalukar S, Lavergne TA, Lee JT, Duong T (2019) Administration of dietary prebiotics improves growth performance and reduces pathogen colonization in broiler chickens. Poult Sci 98:6668–6676. https://doi.org/10.3382/ps/pez537

    Article  Google Scholar 

  88. Garg P, Jaiswal P (2016) Databases and bioinformatics tools for rice research. Curr Plant Biol 7–8:39–52. https://doi.org/10.1016/j.cpb.2016.12.006

    Article  Google Scholar 

  89. Gelly J-C, Orgeur M, Jacq C, Lelandais G (2011) MitoGenesisDB: an expression data mining tool to explore spatio-temporal dynamics of mitochondrial biogenesis. Nucl Acids Res 39:D1079–D1084. https://doi.org/10.1093/nar/gkq781

    Article  Google Scholar 

  90. Gillet LC et al (2012) Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell 11:0111.016717

    Google Scholar 

  91. Goll J, Rusch DB, Tanenbaum DM, Thiagarajan M, Li K, Methé BA, Yooseph S (2010) METAREP: JCVI metagenomics reports—an open source tool for high-performance comparative metagenomics. Bioinformatics 26:2631–2632

    Article  Google Scholar 

  92. Goñi JR, Fenollosa C, Pérez A, Torrents D, Orozco M (2008) DNAlive: a tool for the physical analysis of DNA at the genomic scale. Bioinformatics 24:1731–1732

    Article  Google Scholar 

  93. Gonzalez-Galarza FF, McCabe A, dos Santos EJM, Jones J, Takeshita L, Ortega-Rivera ND et al (2020) Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools. Nucleic Acids Res 48:D783–D788. https://doi.org/10.1093/nar/gkz1029

    Article  Google Scholar 

  94. Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17:333–351

    Article  Google Scholar 

  95. Gowda GAN, Raftery D (2021) NMR based metabolomics. Adv Exp Med Biol 1280:19–37. https://doi.org/10.1007/978-3-030-51652-9_2

    Article  Google Scholar 

  96. Grant D, Nelson RT, Cannon SB, Shoemaker RC (2010) SoyBase, the USDA-ARS soybean genetics and genomics database. Nucl Acids Res 38:D843–D846

    Article  Google Scholar 

  97. Greene CS, Tan J, Ung M, Moore JH, Cheng C (2014) Big data bioinformatics. J Cell Physiol 229:1896–1900. https://doi.org/10.1002/jcp.24662

    Article  Google Scholar 

  98. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR (2003) Rfam: an RNA family database. Nucleic Acids Res 31:439. https://doi.org/10.1093/nar/gkg006

    Article  Google Scholar 

  99. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34:D140. https://doi.org/10.1093/nar/gkj112

    Article  Google Scholar 

  100. Groom CR, Bruno IJ, Lightfoot MP, Ward SC (2016) The Cambridge structural database. Acta Crystallogr B Struct Sci Cryst Eng Mater 72:171–179. https://doi.org/10.1107/S2052520616003954

    Article  Google Scholar 

  101. Gutmanas A, Alhroub Y, Battle GM, Berrisford JM, Bochet E, Conroy MJ et al (2014) PDBe: protein data bank in Europe. Nucleic Acids Res 42:D285–D291. https://doi.org/10.1093/nar/gkt1180

    Article  Google Scholar 

  102. Haleem A, Javaid M, Khan IH, Vaishya R (2020) Significant applications of big data in COVID-19 pandemic. Indian J Orthop 54:526–528. https://doi.org/10.1007/s43465-020-00129-z

    Article  Google Scholar 

  103. Haudry Y, Berube H, Letunic I, Weeber P-D, Gagneur J, Girardot C, Kapushesky M, Arendt D, Bork P, Brazma A, Furlong EEM, Wittbrodt J, Henrich T (2008) 4DXpress: a database for cross-species expression pattern comparisons. Nucl Acids Res 36:D847-853. https://doi.org/10.1093/nar/gkm797

    Article  Google Scholar 

  104. Haverland NA, Fox HS, Ciborowski P (2014) Quantitative proteomics by SWATH MS reveals altered expression of nucleic acid binding and regulatory proteins in HIV 1 infected macrophages. J Proteome Res 13:2109–2119

    Article  Google Scholar 

  105. Heather JM, Chain B (2016) The sequence of sequencers: the history of sequencing DNA. Genomics 107:1–8. https://doi.org/10.1016/j.ygeno.2015.11.003

    Article  Google Scholar 

  106. Hendlich M, Bergner A, Günther J, Klebe G (2003) Relibase: design and development of a database for comprehensive analysis of protein-ligand interactions. J Mol Biol 326:607–620. https://doi.org/10.1016/s0022-2836(02)01408-0

    Article  Google Scholar 

  107. Henrich T, Ramialison M, Quiring R, Wittbrodt B, Furutani-Seiki M, Wittbrodt J, Kondoh H (2003) MEPD: a Medaka gene expression pattern database. Nucl Acids Res 31:72–74

    Article  Google Scholar 

  108. Hie B, Peters J, Nyquist SK, Shalek AK, Berger B, Bryson BD (2020) Computational methods for single-cell RNA sequencing. Annu Rev Biomed Data Sci 3:339–364. https://doi.org/10.1146/annurev-biodatasci-012220-100601

    Article  Google Scholar 

  109. Hillier L, Lennon G, Becker M, Bonaldo MF, Chiapelli B, Chissoe S, Dietrich N, DuBuque T, Favello A, Gish W (1996) Generation and analysis of 280,000 human expressed sequence tags. Genome Res 6:807–828

    Article  Google Scholar 

  110. Hoch JC, Baskaran K, Burr H, Chin J, Eghbalnia HR, Fujiwara T et al (2023) Biological magnetic resonance data bank. Nucleic Acids Res 51:D368–D376. https://doi.org/10.1093/nar/gkac1050

    Article  Google Scholar 

  111. Holmes DE (2017) The data explosion. In: Holmes DE (ed) Big data: a very short introduction. Oxford University Press, Oxford. https://doi.org/10.1093/actrade/9780198779575.003.0001

    Chapter  Google Scholar 

  112. Houwing S et al (2007) A role for Piwi and piRNAs in germ cell maintenance and transposon silencing in zebrafish. Cell 129:69–82

    Article  Google Scholar 

  113. Hu Y, Yang L, Lu Y, Wang Y, Jiang J, Liu Y, Cao Q (2022) Systems network pharmacology-based prediction and analysis of potential targets and pharmacological mechanism of Actinidia chinensis planch. Root extract for application in hepatocellular carcinoma. Evid Based Complement Alternat Med 2022:2116006. https://doi.org/10.1155/2022/2116006

    Article  Google Scholar 

  114. Huang S-SC, Ecker JR (2018) Piecing together cis-regulatory networks: insights from epigenomics studies in plants. Wiley Interdiscip Rev Syst Biol Med 10:e1411. https://doi.org/10.1002/wsbm.1411

    Article  Google Scholar 

  115. Huang H-Y, Lin Y-C-D, Li J, Huang K-Y, Shrestha S, Hong H-C et al (2020) miRTarBase updates to the experimentally validated microRNA–target interaction database. Nucleic Acids Res 2020(48):D148–D154. https://doi.org/10.1093/nar/gkz896

    Article  Google Scholar 

  116. Hucka M, Bergmann FT, Dräger A, Hoops S, Keating SM, Le Novére N, Myers CJ, Olivier BG, Sahle S, Schaff JC, Smith LP, Waltemath D, Wilkinson DJ (2015) Systems biology markup language (SBML) level 2 version 5: structures and facilities for model definitions. J Integr Bioinform 12:271. https://doi.org/10.2390/biecoll-jib-2015-271

    Article  Google Scholar 

  117. Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS et al (2006) The PROSITE database. Nucleic Acids Res 34:D227–D230. https://doi.org/10.1093/nar/gkj063

    Article  Google Scholar 

  118. Hunter S, Corbett M, Denise H, Fraser M, Gonzalez-Beltran A, Hunter C, Jones P, Leinonen R, McAnulla C, Maguire E (2014) EBI metagenomics—a new resource for the analysis and archiving of metagenomic data. Nucl Acids Res 42:D600–D606

  119. Huson DH, Weber N (2013) Microbial community analysis using MEGAN. Methods Enzymol 531:465–485. https://doi.org/10.1016/B978-0-12-407863-5.00021-6

  120. Imker HJ (2018) 25 Years of molecular biology databases: a study of proliferation, impact, and maintenance. Front Res Metrics Analyt 3

  121. Jaiswal P, Cooper L, Elser JL, Meier A, Laporte M-A, Mungall C, Smith B, Johnson EKS, Seymour M, Preece J (2016) Planteome: a resource for common reference ontologies and applications for plant biology

  122. Jenkins H, Hardy N, Beckmann M, Draper J, Smith AR, Taylor J, Fiehn O, Goodacre R, Bino RJ, Hall R (2004) A proposed framework for the description of plant metabolomics experiments and their results. Nat Biotechnol 22:1601–1606

  123. Jirtle RL (2014) The Agouti mouse: a biosensor for environmental epigenomics studies investigating the developmental origins of health and disease. Epigenomics 6:447–450. https://doi.org/10.2217/epi.14.58

  124. Jones-Rhoades MW, Borevitz JO, Preuss D (2007) Genome-wide expression profiling of the Arabidopsis female gametophyte identifies families of small, secreted proteins. PLoS Genet 3:1848–1861

  125. Kadota K, Nishimura S-I, Bono H, Nakamura S, Hayashizaki Y, Okazaki Y, Takahashi K (2003) Detection of genes with tissue-specific expression patterns using Akaike’s information criterion procedure. Physiol Genom 12:251–259. https://doi.org/10.1152/physiolgenomics.00153.2002

  126. Kahraman A, Avramov A, Nashev LG, Popov D, Ternes R, Pohlenz H-D, Weiss B (2005) PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics. Bioinformatics 21:418–420

  127. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucl Acids Res 28:27–30. https://doi.org/10.1093/nar/28.1.27

  128. Kapushesky M, Emam I, Holloway E, Kurnosov P, Zorin A, Malone J, Rustici G, Williams E, Parkinson H, Brazma A (2010) Gene expression atlas at the European bioinformatics institute. Nucl Acids Res 38:D690–D698. https://doi.org/10.1093/nar/gkp936

  129. Karolchik D, Hinrichs AS, Kent WJ (2009) The UCSC genome browser. Curr Protoc Bioinformatics Chapter 1:Unit 1.4. https://doi.org/10.1002/0471250953.bi0104s28

  130. Karow J (2015) Qiagen launches GeneReader NGS System at AMP; presents performance evaluation by broad. GenomeWeb, molecular-diagnostics/qiagen-launches-genereader-ngs-system-amp-presents-performance-evaluation 10:12885–017.

  131. Kato K, Ishiwa A (2015) The role of carbohydrates in infection strategies of enteric pathogens. Trop Med Health 43:41–52. https://doi.org/10.2149/tmh.2014-25

  132. Kaur AP, Bhardwaj S, Dhanjal DS, Nepovimova E, Cruz-Martins N, Kuča K, Chopra C, Singh R, Kumar H, Șen F, Kumar V, Verma R, Kumar D (2021) Plant prebiotics and their role in the amelioration of diseases. Biomolecules 11:440. https://doi.org/10.3390/biom11030440

  133. Kechagia M, Basoulis D, Konstantopoulou S, Dimitriadi D, Gyftopoulou K, Skarmoutsou N, Fakiri EM (2013) Health benefits of probiotics: a review. ISRN Nutr 2013:481651. https://doi.org/10.5402/2013/481651

  134. Keegan KP, Glass EM, Meyer F (2016) MG-RAST, a metagenomics service for analysis of microbial community structure and function. Methods Mol Biol 1399:207–233. https://doi.org/10.1007/978-1-4939-3369-3_13

  135. Kellman BP, Lewis NE (2021) Big-data glycomics: tools to connect glycan biosynthesis to extracellular communication. Trends Biochem Sci 46:284–300. https://doi.org/10.1016/j.tibs.2020.10.004

  136. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S et al (2009) Human protein reference database—2009 update. Nucleic Acids Res 37:D767–D772. https://doi.org/10.1093/nar/gkn892

  137. Khan N, Yaqoob I, Hashem IAT, Inayat Z, Mahmoud Ali WK, Alam M, Shiraz M, Gani A (2014) Big data: survey, technologies, opportunities, and challenges. Sci World J 2014:e712826. https://doi.org/10.1155/2014/712826

  138. Khoroshevskyi O, LeRoy N, Reuter VP, Sheffield NC (2023) GEOfetch: a command-line tool for downloading data and standardized metadata from GEO and SRA. Bioinformatics 39:btad069. https://doi.org/10.1093/bioinformatics/btad069

  139. Kim M-S, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, Madugundu AK, Kelkar DS, Isserlin R, Jain S (2014) A draft map of the human proteome. Nature 509:575–581

  140. Kind T, Scholz M, Fiehn O (2009) How large is the metabolome? A critical analysis of data exchange practices in chemistry. PLoS ONE 4(5):e5440

  141. Kircher M, Kelso J (2010) High-throughput DNA sequencing—concepts and limitations. BioEssays 32:524–536

  142. Knudsen M, Wiuf C (2010) The CATH database. Hum Genom 4:207–212. https://doi.org/10.1186/1479-7364-4-3-207

  143. Koslicki D, Foucart S, Rosen G (2014) WGSQuikr: fast whole-genome shotgun metagenomic classification. PLoS ONE 9:e91784. https://doi.org/10.1371/journal.pone.0091784

  144. Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, Rohwer F, Edwards RA, Stoye J (2008) Phylogenetic classification of short environmental DNA fragments. Nucl Acids Res 36:2230–2239. https://doi.org/10.1093/nar/gkn038

  145. Kristensen AR, Gsponer J, Foster LJ (2012) A high-throughput approach for measuring temporal changes in the interactome. Nat Methods 9:907–909

  146. Kulak NA, Pichler G, Paron I, Nagaraj N, Mann M (2014) Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat Methods 11:319–324

  147. Kurc T, Qi X, Wang D, Wang F, Teodoro G, Cooper L, Nalisnik M, Yang L, Saltz J, Foran DJ (2015) Scalable analysis of Big pathology image data cohorts using efficient methods and high-performance computing strategies. BMC Bioinform 16:399. https://doi.org/10.1186/s12859-015-0831-6

  148. Voelkerding KV, Dames SA, Durtschi JD (2009) Next-generation sequencing: from basic research to diagnostics. Clin Chem. https://doi.org/10.1373/clinchem.2008.112789

  149. Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CS-O, Aparicio S, Baaijens J, Balvert M, de Barbanson B, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo T-H, Lelieveldt BPF, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Raczkowski L, Reinders M, de Ridder J, Saliba A-E, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A (2020) Eleven grand challenges in single-cell data science. Genome Biol 21:31. https://doi.org/10.1186/s13059-020-1926-6

  150. Langevin SM, Kelsey KT (2013) The fate is not always written in the genes: epigenomics in epidemiologic studies. Environ Mol Mutagen 54:533–541. https://doi.org/10.1002/em.21762

  151. Lappalainen I, Almeida-King J, Kumanduri V, Senf A, Spalding JD, ur-Rehman S et al (2015) The European genome-phenome archive of human data consented for biomedical research. Nat Genet 47:692–695. https://doi.org/10.1038/ng.3312

  152. Lappalainen I, Lopez J, Skipper L, Hefferon T, Spalding JD, Garner J et al (2013) dbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res 41:D936–D941. https://doi.org/10.1093/nar/gks1213

  153. Larance M, Ahmad Y, Kirkwood KJ, Ly T, Lamond AI (2013) Global subcellular characterization of protein degradation using quantitative proteomics. Mol Cell Proteomics 12:638–650

  154. Larmande P, Gay C, Lorieux M, Périn C, Bouniol M, Droc G, Sallaud C, Perez P, Barnola I, Biderre-Petit C, Martin J, Morel JB, Johnson AAT, Bourgis F, Ghesquière A, Ruiz M, Courtois B, Guiderdoni E (2008) Oryza Tag Line, a phenotypic mutant database for the Genoplante rice insertion line library. Nucl Acids Res 36:D1022–D1027. https://doi.org/10.1093/nar/gkm762

  155. Larsen JEP, Lund O, Nielsen M (2006) Improved method for predicting linear B-cell epitopes. Immunome Res 2:1–7

  156. Lawrence CJ, Dong Q, Polacco ML, Seigfried TE, Brendel V (2004) MaizeGDB, the community database for maize genetics and genomics. Nucl Acids Res 32:D393–D397

  157. Lestrade L, Weber MJ (2006) snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res 34:D158–D162. https://doi.org/10.1093/nar/gkj002

  158. Li R, Li Y, Kristiansen K, Wang J (2008) SOAP: short oligonucleotide alignment program. Bioinformatics 24:713–714. https://doi.org/10.1093/bioinformatics/btn025

  159. Li Y, Chen L (2014) Big biological data: challenges and opportunities. Genom Proteom Bioinform 12:187–189. https://doi.org/10.1016/j.gpb.2014.10.001

  160. Liang K, Sakakibara Y (2021) MetaVelvet-DL: a MetaVelvet deep learning extension for de novo metagenome assembly. BMC Bioinform 22:427. https://doi.org/10.1186/s12859-020-03737-6

  161. Liu B, Gibbons T, Ghodsi M, Treangen T, Pop M (2011) Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genom 12(Suppl 2):S4. https://doi.org/10.1186/1471-2164-12-S2-S4

  162. Liu Q, Guo Y, Li J, Long J, Zhang B, Shyr Y (2012) Steps to ensure accuracy in genotype and SNP calling from Illumina sequencing data. BMC Genom 13(Suppl 8):S8. https://doi.org/10.1186/1471-2164-13-S8-S8

  163. Liu X, Yu X, Zack DJ, Zhu H, Qian J (2008) TiGER: A database for tissue-specific gene expression and regulation. BMC Bioinform 9:271. https://doi.org/10.1186/1471-2105-9-271

  164. Lomize MA, Lomize AL, Pogozheva ID, Mosberg HI (2006) OPM: orientations of proteins in membranes database. Bioinformatics 22:623–625. https://doi.org/10.1093/bioinformatics/btk023

  165. Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309:1567–1569

  166. Luan H, Geczy P, Lai H, Gobert J, Yang SJH, Ogata H, Baltes J, Guerra R, Li P, Tsai C-C (2020) Challenges and future directions of big data and artificial intelligence in education. Front Psychol 11

  167. Luo C, Rodriguez-r LM, Konstantinidis KT (2014) MyTaxa: an advanced taxonomic classifier for genomic and metagenomic sequences. Nucl Acids Res 42:e73–e73

  168. Ly T, Endo A, Brenes A, Gierlinski M, Afzal V, Pawellek A, Lamond AI (2018) Proteome-wide analysis of protein abundance and turnover remodelling during oncogenic transformation of human breast epithelial cells. Wellcome Open Res 3:51. https://doi.org/10.12688/wellcomeopenres.14392.1

  169. MacCallum I, Przybylski D, Gnerre S, Burton J, Shlyakhter I, Gnirke A, Malek J, McKernan K, Ranade S, Shea TP, Williams L, Young S, Nusbaum C, Jaffe DB (2009) ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol 10:R103. https://doi.org/10.1186/gb-2009-10-10-r103

  170. MacDonald NJ, Parks DH, Beiko RG (2012) Rapid identification of high-confidence taxonomic assignments for metagenomic data. Nucl Acids Res 40:e111. https://doi.org/10.1093/nar/gks335

  171. Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, Lopez R (2019) The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucl Acids Res 47:W636–W641. https://doi.org/10.1093/nar/gkz268

  172. Markowitz VM, Chen I-MA, Chu K, Szeto E, Palaniappan K, Pillay M, Ratner A, Huang J, Pagani I, Tringe S, Huntemann M, Billis K, Varghese N, Tennessen K, Mavromatis K, Pati A, Ivanova NN, Kyrpides NC (2014) IMG/M 4 version of the integrated metagenome comparative analysis system. Nucl Acids Res 42:D568–D573. https://doi.org/10.1093/nar/gkt919

  173. Markowitz VM, Chen I-MA, Chu K, Szeto E, Palaniappan K, Jacob B et al (2012) IMG/M-HMP: a metagenome comparative analysis system for the human microbiome project. PLoS ONE 7:e40151. https://doi.org/10.1371/journal.pone.0040151

  174. Marx V (2013) Biology: the big challenges of big data. Nature 498:255–260

  175. Mashima J, Kodama Y, Fujisawa T, Katayama T, Okuda Y, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y, Takagi T (2017) DNA data bank of Japan. Nucl Acids Res 45:D25–D31. https://doi.org/10.1093/nar/gkw1001

  176. McClatchy DB, Liao LJ, Lee JH, Park SK, Yates JR (2012) Dynamics of subcellular proteomes during brain development. J Proteome Res 11:2467–2479

  177. McGeary SE, Lin KS, Shi CY, Pham TM, Bisaria N, Kelley GM et al (2019) The biochemical basis of microRNA targeting efficacy. Science 366:eaav1741. https://doi.org/10.1126/science.aav1741

  178. McHardy AC, Martín HG, Tsirigos A, Hugenholtz P, Rigoutsos I (2007) Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods 4:63–72. https://doi.org/10.1038/nmeth976

  179. Mcwilliam H, Valentin F, Goujon M, Li W, Narayanasamy M, Martin J, Miyar T, Lopez R (2009) Web services at the European bioinformatics institute-2009. Nucleic Acids Res 37:W6–W10. https://doi.org/10.1093/nar/gkp302

  180. Merchant CA, Healy K, Wanunu M, Ray V, Peterman N, Bartel J, Fischbein MD, Venta K, Luo Z, Johnson ATC, Drndić M (2010) DNA translocation through graphene nanopores. Nano Lett 10:2915–2921. https://doi.org/10.1021/nl101046t

  181. Merelli I, Pérez-Sánchez H, Gesing S, D’Agostino D (2014) Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives. Biomed Res Int 2014:e134023. https://doi.org/10.1155/2014/134023

  182. Mewes HW, Frishman D, Güldener U, Mannhaupt G, Mayer K, Mokrejs M et al (2002) MIPS: a database for genomes and protein sequences. Nucleic Acids Res 30:31–34

  183. Meyers BC, Souret FF, Lu C, Green PJ (2006) Sweating the small stuff: microRNA discovery in plants. Curr Opin Biotechnol 17:139–146

  184. Mi H, Lazareva-Ulitsky B, Loo R, Kejariwal A, Vandergriff J, Rabkin S et al (2005) The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res 33:D284–D288

  185. Mikheenko A, Saveliev V, Gurevich A (2016) MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 32:1088–1090

  186. Mir RR, Reynolds M, Pinto F, Khan MA, Bhat MA (2019) High-throughput phenotyping for crop improvement in the genomics era. Plant Sci 282:60–72. https://doi.org/10.1016/j.plantsci.2019.01.007

  187. Mohammed MH, Ghosh TS, Singh NK, Mande SS (2011) SPHINX–an algorithm for taxonomic binning of metagenomic sequences. Bioinformatics 27:22–30. https://doi.org/10.1093/bioinformatics/btq608

  188. Monzoorul Haque M, Ghosh TS, Komanduri D, Mande SS (2009) SOrt-ITEMS: sequence orthology based approach for improved taxonomic estimation of metagenomic sequences. Bioinformatics 25:1722–1730. https://doi.org/10.1093/bioinformatics/btp317

  189. Moraes G, de Almeida LC (2020) Chapter 11—nutrition and functional aspects of digestion in fish. In: Baldisserotto B, Urbinati EC, Cyrino JEP (eds) Biology and physiology of freshwater neotropical fish. Academic Press, New York, pp 251–271. https://doi.org/10.1016/B978-0-12-815872-2.00011-7

  190. Morozova O, Marra MA (2008) Applications of next-generation sequencing technologies in functional genomics. Genomics 92:255–264

  191. Naegle KM, White FM, Lauffenburger DA, Yaffe MB (2012) Robust co-regulation of tyrosine phosphorylation sites on proteins reveals novel protein interactions. Mol Biosyst 8:2771–2782

  192. Nieduszynski CA, Hiraga S, Ak P, Benham CJ, Donaldson AD (2007) OriDB: a DNA replication origin database. Nucleic Acids Res 35:D40–D46

  193. Nikolskiy I, Mahieu NG, Chen Y-J et al (2013) An untargeted metabolomic workflow to improve structural characterization of metabolites. Anal Chem 85:7713–7719

  194. O’Donoghue SI (2021) Grand challenges in bioinformatics data visualization. Front Bioinform 1

  195. Ohtsu K et al (2007) Global gene expression analysis of the shoot apical meristem of maize (Zea mays L.). Plant J 52:391–404

  196. O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R et al (2016) Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44:D733–D745. https://doi.org/10.1093/nar/gkv1189

  197. Örd T, Õunap K, Stolze LK, Aherrahrou R, Nurminen V, Toropainen A, Selvarajan I, Lönnberg T, Aavik E, Ylä-Herttuala S, Civelek M, Romanoski CE, Kaikkonen MU (2021) Single-cell epigenomics and functional fine-mapping of atherosclerosis GWAS Loci. Circ Res 129:240–258. https://doi.org/10.1161/CIRCRESAHA.121.318971

  198. Pal S, Mondal S, Das G, Khatua S, Ghosh Z (2020) Big data in biology: the hope and present-day challenges in it. Gene Rep 21:100869. https://doi.org/10.1016/j.genrep.2020.100869

  199. Papatheodorou I, Fonseca NA, Keays M, Tang YA, Barrera E, Bazant W, Burke M, Füllgrabe A, Fuentes AM-P, George N, Huerta L, Koskinen S, Mohammed S, Geniza M, Preece J, Jaiswal P, Jarnuczak AF, Huber W, Stegle O, Vizcaino JA, Brazma A, Petryszak R (2018) Expression Atlas: gene and protein expression across multiple studies and organisms. Nucl Acids Res 46:D246–D251. https://doi.org/10.1093/nar/gkx1158

  200. Park SK et al (2014) Census 2: isobaric labeling data analysis. Bioinformatics 30:2208–2209

  201. Parkinson H, Sarkans U, Kolesnikov N, Abeygunawardena N, Burdett T, Dylag M, Emam I, Farne A, Hastings E, Holloway E, Kurbatova N, Lukk M, Malone J, Mani R, Pilicheva E, Rustici G, Sharma A, Williams E, Adamusiak T, Brandizi M, Sklyar N, Brazma A (2011) ArrayExpress update—an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucl Acids Res 39:D1002–D1004. https://doi.org/10.1093/nar/gkq1040

  202. Pati A, Heath LS, Kyrpides NC, Ivanova N (2011) ClaMS: a classifier for metagenomic sequences. Stand Genomic Sci 5:248–253. https://doi.org/10.4056/sigs.2075298

  203. Patti GJ, Yanes O, Siuzdak G (2012) Metabolomics: the apogee of the omics trilogy. Nat Rev Mol Cell Biol 13:263–269

  204. Paysan-Lafosse T, Blum M, Chuguransky S, Grego T, Pinto BL, Salazar GA et al (2023) InterPro in 2022. Nucleic Acids Res 51:D418–D427. https://doi.org/10.1093/nar/gkac993

  205. Peterlongo P, Chikhi R (2012) Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. BMC Bioinform 13:48. https://doi.org/10.1186/1471-2105-13-48

  206. Pevzner PA, Tang H, Waterman MS (2001) An Eulerian path approach to DNA fragment assembly. Proc Natl Acad Sci USA 98:9748–9753. https://doi.org/10.1073/pnas.171285098

  207. Pieper U, Webb BM, Barkan DT, Schneidman-Duhovny D, Schlessinger A, Braberg H et al (2011) ModBase, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res 39:D465–D474. https://doi.org/10.1093/nar/gkq1091

  208. Freda PJ, Moore JH, Kranzler HR (2021) The phenomics and genetics of addictive and affective comorbidity in opioid use disorder. Drug Alcohol Depend 221:108602. https://doi.org/10.1016/j.drugalcdep.2021.108602

  209. Pollet N, Schmidt HA, Gawantka V, Vingron M, Niehrs C (2000) Axeldb: a Xenopus laevis database focusing on gene expression. Nucl Acids Res 28:139–140. https://doi.org/10.1093/nar/28.1.139

  210. Powell S, Forslund K, Szklarczyk D, Trachana K, Roth A, Huerta-Cepas J, Gabaldón T, Rattei T, Creevey C, Kuhn M, Jensen LJ, von Mering C, Bork P (2014) eggNOG v4.0: nested orthology inference across 3686 organisms. Nucl Acids Res 42:D231–D239. https://doi.org/10.1093/nar/gkt1253

  211. Prestat E, David MM, Hultman J, Taş N, Lamendella R, Dvornik J, Mackelprang R, Myrold DD, Jumpponen A, Tringe SG, Holman E, Mavromatis K, Jansson JK (2014) FOAM (functional ontology assignments for metagenomes): a hidden Markov model (HMM) database with environmental focus. Nucl Acids Res 42:e145. https://doi.org/10.1093/nar/gku702

  212. Raghavendra P, Pullaiah T (2018) Chapter 7—pathogen identification using novel sequencing methods. In: Raghavendra P, Pullaiah T (eds) Advances in cell and molecular diagnostics. Academic Press, New York, pp 161–202. https://doi.org/10.1016/B978-0-12-813679-9.00007-5

  213. Raghupathi W, Raghupathi V (2014) Big data analytics in healthcare: promise and potential. Health Inf Sci Syst 2:3. https://doi.org/10.1186/2047-2501-2-3

  214. Rangwala SH, Kuznetsov A, Ananiev V, Asztalos A, Borodin E, Evgeniev V et al (2021) Accessing NCBI data using the NCBI sequence viewer and genome data viewer (GDV). Genome Res 31:159–169. https://doi.org/10.1101/gr.266932.120

  215. Renuse S, Chaerkady R, Pandey A (2011) Proteogenomics. Proteomics 11:620–630

  216. Reuter JA, Spacek D, Snyder MP (2015) High-throughput sequencing technologies. Mol Cell 58:586–597. https://doi.org/10.1016/j.molcel.2015.05.004

  217. Rhee J-S, Yu IT, Kim B-M, Jeong C-B, Lee K-W, Kim M-J, Lee S-J, Park GS, Lee J-S (2013) Copper induces apoptotic cell death through reactive oxygen species-triggered oxidative stress in the intertidal copepod Tigriopus japonicus. Aquat Toxicol 132–133:182–189. https://doi.org/10.1016/j.aquatox.2013.02.013

  218. Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucl Acids Res 38:e191. https://doi.org/10.1093/nar/gkq747

  219. Rigden DJ, Fernández XM (2022) The 2022 nucleic acids research database issue and the online molecular biology database collection. Nucl Acids Res 50:D1–D10. https://doi.org/10.1093/nar/gkab1195

  220. Rigden DJ, Fernández XM (2021) The 2021 nucleic acids research database issue and the online molecular biology database collection. Nucl Acids Res 49:D1–D9. https://doi.org/10.1093/nar/gkaa1216

  221. Ristevski B, Chen M (2018) Big data analytics in medicine and healthcare. J Integr Bioinform 15:20170030. https://doi.org/10.1515/jib-2017-0030

  222. RNAcentral (2017) RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Res 45:D128–D134. https://doi.org/10.1093/nar/gkw1008

  223. Robinson C (1994) The European Bioinformatics Institute (EBI)—open for business. Trends Biotechnol 12:391–392. https://doi.org/10.1016/0167-7799(94)90024-8

  224. Robison K (2022) 2022: a wild year for short reads in genome sequencing? GEN Biotechnol 1:40–42

  225. Rodríguez-Ezpeleta N, Hackenberg M, Aransay AM (eds) (2012) Bioinformatics for high throughput sequencing. Springer, New York. https://doi.org/10.1007/978-1-4614-0782-9

  226. Rosen GL, Reichenberger ER, Rosenfeld AM (2011) NBC: the Naive Bayes classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics 27:127–129. https://doi.org/10.1093/bioinformatics/btq619

  227. Roux KJ, Kim DI, Raida M, Burke B (2012) A promiscuous biotin ligase fusion protein identifies proximal and interacting proteins in mammalian cells. J Cell Biol 196:801–810

  228. Ruan J, Li H, Chen Z, Coghlan A, Coin LJM, Guo Y et al (2008) TreeFam: 2008 update. Nucleic Acids Res 36:D735–D740. https://doi.org/10.1093/nar/gkm1005

  229. Safran M, Dalah I, Alexander J, Rosen N, Iny Stein T, Shmoish M et al (2010) GeneCards version 3: the human gene integrator. Database (Oxford) 2010:baq020. https://doi.org/10.1093/database/baq020

  230. Sai Lakshmi S, Agrawal S (2008) piRNABank: a web resource on classified and clustered Piwi-interacting RNAs. Nucleic Acids Res 36:D173–D177. https://doi.org/10.1093/nar/gkm696

  231. Saito T, Ariizumi T, Okabe Y, Asamizu E, Hiwasa-Tanase K, Fukuda N, Mizoguchi T, Yamazaki Y, Aoki K, Ezura H (2011) TOMATOMA: a novel tomato mutant database distributing Micro-Tom mutant collections. Plant Cell Physiol 52:283–296

  232. Salek RM, Steinbeck C, Viant MR et al (2013) The role of reporting standards for metabolite annotation and identification in metabolomic studies. Gigascience 2:1

  233. Sallet E, Gouzy J, Schiex T (2019) EuGene: an automated integrative gene finder for eukaryotes and prokaryotes. Methods Mol Biol 1962:97–120. https://doi.org/10.1007/978-1-4939-9173-0_6

  234. Samaras P, Schmidt T, Frejno M, Gessulat S, Reinecke M, Jarzab A, Zecha J, Mergner J, Giansanti P, Ehrlich H-C, Aiche S, Rank J, Kienegger H, Krcmar H, Kuster B, Wilhelm M (2020) ProteomicsDB: a multi-omics and multi-organism resource for life science research. Nucl Acids Res 48:D1153–D1163. https://doi.org/10.1093/nar/gkz974

  235. Sato K, Sakakibara Y (2015) MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning. DNA Res 22:69–77

  236. Sayers EW, Cavanaugh M, Clark K, Pruitt KD, Sherry ST, Yankie L, Karsch-Mizrachi I (2023) GenBank 2023 update. Nucl Acids Res 51:D141–D144. https://doi.org/10.1093/nar/gkac1012

  237. Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH (2009) PID: the pathway interaction database. Nucl Acids Res 37:D674–D679

  238. Schatz MC (2015) Biological data sciences in genome research. Genome Res 25:1417–1422. https://doi.org/10.1101/gr.191684.115

  239. Schicho R, Shaykhutdinov R, Ngo J et al (2012) Quantitative metabolomic profiling of serum, plasma, and urine by (1)H NMR spectroscopy discriminates between patients with inflammatory bowel disease and healthy individuals. J Proteome Res 11:3344–3357

  240. Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M (2007) CAMERA: a community resource for metagenomics. PLoS Biol 5:e75. https://doi.org/10.1371/journal.pbio.0050075

  241. Sethupathy P, Corda B, Hatzigeorgiou AG (2006) TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA 12:192–197. https://doi.org/10.1261/rna.2239606

  242. Sharon D, Tilgner H, Grubert F, Snyder M (2013) A single-molecule long-read survey of the human transcriptome. Nat Biotechnol 31:1009–1014

  243. Sharon N, Ofek I (2000) Safe as mother’s milk: carbohydrates as future anti-adhesion drugs for bacterial diseases. Glycoconj J 17:659–664. https://doi.org/10.1023/a:1011091029973

  244. Shen L, Gong J, Caldo RA, Nettleton D, Cook D, Wise RP, Dickerson JA (2005) BarleyBase–an expression profiling database for plant genomics. Nucl Acids Res 33:D614–D618. https://doi.org/10.1093/nar/gki123

  245. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol İ (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117–1123. https://doi.org/10.1101/gr.089532.108

  246. Slatko BE, Gardner AF, Ausubel FM (2018) Overview of next generation sequencing technologies. Curr Protoc Mol Biol 122:e59. https://doi.org/10.1002/cpmb.59

  247. Slavin J (2013) Fiber and prebiotics: mechanisms and health benefits. Nutrients 5:1417–1435. https://doi.org/10.3390/nu5041417

  248. Slenter DN, Kutmon M, Hanspers K, Riutta A, Windsor J, Nunes N, Mélius J, Cirillo E, Coort SL, Digles D, Ehrhart F, Giesbertz P, Kalafati M, Martens M, Miller R, Nishida K, Rieswijk L, Waagmeester A, Eijssen LMT, Evelo CT, Pico AR, Willighagen EL (2018) WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucl Acids Res 46:D661–D667. https://doi.org/10.1093/nar/gkx1064

  249. Smigielski EM, Sirotkin K, Ward M, Sherry ST (2000) dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res 28:352–355

  250. Sreenivasan VKA, Henck J, Spielmann M (2022) Single-cell sequencing: promises and challenges for human genetics. Med Gen 34:261–273. https://doi.org/10.1515/medgen-2022-2156

  251. Stehr H, Duarte JM, Lappe M, Bhak J, Bolser DM (2010) PDBWiki: added value through community annotation of the protein data bank. Database (Oxford) 2010:baq009. https://doi.org/10.1093/database/baq009

  252. Su X, Xu J, Ning K (2012) Parallel-META: efficient metagenomic data analysis based on high-performance computation. BMC Syst Biol 6(Suppl 1):S16. https://doi.org/10.1186/1752-0509-6-S1-S16

  253. Subramanian I, Verma S, Kumar S, Jere A, Anamika K (2020) Multi-omics data integration, interpretation, and its application. Bioinform Biol Insights 14:1177932219899051. https://doi.org/10.1177/1177932219899051

  254. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Hsi-Yang Fritz M (2015) An integrated map of structural variation in 2,504 human genomes. Nature 526:75–81

  255. Suhre K, Claverie J-M (2004) FusionDB: a database for in-depth analysis of prokaryotic gene fusion events. Nucleic Acids Res 32:D273–D276. https://doi.org/10.1093/nar/gkh053

  256. Sunagawa S, Mende DR, Zeller G, Izquierdo-Carrasco F, Berger SA, Kultima JR, Coelho LP, Arumugam M, Tap J, Nielsen HB, Rasmussen S, Brunak S, Pedersen O, Guarner F, de Vos WM, Wang J, Li J, Doré J, Ehrlich SD, Stamatakis A, Bork P (2013) Metagenomic species profiling using universal phylogenetic marker genes. Nat Methods 10:1196–1199. https://doi.org/10.1038/nmeth.2693

  257. Sunkin SM, Ng L, Lau C, Dolbeare T, Gilbert TL, Thompson CL, Hawrylycz M, Dang C (2013) Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system. Nucleic Acids Res 41:D996–D1008. https://doi.org/10.1093/nar/gks1042

  258. Tanizawa Y, Fujisawa T, Kodama Y, Kosuge T, Mashima J, Tanjo T, Nakamura Y (2023) DNA Data Bank of Japan (DDBJ) update report 2022. Nucleic Acids Res 51:D101–D105. https://doi.org/10.1093/nar/gkac1083

  259. Teeling H, Waldmann J, Lombardot T, Bauer M, Glöckner FO (2004) TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinform 5:163. https://doi.org/10.1186/1471-2105-5-163

  260. Thompson JF, Steinmann KE (2010) Single molecule sequencing with a HeliScope genetic analysis system. Curr Protoc Mol Biol. https://doi.org/10.1002/0471142727.mb0710s92

  261. Tinnikov AA, Samuels HH (2013) A novel cell lysis approach reveals that caspase 2 rapidly translocates from the nucleus to the cytoplasm in response to apoptotic stimuli. PLoS ONE 8:e61085

  262. Tobi EW, van Zwet EW, Lumey LH, Heijmans BT (2018) Why mediation analysis trumps Mendelian randomization in population epigenomics studies of the Dutch Famine. https://doi.org/10.1101/362392

  263. Tolani P, Gupta S, Yadav K, Aggarwal S, Yadav AK (2021) Chapter four: Big data, integrative omics and network biology. In: Donev R, Karabencheva-Christova T (eds) Advances in protein chemistry and structural biology, proteomics and systems biology. Academic Press, New York, pp 127–160. https://doi.org/10.1016/bs.apcsb.2021.03.006

  264. Torres TT, Metta M, Ottenwalder B, Schlotterer C (2008) Gene expression profiling by massively parallel sequencing. Genome Res 18:172–177

  265. Toth AL et al (2007) Wasp gene expression supports an evolutionary link between maternal behavior and eusociality. Science 318:441–444

  266. Treangen TJ, Koren S, Sommer DD, Liu B, Astrovskaya I, Ondov B, Darling AE, Phillippy AM, Pop M (2013) MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome Biol 14:R2

  267. Tryka KA, Hao L, Sturcke A, Jin Y, Wang ZY, Ziyabari L et al (2014) NCBI’s database of genotypes and phenotypes: dbGaP. Nucleic Acids Res 42:D975–D979. https://doi.org/10.1093/nar/gkt1211

  268. Tucker T, Marra M, Friedman JM (2009) Massively parallel sequencing: the next big thing in genetic medicine. Am J Hum Genet 85:142–154. https://doi.org/10.1016/j.ajhg.2009.06.022

  269. Uchiyama I (2007) MBGD: a platform for microbial comparative genomics based on the automated construction of orthologous groups. Nucleic Acids Res 35:D343–D346

  270. Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K, Forsberg M, Zwahlen M, Kampf C, Wester K, Hober S (2010) Towards a knowledge-based human protein atlas. Nat Biotechnol 28:1248–1250

  271. Ullah S, Rahman W, Ullah F, Ahmad G, Ijaz M, Gao T (2021) DBHR: a collection of databases relevant to human research. Future Sci OA 8:FSO780. https://doi.org/10.2144/fsoa-2021-0101

  272. Via M, Gignoux C, Burchard EG (2010) The 1000 Genomes Project: new opportunities for research and social challenges. Genome Med 2:3. https://doi.org/10.1186/gm124

  273. Viant MR, Sommer U (2012) Mass spectrometry based environmental metabolomics: a primer and review. Metabolomics 9:144–158

  274. Visel A, Thaller C, Eichele G (2004) GenePaint.org: an atlas of gene expression patterns in the mouse embryo. Nucleic Acids Res 32:D552–D556. https://doi.org/10.1093/nar/gkh029

  275. Vizcaíno JA et al (2014) ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotechnol 32:223–226

  276. Volders P-J, Anckaert J, Verheggen K, Nuytens J, Martens L, Mestdagh P et al (2019) LNCipedia 5: towards a reference set of human long non-coding RNAs. Nucleic Acids Res 47:D135–D139. https://doi.org/10.1093/nar/gky1031

  277. von Itzstein M, Moran AP (2010) Chapter 50—future potential of glycomics in microbiology and infectious diseases. In: Holst O, Brennan PJ, von Itzstein M, Moran AP (eds) Microbial glycobiology. Academic Press, San Diego, pp 981–986. https://doi.org/10.1016/B978-0-12-374546-0.00050-X

  278. Vulimiri SV, Sonawane BR, Szabo DT (2014) Systems biology application in toxicology. In: Wexler P (ed) Encyclopedia of toxicology, 3rd edn. Academic Press, Oxford, pp 454–458. https://doi.org/10.1016/B978-0-12-386454-3.01047-2

  279. Wang FJ et al (2010) Fractionation of phosphopeptides on strong anion-exchange capillary trap column for large-scale phosphoproteome analysis of microgram samples. J Sep Sci 33:1879–1887

  280. Wang W, Song X, Wang L, Song L (2018) Pathogen-derived carbohydrate recognition in molluscs immune defense. Int J Mol Sci 19:721. https://doi.org/10.3390/ijms19030721

  281. Wang X, Wang Y, Yue B, Zhang X, Liu S (2013) The complete mitochondrial genome of the Bufo tibetanus (Anura: Bufonidae). Mitochondrial DNA 24:186–188. https://doi.org/10.3109/19401736.2012.744978

  282. Wang Y, Kung L, Wang WYC, Cegielski CG (2018) An integrated big data analytics-enabled transformation model: application to health care. Inf Manag 55:64–79. https://doi.org/10.1016/j.im.2017.04.001

  283. Wang Y, Leung H, Yiu S, Chin F (2014) MetaCluster-TA: taxonomic annotation for metagenomic data based on assembly-assisted binning. BMC Genom 15(Suppl 1):S12. https://doi.org/10.1186/1471-2164-15-S1-S12

  284. Ware D, Jaiswal P, Ni J, Pan X, Chang K, Clark K, Teytelman L, Schmidt S, Zhao W, Cartinhour S (2002) Gramene: a resource for comparative grass genomics. Nucleic Acids Res 30:103–105

  285. Waters M, Stasiewicz S, Alex Merrick B, Tomer K, Bushel P, Paules R et al (2007) CEBS—chemical effects in biological systems: a public data repository integrating study design and toxicity data with microarray and proteomics data. Nucleic Acids Res 36:D892–D900

  286. Weber AP, Weber KL, Carr K, Wilkerson C, Ohlrogge JB (2007) Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol 144:32–42

  287. Wei G, Hu R, Li Q, Lu W, Liang H, Nan H, Lu J, Li J, Zhao Q (2022) Oligonucleotide discrimination enabled by tannic acid-coordinated film-coated solid-state nanopores. Langmuir 38:6443–6453. https://doi.org/10.1021/acs.langmuir.2c00638

  288. Wei W, Yeung ES (2000) Improvements in DNA sequencing by capillary electrophoresis at elevated temperature using poly(ethylene oxide) as a sieving matrix. J Chromatogr B Biomed Sci Appl 745:221–230. https://doi.org/10.1016/S0378-4347(00)00069-4

  289. Wilhelm M et al (2014) Mass-spectrometry-based draft of the human proteome. Nature 509:582–587

  290. Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, Liu Y, Djoumbou Y, Mandal R, Aziat F, Dong E (2012) HMDB 3.0—the human metabolome database in 2013. Nucleic Acids Res 41:D801–D807

  291. Wood DE, Salzberg SL (2014) Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15:R46. https://doi.org/10.1186/gb-2014-15-3-r46

  292. Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D (2000) DIP: the database of interacting proteins. Nucleic Acids Res 28:289–291

  293. Xu Q, Dunbrack RL (2011) The protein common interface database (ProtCID)—a comprehensive database of interactions of homologous proteins in multiple crystal forms. Nucleic Acids Res 39:D761–D770. https://doi.org/10.1093/nar/gkq1059

  294. Yang Y, Wang D, Miao Y-R, Wu X, Luo H, Cao W et al (2023) lncRNASNP v3: an updated database for functional variants in long non-coding RNAs. Nucleic Acids Res 51:D192–D198. https://doi.org/10.1093/nar/gkac981

  295. Yao T, Chen M-H, Lindemann SR (2020) Structurally complex carbohydrates maintain diversity in gut-derived microbial consortia under high dilution pressure. FEMS Microbiol Ecol 96:fiaa158. https://doi.org/10.1093/femsec/fiaa158

  296. Ye Y, Tang H (2009) An ORFome assembly approach to metagenomics sequences analysis. J Bioinform Comput Biol 7:455–471. https://doi.org/10.1142/s0219720009004151

  297. Yuan Z, Wang C, Yi X, Ni Z, Chen Y, Li T (2018) Solid-state nanopore. Nanoscale Res Lett 13:56. https://doi.org/10.1186/s11671-018-2463-z

  298. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829. https://doi.org/10.1101/gr.074492.107

  299. Zhang A, Sun H, Wang X (2012) Saliva metabolomics opens door to biomarker discovery, disease diagnosis, and treatment. Appl Biochem Biotechnol 168:1718–1727

  300. Zhang J, Li C, Wu C, Xiong L, Chen G, Zhang Q, Wang S (2006) RMD: a rice mutant database for functional analysis of the rice genome. Nucleic Acids Res 34:D745–D748

  301. Zhang Y, Lin J, Zhao L, Zeng X, Liu X (2021) A novel antibacterial peptide recognition algorithm based on BERT. Brief Bioinform 22:bbab200. https://doi.org/10.1093/bib/bbab200

  302. Zhao J, Klyne G, Benson E, Gudmannsdottir E, White-Cooper H, Shotton D (2010) FlyTED: the Drosophila testis gene expression database. Nucleic Acids Res 38:D710–D715. https://doi.org/10.1093/nar/gkp1006

  303. Zhao L, Deng L, Li G, Jin H, Cai J, Shang H, Li Y, Wu H, Xu W, Zeng L, Zhang R, Zhao H, Wu P, Zhou Z, Zheng J, Ezanno P, Yang AX, Yan Q, Deem MW, He J (2017) Single molecule sequencing of the M13 virus genome without amplification. PLoS ONE 12:e0188181. https://doi.org/10.1371/journal.pone.0188181

  304. Zheng H, Wu H (2010) Short prokaryotic DNA fragment binning using a hierarchical classifier based on linear discriminant analysis and principal component analysis. J Bioinform Comput Biol 8:995–1011. https://doi.org/10.1142/s0219720010005051

  305. Zhou B, Xiao JF, Tuli L, Ressom HW (2012) LC-MS-based metabolomics. Mol BioSyst 8:470–481

  306. Zou D, Ma L, Yu J, Zhang Z (2015) Biological databases for human research. Genom Proteom Bioinform 13:55–63. https://doi.org/10.1016/j.gpb.2015.01.006

Acknowledgements

The authors are grateful to Professor Anil Kumar, Director of Education, Rani Lakshmi Bai Central Agricultural University, Jhansi, for guidance and valuable suggestions.

Author information

Contributions

DBS and RKP developed the idea for this review article and its coverage. JKC, SP, and RJ drafted the article. RKP provided valuable input for the improvement of the article. DBS critically revised the work and updated the manuscript for publication. All authors have read the final manuscript and approved the submission.

Corresponding author

Correspondence to Dev Bukhsh Singh.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chaudhari, J.K., Pant, S., Jha, R. et al. Biological big-data sources, problems of storage, computational issues, and applications: a comprehensive review. Knowl Inf Syst 66, 3159–3209 (2024). https://doi.org/10.1007/s10115-023-02049-4

  • DOI: https://doi.org/10.1007/s10115-023-02049-4
