Biological big-data sources, problems of storage, computational issues, and applications: a comprehensive review

Chaudhari, Jyoti Kant; Pant, Shubham; Jha, Richa; Pathak, Rajesh Kumar; Singh, Dev Bukhsh

doi:10.1007/s10115-023-02049-4


Review
Published: 27 January 2024

Knowledge and Information Systems, Volume 66, pages 3159–3209 (2024)



Abstract

Biological big data are the massive volumes of data generated by multi-omics experiments, including genomics, transcriptomics, proteomics, metabolomics, phenomics, glycomics, epigenomics, and other omics. These data are used to study biological processes and to gain insight into how living systems work. They can also be used to develop new treatments for diseases and to understand the causes of certain conditions. Storing and analyzing these data is challenging because of their sheer size and complexity: efficient storage requires large amounts of storage space and processing power, and the complexity of multi-omics data limits the kinds of insight that can be gained from them. Despite these challenges, biological big data offer great potential for advancing our understanding of biology and developing new treatments for diseases. Big-data research is a rapidly growing field with numerous applications. As the amount of data continues to increase, it is important to understand its storage, utility, limitations, and challenges. This review discusses the various sources of biological big data and their storage capacities, limitations, and challenges, and reports the factors that affect data quality and accuracy. It is intended to help researchers understand the big data available in biology so that they can use and integrate these data in novel discoveries.


References

Abouelmehdi K, Beni-Hessane A, Khaloufi H (2018) Big healthcare data: preserving security and privacy. J Big Data 5:1. https://doi.org/10.1186/s40537-017-0110-7
Article Google Scholar
Abriata LA (2017) Structural database resources for biological macromolecules. Brief Bioinform 18:659–669. https://doi.org/10.1093/bib/bbw049
Article Google Scholar
Agapito G, Pastrello C, Guzzi PH, Jurisica I, Cannataro M (2020) BioPAX-Parser: parsing and enrichment analysis of BioPAX pathways. Bioinformatics 36:4377–4378. https://doi.org/10.1093/bioinformatics/btaa529
Article Google Scholar
Alpert AJ (1990) Hydrophilic-interaction chromatography for the separation of peptides, nucleic-acids and other polar compounds. J Chromatogr 499:177–196
Article Google Scholar
Amaral PP, Clark MB, Gascoigne DK, Dinger ME, Mattick JS (2011) lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res 39:D146–D151. https://doi.org/10.1093/nar/gkq1138
Article Google Scholar
Angly F, Rodriguez-Brito B, Bangor D, McNairnie P, Breitbart M, Salamon P, Felts B, Nulton J, Mahaffy J, Rohwer F (2005) PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information. BMC Bioinform 6:41. https://doi.org/10.1186/1471-2105-6-41
Article Google Scholar
Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S et al (2004) UniProt: the universal protein knowledgebase. Nucleic Acids Res 32:D115-119. https://doi.org/10.1093/nar/gkh131
Article Google Scholar
Arend D, Junker A, Scholz U, Schüler D, Wylie J, Lange M (2016) PGP repository: a plant phenomics and genomics data publication infrastructure. Database 2016:baw033
Article Google Scholar
Arumugam M, Harrington ED, Foerstner KU, Raes J, Bork P (2010) SmashCommunity: a metagenomic annotation and analysis tool. Bioinformatics 26:2977–2978. https://doi.org/10.1093/bioinformatics/btq536
Article Google Scholar
Atas E, Singer A, Meller A (2012) DNA sequencing and bar-coding using solid-state nanopores. Electrophoresis 33:3437–3447. https://doi.org/10.1002/elps.201200266
Article Google Scholar
Avner BS, Fialho AM, Chakrabarty AM (2012) Overcoming drug resistance in multi-drug resistant cancers and microorganisms: a conceptual framework. Bioengineered 3:262. https://doi.org/10.4161/bioe.21130
Article Google Scholar
Axtell MJ, Jan C, Rajagopalan R, Bartel DP (2006) A two-hit trigger for siRNA biogenesis in plants. Cell 127:565–577
Article Google Scholar
Bai JPF, Alekseyenko AV, Statnikov A, Wang I-M, Wong PH (2013) Strategic applications of gene expression: from drug discovery/development to bedside. AAPS J 15:427–437. https://doi.org/10.1208/s12248-012-9447-1
Article Google Scholar
Bai W, Yang W, Wang W, Wang Y, Liu C, Jiang Q, Hua J, Liao M (2017) GED: a manually curated comprehensive resource for epigenetic modification of gametogenesis. Brief Bioinform 18:98–104. https://doi.org/10.1093/bib/bbw007
Article Google Scholar
Bainbridge MN et al (2006) Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genom 7:246
Article Google Scholar
Baldock RA (2007) The Edinburgh mouse atlas project: data mapping and spatial organisation. FASEB J 21:A201–A201. https://doi.org/10.1096/fasebj.21.5.A201-b
Article Google Scholar
Baqader NO, Radulovic M, Crawford M, Stoeber K, Godovac-Zimmermann J (2014) Nuclear cytoplasmic trafficking of proteins is a major response of human fibroblasts to oxidative stress. J Proteome Res 13:4398–4423
Article Google Scholar
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M et al (2013) NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res 41:D991–D995. https://doi.org/10.1093/nar/gks1193
Article Google Scholar
Batth TS, Francavilla C, Olsen JV (2014) Off-line high pH reversed-phase fractionation for in depth phosphoproteomics. J Proteome Res 13:6176–6186
Article Google Scholar
Bennett S (2004) Solexa Ltd. Pharmacogenomics 5:433–438. https://doi.org/10.1517/14622416.5.4.433
Article Google Scholar
Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J et al (2013) GenBank. Nucleic Acids Res 41:D36-42. https://doi.org/10.1093/nar/gks1195
Article Google Scholar
Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IMJ, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DMD, Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara E, Catenazzi M, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW, Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Scott Furey W, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ, James T, Huw Jones TA, Kang G-D, Kerelska TH, Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM, Martin JW, McCauley PG, McNitt P, Mehta P, Moon KW, Mullens JW, Newington T, Ning Z, Ling Ng B, Novo SM, O’Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Chris Pinkard D, Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Chiva Rodriguez A, Roe PM, Rogers J, Rogert Bacigalupo MC, Romanov N, Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA, Sohna ES, Spence J, Stevens EJ, Sutton K, Szajkowski N, Tregidgo L, Turcatti 
CL, vandeVondele G, Verhovsky S, Virk Y, Wakelin SM, Walcott S, Wang GC, Worsley J, Yan GJ, Yau J, Zuerlein L, Rogers M, Jane Mullikin JC, Hurles ME, McCooke NJ, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R, Smith AJ (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456:53–59. https://doi.org/10.1038/nature07517
Article Google Scholar
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H et al (2000) The protein data bank. Nucleic Acids Res 28:235–242. https://doi.org/10.1093/nar/28.1.235
Article Google Scholar
Bhattacharya A, Ziebarth JD, Cui Y (2014) PolymiRTS database 3.0: linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways. Nucleic Acids Res 42:D86-91. https://doi.org/10.1093/nar/gkt1028
Article Google Scholar
Bird SS, Marur VR, Sniatynski MJ et al (2011) Serum lipidomics profiling using LC-MS and high-energy collisional dissociation fragmentation: focus on triglyceride detection and characterization. Anal Chem 83:6648–6657
Article Google Scholar
Birney E, Andrews TD, Bevan P, Caccamo M, Chen Y, Clarke L et al (2004) An overview of ensembl. Genome Res 14:925–928. https://doi.org/10.1101/gr.1860604
Article Google Scholar
Blake VC, Birkett C, Matthews DE, Hane DL, Bradbury P, Jannink J-L (2016) The triticeae toolbox: combining phenotype and genotype data to advance small-grains breeding. Plant Genome. https://doi.org/10.3835/plantgenome2014.12.0099
Article Google Scholar
Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, Hugenholtz P (2007) CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinform 8:209. https://doi.org/10.1186/1471-2105-8-209
Article Google Scholar
Boersema PJ, Raijmakers R, Lemeer S, Mohammed S, Heck AJR (2009) Multiplex peptide stable isotope dimethyl labeling for quantitative proteomics. Nat Protoc 4:484–494
Article Google Scholar
Bono H (2020) All of gene expression (AOE): An integrated index for public gene expression databases. PLoS ONE 15:e0227076. https://doi.org/10.1371/journal.pone.0227076
Article Google Scholar
Bowers J, Mitchell J, Beer E, Buzby PR, Causey M, Efcavitch JW, Jarosz M, Krzymanska-Olejnik E, Kung L, Lipson D, Lowman GM, Marappan S, McInerney P, Platt A, Roy A, Siddiqi SM, Steinmann K, Thompson JF (2009) Virtual Terminator nucleotides for next generation DNA sequencing. Nat Methods 6:593–595. https://doi.org/10.1038/nmeth.1354
Article Google Scholar
Brady A, Salzberg SL (2009) Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods 6:673–676. https://doi.org/10.1038/nmeth.1358
Article Google Scholar
Breker M, Schuldiner M (2014) The emergence of proteome-wide technologies: systematic analysis of proteins comes of age. Nat Rev Mol Cell Biol 15:453–464
Article Google Scholar
Buermans HPJ, den Dunnen JT (2014) Next generation sequencing technology: advances and applications. Biochimica et Biophysica Acta (BBA) Mol Basis Disease From Genome Funct 1842:1932–1941. https://doi.org/10.1016/j.bbadis.2014.06.015
Article Google Scholar
Burger A, Baldock R, Yang Y, Waterhouse A, Houghton D, Burton N, Davidson D (2002) The Edinburgh mouse atlas and gene-expression database: a spatio-temporal database for biological research. In: proceedings 14th international conference on scientific and statistical database management. Presented at the proceedings 14th international conference on scientific and statistical database management, pp 239. https://doi.org/10.1109/SSDM.2002.1029726
Burley SK, Bhikadiya C, Bi C, Bittrich S, Chao H, Chen L et al (2022) RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res 51:D488-508. https://doi.org/10.1093/nar/gkac1077
Article Google Scholar
Cases I, Pisano DG, Andres E, Carro A, Fernandez JM, Gomez-Lopez G et al (2007) CARGO: a web portal to integrate customized biological information. Nucleic Acids Res 35:W16-20
Article Google Scholar
Castro-Mondragon JA, Riudavets-Puig R, Rauluseviciute I, Berhanu Lemma R, Turchi L, Blanc-Mathieu R et al (2022) JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res 50:D165–D173. https://doi.org/10.1093/nar/gkab1113
Article Google Scholar
Chaisson MJ et al (2009) Resolving the complexity of the human genome using single-molecule sequencing. Nature 517:265–270
Google Scholar
Champagne A, Boutry M (2013) Proteomics of nonmodel plant species. Proteomics 13:663–673
Article Google Scholar
Chan C-KK, Hsu AL, Halgamuge SK, Tang S-L (2008) Binning sequences using very sparse labels within a metagenome. BMC Bioinform 9:215. https://doi.org/10.1186/1471-2105-9-215
Article Google Scholar
Chapin N, Sen R (2023) Chapter 12—COVID-19 phenomics. In: Barh D, Azevedo V (eds) Omics approaches and technologies in COVID-19. Academic Press, New York, pp 191–218. https://doi.org/10.1016/B978-0-323-91794-0.00014-7
Chapter Google Scholar
Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L et al (2007) MINT: the molecular INTeraction database. Nucleic Acids Res 35:D572. https://doi.org/10.1093/nar/gkl950
Article Google Scholar
Chen G, Ning B, Shi T (2019) Single-cell RNA-Seq technologies and related computational data analysis. Front Genet 10
Cheng L, Wang P, Tian R, Wang S, Guo Q, Luo M et al (2019) LncRNA2Target v2.0: a comprehensive database for target genes of lncRNAs in human and mouse. Nucleic Acids Res 47:D140–D144. https://doi.org/10.1093/nar/gky1051
Article Google Scholar
Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, Hester ET et al (1998) SGD: saccharomyces genome database. Nucleic Acids Res 26:73–79
Article Google Scholar
Cheung F, Haas BJ, Goldberg SM, May GD, Xiao Y, Town CD (2006) Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genom 7:272
Article Google Scholar
Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Müller WEG, Wetter T, Suhai S (2004) Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res 14:1147–1159. https://doi.org/10.1101/gr.1917404
Article Google Scholar
Choi M, Carver J, Chiva C, Tzouros M, Huang T, Tsai T-H, Pullman B, Bernhardt OM, Hüttenhain R, Teo GC, Perez-Riverol Y, Muntel J, Müller M, Goetze S, Pavlou M, Verschueren E, Wollscheid B, Nesvizhskii AI, Reiter L, Dunkley T, Sabidó E, Bandeira N, Vitek O (2020) MassIVE.quant: a community resource of quantitative mass spectrometry-based proteomics datasets. Nat Methods 17:981–984. https://doi.org/10.1038/s41592-020-0955-0
Article Google Scholar
Choksi NY, Jahnke GD, St Hilaire C, Shelby M (2003) Role of thyroid hormones in human and laboratory animal reproductive health. Birth Defects Res B Dev Reprod Toxicol 68:479–491
Article Google Scholar
Choubey J, Choudhari JK, Sahariah BP, Verma MK, Banerjee A (2021) Chapter 25—molecular tools: advance approaches to analyze diversity of microbial community. In: Shah MP, Sarkar A, Mandal S (eds) Wastewater treatment. Elsevier, pp 507–520. https://doi.org/10.1016/B978-0-12-821881-5.00025-8
Choubey J, Choudhari JK, Verma MK, Chatterjee T, Sahariah BP (2022) Metagenomics and metatranscriptomic analysis of wastewater. In: Microbial community studies in industrial wastewater treatment. CRC Press
Choudhari JK, Chatterjee T, Gupta S, Garcia-Garcia JG, Vera-González J (2021) Network biology approaches in ophthalmological diseases: a case study of glaucoma. In: Wolkenhauer O (ed) Systems medicine. Academic Press, Oxford, pp 190–202. https://doi.org/10.1016/B978-0-12-801238-3.11586-7
Chapter Google Scholar
Choudhari JK, Choubey J, Verma MK, Chatterjee T, Sahariah BP (2022) Chapter 10—metagenomics: the boon for microbial world knowledge and current challenges. In: Singh DB, Pathak RK (eds) Bioinformatics. Academic Press, New York, pp 159–175. https://doi.org/10.1016/B978-0-323-89775-4.00022-5
Chapter Google Scholar
Chuh KN, Pratt MR (2015) Chemical methods for the proteome-wide identification of posttranslationally modified proteins. Curr Opin Chem 24:27–37
Article Google Scholar
Churbanov A, Ryan R, Hasan N, Bailey D, Chen H, Milligan B, Houde P (2012) HighSSR: high-throughput SSR characterization and locus development from next-gen sequencing data. Bioinformatics 28:2797–2803. https://doi.org/10.1093/bioinformatics/bts524
Article Google Scholar
Cirillo D, Valencia A (2019) Big data analytics for personalized medicine. Current opinion in biotechnology, systems biology. NanoBiotechnology 58:161–167. https://doi.org/10.1016/j.copbio.2019.03.004
Article Google Scholar
Clark TA, Murray IA, Morgan RD, Kislyuk AO, Spittle KE, Boitano M, Fomenkov A, Roberts RJ, Korlach J (2012) Characterization of DNA methyltransferase specificities using single-molecule, real-time DNA sequencing. Nucl Acids Res 40:e29. https://doi.org/10.1093/nar/gkr1146
Article Google Scholar
Clarke J, Wu H-C, Jayasinghe L, Patel A, Reid S, Bayley H (2009) Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotechnol 4:265–270
Article Google Scholar
Conlon MA, Bird AR (2014) The impact of diet and lifestyle on gut microbiota and human health. Nutrients 7:17–44. https://doi.org/10.3390/nu7010017
Article Google Scholar
Cook KB, Kazan H, Zuberi K, Morris Q, Hughes TR (2011) RBPDB: a database of RNA-binding specificities. Nucleic Acids Res 39:D301–D308. https://doi.org/10.1093/nar/gkq1069
Article Google Scholar
Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M (2014) Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteom 13:2513–2526
Article Google Scholar
Cui L, Lee YH, Kumar Y et al (2013) Serum metabolome and lipidome changes in adult patients with primary dengue infection. PLoS Negl Trop Dis 7:8
Article Google Scholar
Dash S, Shakyawar SK, Sharma M, Kaushik S (2019) Big data in healthcare: management, analysis and future prospects. Journal of Big Data 6:1–25
Article Google Scholar
Davani-Davari D, Negahdaripour M, Karimzadeh I, Seifan M, Mohkam M, Masoumi SJ, Berenjian A, Ghasemi Y (2019) Prebiotics: definition, types, sources, mechanisms, and clinical applications. Foods 8:92. https://doi.org/10.3390/foods8030092
Article Google Scholar
Davis S, Meltzer PS (2007) GEOquery: a bridge between the gene expression omnibus (GEO) and BioConductor. Bioinformatics 23:1846–1847. https://doi.org/10.1093/bioinformatics/btm254
Article Google Scholar
Deamer D, Akeson M, Branton D (2016) Three decades of nanopore sequencing. Nat Biotechnol 34:518–524. https://doi.org/10.1038/nbt.3423
Article Google Scholar
Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW (2009) TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinform 10:56. https://doi.org/10.1186/1471-2105-10-56
Article Google Scholar
Dick GJ, Andersson AF, Baker BJ, Simmons SL, Thomas BC, Yelton AP, Banfield JF (2009) Community-wide analysis of microbial genome sequence signatures. Genome Biol 10:R85. https://doi.org/10.1186/gb-2009-10-8-r85
Article Google Scholar
Drmanac R et al (2010) Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327:78–81
Article Google Scholar
Eid J et al (2009) Real-time DNA sequencing from single polymerase molecules. Science 323:133–138
Article Google Scholar
ElSayed IA, ElDahshan K, Hefny H, ElSayed EK (2021) Big data and its future in computational biology: a literature review. J Comput Sci 17:1222–1228. https://doi.org/10.3844/jcssp.2021.1222.1228
Article Google Scholar
Fabre J, Dauzat M, Nègre V, Wuyts N, Tireau A, Gennari E, Neveu P, Tisné S, Massonnet C, Hummel I (2011) PHENOPSIS DB: an Information System for Arabidopsis thalianaphenotypic data in an environmental context. BMC Plant Biol 11:1–7
Article Google Scholar
Fabregat A, Sidiropoulos K, Viteri G, Forner O, Marin-Garcia P, Arnau V, D’Eustachio P, Stein L, Hermjakob H (2017) Reactome pathway analysis: a high-performance in-memory approach. BMC Bioinform 18:142. https://doi.org/10.1186/s12859-017-1559-2
Article Google Scholar
Fan J, Han F, Liu H (2014) Challenges of big data analysis. Natl Sci Rev 1:293–314. https://doi.org/10.1093/nsr/nwt032
Article Google Scholar
Farag MA, Porzel A, Schmidt J (2011) Profiling and fingerprinting of commercial cultivars of Humulus lupulus L. (hop): a comparison of MS and NMR methods in metabolomics. Metabolomics 8:492–507
Article Google Scholar
Fehlmann T, Reinheimer S, Geng C, Su X, Drmanac S, Alexeev A, Zhang C, Backes C, Ludwig N, Hart M, An D, Zhu Z, Xu C, Chen A, Ni M, Liu J, Li Y, Poulter M, Li Y, Stähler C, Drmanac R, Xu X, Meese E, Keller A (2016) cPAS-based sequencing on the BGISEQ-500 to explore small non-coding RNAs. Clin Epigenetics 8:123. https://doi.org/10.1186/s13148-016-0287-1
Article Google Scholar
Feng X, Liu X, Luo QBFL (2008) Mass spectrometry in systems biology: an overview. Mass Spectrom Rev 27:635–660
Article Google Scholar
Fernández-Torras A, Duran-Frigola M, Bertoni M, Locatelli M, Aloy P (2022) Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque. Nat Commun 13:5304. https://doi.org/10.1038/s41467-022-33026-0
Article Google Scholar
Fiehn O (2012) Metabolomics–the link between genotypes and phenotypes. Plant Mol Biol 2002:801–807
Google Scholar
Fiehn O, Robertson D, Griffin J, van der Werf M, Nikolau B, Morrison N, Sumner LW, Goodacre R, Hardy NW, Taylor C, Fostel J, Kristal B, Kaddurah-Daouk R, Mendes P, van Ommen B, Lindon JC, Sansone S-A (2007) The metabolomics standards initiative (MSI). Metabolomics 3:175–178. https://doi.org/10.1007/s11306-007-0070-6
Article Google Scholar
Filipowicz W, Bhattacharyya SN, Sonenberg N (2008) Mechanisms of posttranscriptional regulation by microRNAs: are the answers in sight. Nat 9:102–114
Google Scholar
Finger JH, Smith CM, Hayamizu TF, McCright IJ, Eppig JT, Kadin JA, Richardson JE, Ringwald M (2011) The mouse gene expression database (GXD): 2011 update. Nucl Acids Res 39:D835–D841. https://doi.org/10.1093/nar/gkq1132
Article Google Scholar
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer ELL, Tate J, Punta M (2014) Pfam: the protein families database. Nucl Acids Res 42:D222-230. https://doi.org/10.1093/nar/gkt1223
Article Google Scholar
Floegel A, Stefan N, Yu Z et al (2013) Identification of serum metabolites associated with risk of type 2 diabetes using a targeted metabolomic approach. Diabetes 62:639–648
Article Google Scholar
Flood PJ, Kruijer W, Schnabel SK, van der Schoor R, Jalink H, Snel JFH, Harbinson J, Aarts MGM (2016) Phenomics for photosynthesis, growth and reflectance in Arabidopsis thaliana reveals circadian and long-term fluctuations in heritability. Plant Methods 12:14. https://doi.org/10.1186/s13007-016-0113-y
Article Google Scholar
Froebel LK, Jalukar S, Lavergne TA, Lee JT, Duong T (2019) Administration of dietary prebiotics improves growth performance and reduces pathogen colonization in broiler chickens. Poult Sci 98:6668–6676. https://doi.org/10.3382/ps/pez537
Article Google Scholar
Garg P, Jaiswal P (2016) Databases and bioinformatics tools for rice research. Curr Plant Biol 7–8:39–52. https://doi.org/10.1016/j.cpb.2016.12.006
Article Google Scholar
Gelly J-C, Orgeur M, Jacq C, Lelandais G (2011) MitoGenesisDB: an expression data mining tool to explore spatio-temporal dynamics of mitochondrial biogenesis. Nucl Acids Res 39:D1079–D1084. https://doi.org/10.1093/nar/gkq781
Article Google Scholar
Gillet LC et al (2012) Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell 11:0111.016717
Google Scholar
Goll J, Rusch DB, Tanenbaum DM, Thiagarajan M, Li K, Methé BA, Yooseph S (2010) METAREP: JCVI metagenomics reports—an open source tool for high-performance comparative metagenomics. Bioinformatics 26:2631–2632
Article Google Scholar
Goñi JR, Fenollosa C, Pérez A, Torrents D, Orozco M (2008) DNAlive: a tool for the physical analysis of DNA at the genomic scale. Bioinformatics 24:1731–1732
Article Google Scholar
Gonzalez-Galarza FF, McCabe A, dos Santos EJM, Jones J, Takeshita L, Ortega-Rivera ND et al (2020) Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools. Nucleic Acids Res 48:D783–D788. https://doi.org/10.1093/nar/gkz1029
Article Google Scholar
Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17:333–351
Article Google Scholar
Gowda GAN, Raftery D (2021) NMR based metabolomics. Adv Exp Med Biol 1280:19–37. https://doi.org/10.1007/978-3-030-51652-9_2
Article Google Scholar
Grant D, Nelson RT, Cannon SB, Shoemaker RC (2010) SoyBase, the USDA-ARS soybean genetics and genomics database. Nucl Acids Res 38:D843–D846
Article Google Scholar
Greene CS, Tan J, Ung M, Moore JH, Cheng C (2014) Big data bioinformatics. J Cell Physiol 229:1896–1900. https://doi.org/10.1002/jcp.24662
Article Google Scholar
Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR (2003) Rfam: an RNA family database. Nucleic Acids Res 31:439. https://doi.org/10.1093/nar/gkg006
Article Google Scholar
Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34:D140. https://doi.org/10.1093/nar/gkj112
Article Google Scholar
Groom CR, Bruno IJ, Lightfoot MP, Ward SC (2016) The Cambridge structural database. Acta Crystallogr B Struct Sci Cryst Eng Mater 72:171–179. https://doi.org/10.1107/S2052520616003954
Article Google Scholar
Gutmanas A, Alhroub Y, Battle GM, Berrisford JM, Bochet E, Conroy MJ et al (2014) PDBe: protein data bank in Europe. Nucleic Acids Res 42:D285–D291. https://doi.org/10.1093/nar/gkt1180
Article Google Scholar
Haleem A, Javaid M, Khan IH, Vaishya R (2020) Significant applications of big data in COVID-19 pandemic. Indian J Orthop 54:526–528. https://doi.org/10.1007/s43465-020-00129-z
Article Google Scholar
Haudry Y, Berube H, Letunic I, Weeber P-D, Gagneur J, Girardot C, Kapushesky M, Arendt D, Bork P, Brazma A, Furlong EEM, Wittbrodt J, Henrich T (2008) 4DXpress: a database for cross-species expression pattern comparisons. Nucl Acids Res 36:D847-853. https://doi.org/10.1093/nar/gkm797
Article Google Scholar
Haverland NA, Fox HS, Ciborowski P (2014) Quantitative proteomics by SWATH MS reveals altered expression of nucleic acid binding and regulatory proteins in HIV 1 infected macrophages. J Proteome Res 13:2109–2119
Article Google Scholar
Heather JM, Chain B (2016) The sequence of sequencers: the history of sequencing DNA. Genomics 107:1–8. https://doi.org/10.1016/j.ygeno.2015.11.003
Article Google Scholar
Hendlich M, Bergner A, Günther J, Klebe G (2003) Relibase: design and development of a database for comprehensive analysis of protein-ligand interactions. J Mol Biol 326:607–620. https://doi.org/10.1016/s0022-2836(02)01408-0
Article Google Scholar
Henrich T, Ramialison M, Quiring R, Wittbrodt B, Furutani-Seiki M, Wittbrodt J, Kondoh H (2003) MEPD: a Medaka gene expression pattern database. Nucl Acids Res 31:72–74
Article Google Scholar
Hie B, Peters J, Nyquist SK, Shalek AK, Berger B, Bryson BD (2020) Computational methods for single-cell RNA sequencing. Annu Rev Biomed Data Sci 3:339–364. https://doi.org/10.1146/annurev-biodatasci-012220-100601
Article Google Scholar
Hillier L, Lennon G, Becker M, Bonaldo MF, Chiapelli B, Chissoe S, Dietrich N, DuBuque T, Favello A, Gish W (1996) Generation and analysis of 280,000 human expressed sequence tags. Genome Res 6:807–828
Article Google Scholar
Hoch JC, Baskaran K, Burr H, Chin J, Eghbalnia HR, Fujiwara T et al (2023) Biological magnetic resonance data bank. Nucleic Acids Res 51:D368–D376. https://doi.org/10.1093/nar/gkac1050
Article Google Scholar
Holmes DE (2017) The data explosion. In: Holmes DE (ed) Big data: a very short introduction. Oxford University Press, Oxford. https://doi.org/10.1093/actrade/9780198779575.003.0001
Chapter Google Scholar
Houwing S et al (2007) A role for Piwi and piRNAs in germ cell maintenance and transposon silencing in zebrafish. Cell 129:69–82
Article Google Scholar
Hu Y, Yang L, Lu Y, Wang Y, Jiang J, Liu Y, Cao Q (2022) Systems network pharmacology-based prediction and analysis of potential targets and pharmacological mechanism of Actinidia chinensis planch. Root extract for application in hepatocellular carcinoma. Evid Based Complement Alternat Med 2022:2116006. https://doi.org/10.1155/2022/2116006
Article Google Scholar
Huang S-SC, Ecker JR (2018) Piecing together cis-regulatory networks: insights from epigenomics studies in plants. Wiley Interdiscip Rev Syst Biol Med 10:e1411. https://doi.org/10.1002/wsbm.1411
Article Google Scholar
Huang H-Y, Lin Y-C-D, Li J, Huang K-Y, Shrestha S, Hong H-C et al (2020) miRTarBase updates to the experimentally validated microRNA–target interaction database. Nucleic Acids Res 2020(48):D148–D154. https://doi.org/10.1093/nar/gkz896
Article Google Scholar
Hucka M, Bergmann FT, Dräger A, Hoops S, Keating SM, Le Novére N, Myers CJ, Olivier BG, Sahle S, Schaff JC, Smith LP, Waltemath D, Wilkinson DJ (2015) Systems biology markup language (SBML) level 2 version 5: structures and facilities for model definitions. J Integr Bioinform 12:271. https://doi.org/10.2390/biecoll-jib-2015-271
Article Google Scholar
Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS et al (2006) The PROSITE database. Nucleic Acids Res 34:D227–D230. https://doi.org/10.1093/nar/gkj063
Article Google Scholar
Hunter S, Corbett M, Denise H, Fraser M, Gonzalez-Beltran A, Hunter C, Jones P, Leinonen R, McAnulla C, Maguire E (2014) EBI metagenomics—a new resource for the analysis and archiving of metagenomic data. Nucl Acids Res 42:D600–D606
Article Google Scholar
Huson DH, Weber N (2013) Microbial community analysis using MEGAN. Methods Enzymol 531:465–485. https://doi.org/10.1016/B978-0-12-407863-5.00021-6
Article Google Scholar
Imker HJ (2018) 25 Years of molecular biology databases: a study of proliferation, impact, and maintenance. Front Res Metrics Analyt 3
Jaiswal P, Cooper L, Elser JL, Meier A, Laporte M-A, Mungall C, Smith B, Johnson EKS, Seymour M, Preece J (2016) Planteome: a resource for common reference ontologies and applications for plant biology
Jenkins H, Hardy N, Beckmann M, Draper J, Smith AR, Taylor J, Fiehn O, Goodacre R, Bino RJ, Hall R (2004) A proposed framework for the description of plant metabolomics experiments and their results. Nat Biotechnol 22:1601–1606
Article Google Scholar
Jirtle RL (2014) The Agouti mouse: a biosensor for environmental epigenomics studies investigating the developmental origins of health and disease. Epigenomics 6:447–450. https://doi.org/10.2217/epi.14.58
Jones-Rhoades MW, Borevitz JO, Preuss D (2007) Genome-wide expression profiling of the Arabidopsis female gametophyte identifies families of small, secreted proteins. PLoS Genet 3:1848–1861
Kadota K, Nishimura S-I, Bono H, Nakamura S, Hayashizaki Y, Okazaki Y, Takahashi K (2003) Detection of genes with tissue-specific expression patterns using Akaike’s information criterion procedure. Physiol Genom 12:251–259. https://doi.org/10.1152/physiolgenomics.00153.2002
Kahraman A, Avramov A, Nashev LG, Popov D, Ternes R, Pohlenz H-D, Weiss B (2005) PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics. Bioinformatics 21:418–420
Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucl Acids Res 28:27–30. https://doi.org/10.1093/nar/28.1.27
Kapushesky M, Emam I, Holloway E, Kurnosov P, Zorin A, Malone J, Rustici G, Williams E, Parkinson H, Brazma A (2010) Gene expression atlas at the European bioinformatics institute. Nucl Acids Res 38:D690–D698. https://doi.org/10.1093/nar/gkp936
Karolchik D, Hinrichs AS, Kent WJ (2009) The UCSC genome browser. Curr Protoc Bioinformatics Chapter 1:Unit 1.4. https://doi.org/10.1002/0471250953.bi0104s28
Karow J (2015) Qiagen launches GeneReader NGS System at AMP; presents performance evaluation by Broad. GenomeWeb, molecular-diagnostics/qiagen-launches-genereader-ngs-system-amp-presents-performance-evaluation 10:12885–017.
Kato K, Ishiwa A (2015) The role of carbohydrates in infection strategies of enteric pathogens. Trop Med Health 43:41–52. https://doi.org/10.2149/tmh.2014-25
Kaur AP, Bhardwaj S, Dhanjal DS, Nepovimova E, Cruz-Martins N, Kuča K, Chopra C, Singh R, Kumar H, Șen F, Kumar V, Verma R, Kumar D (2021) Plant prebiotics and their role in the amelioration of diseases. Biomolecules 11:440. https://doi.org/10.3390/biom11030440
Kechagia M, Basoulis D, Konstantopoulou S, Dimitriadi D, Gyftopoulou K, Skarmoutsou N, Fakiri EM (2013) Health benefits of probiotics: a review. ISRN Nutr 2013:481651. https://doi.org/10.5402/2013/481651
Keegan KP, Glass EM, Meyer F (2016) MG-RAST, a metagenomics service for analysis of microbial community structure and function. Methods Mol Biol 1399:207–233. https://doi.org/10.1007/978-1-4939-3369-3_13
Kellman BP, Lewis NE (2021) Big-data glycomics: tools to connect glycan biosynthesis to extracellular communication. Trends Biochem Sci 46:284–300. https://doi.org/10.1016/j.tibs.2020.10.004
Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S et al (2009) Human protein reference database—2009 update. Nucleic Acids Res 37:D767–D772. https://doi.org/10.1093/nar/gkn892
Khan N, Yaqoob I, Hashem IAT, Inayat Z, Mahmoud Ali WK, Alam M, Shiraz M, Gani A (2014) Big data: survey, technologies, opportunities, and challenges. Sci World J 2014:e712826. https://doi.org/10.1155/2014/712826
Khoroshevskyi O, LeRoy N, Reuter VP, Sheffield NC (2023) GEOfetch: a command-line tool for downloading data and standardized metadata from GEO and SRA. Bioinformatics 39:btad069. https://doi.org/10.1093/bioinformatics/btad069
Kim M-S, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, Madugundu AK, Kelkar DS, Isserlin R, Jain S (2014) A draft map of the human proteome. Nature 509:575–581
Kind T, Scholz M, Fiehn O (2009) How large is the metabolome? A critical analysis of data exchange practices in chemistry. PLoS ONE 4(5):e5440
Kircher M, Kelso J (2010) High-throughput DNA sequencing—concepts and limitations. BioEssays 32:524–536
Knudsen M, Wiuf C (2010) The CATH database. Hum Genom 4:207–212. https://doi.org/10.1186/1479-7364-4-3-207
Koslicki D, Foucart S, Rosen G (2014) WGSQuikr: fast whole-genome shotgun metagenomic classification. PLoS ONE 9:e91784. https://doi.org/10.1371/journal.pone.0091784
Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, Rohwer F, Edwards RA, Stoye J (2008) Phylogenetic classification of short environmental DNA fragments. Nucl Acids Res 36:2230–2239. https://doi.org/10.1093/nar/gkn038
Kristensen AR, Gsponer J, Foster LJ (2012) A high-throughput approach for measuring temporal changes in the interactome. Nat Methods 9:907–909
Kulak NA, Pichler G, Paron I, Nagaraj N, Mann M (2014) Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat Methods 11:319–324
Kurc T, Qi X, Wang D, Wang F, Teodoro G, Cooper L, Nalisnik M, Yang L, Saltz J, Foran DJ (2015) Scalable analysis of Big pathology image data cohorts using efficient methods and high-performance computing strategies. BMC Bioinform 16:399. https://doi.org/10.1186/s12859-015-0831-6
Voelkerding KV, Dames SA, Durtschi JD (2009) Next-generation sequencing: from basic research to diagnostics. Clin Chem. https://doi.org/10.1373/clinchem.2008.112789
Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CS-O, Aparicio S, Baaijens J, Balvert M, de Barbanson B, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo T-H, Lelieveldt BPF, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Raczkowski L, Reinders M, de Ridder J, Saliba A-E, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A (2020) Eleven grand challenges in single-cell data science. Genome Biol 21:31. https://doi.org/10.1186/s13059-020-1926-6
Langevin SM, Kelsey KT (2013) The fate is not always written in the genes: epigenomics in epidemiologic studies. Environ Mol Mutagen 54:533–541. https://doi.org/10.1002/em.21762
Lappalainen I, Almeida-King J, Kumanduri V, Senf A, Spalding JD, ur-Rehman S, et al (2015) The European genome-phenome archive of human data consented for biomedical research. Nat Genet 47:692–695. https://doi.org/10.1038/ng.3312
Lappalainen I, Lopez J, Skipper L, Hefferon T, Spalding JD, Garner J et al (2013) dbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res 41:D936–D941. https://doi.org/10.1093/nar/gks1213
Larance M, Ahmad Y, Kirkwood KJ, Ly T, Lamond AI (2013) Global subcellular characterization of protein degradation using quantitative proteomics. Mol Cell Proteomics 12:638–650
Larmande P, Gay C, Lorieux M, Périn C, Bouniol M, Droc G, Sallaud C, Perez P, Barnola I, Biderre-Petit C, Martin J, Morel JB, Johnson AAT, Bourgis F, Ghesquière A, Ruiz M, Courtois B, Guiderdoni E (2008) Oryza Tag Line, a phenotypic mutant database for the Genoplante rice insertion line library. Nucl Acids Res 36:D1022–D1027. https://doi.org/10.1093/nar/gkm762
Larsen JEP, Lund O, Nielsen M (2006) Improved method for predicting linear B-cell epitopes. Immunome Res 2:1–7
Lawrence CJ, Dong Q, Polacco ML, Seigfried TE, Brendel V (2004) MaizeGDB, the community database for maize genetics and genomics. Nucl Acids Res 32:D393–D397
Lestrade L, Weber MJ (2006) snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res 34:D158–D162. https://doi.org/10.1093/nar/gkj002
Li R, Li Y, Kristiansen K, Wang J (2008) SOAP: short oligonucleotide alignment program. Bioinformatics 24:713–714. https://doi.org/10.1093/bioinformatics/btn025
Li Y, Chen L (2014) Big biological data: challenges and opportunities. Genom Proteom Bioinform 12:187–189. https://doi.org/10.1016/j.gpb.2014.10.001
Liang K, Sakakibara Y (2021) MetaVelvet-DL: a MetaVelvet deep learning extension for de novo metagenome assembly. BMC Bioinform 22:427. https://doi.org/10.1186/s12859-020-03737-6
Liu B, Gibbons T, Ghodsi M, Treangen T, Pop M (2011) Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genom 12(Suppl 2):S4. https://doi.org/10.1186/1471-2164-12-S2-S4
Liu Q, Guo Y, Li J, Long J, Zhang B, Shyr Y (2012) Steps to ensure accuracy in genotype and SNP calling from Illumina sequencing data. BMC Genom 13(Suppl 8):S8. https://doi.org/10.1186/1471-2164-13-S8-S8
Liu X, Yu X, Zack DJ, Zhu H, Qian J (2008) TiGER: A database for tissue-specific gene expression and regulation. BMC Bioinform 9:271. https://doi.org/10.1186/1471-2105-9-271
Lomize MA, Lomize AL, Pogozheva ID, Mosberg HI (2006) OPM: orientations of proteins in membranes database. Bioinformatics 22:623–625. https://doi.org/10.1093/bioinformatics/btk023
Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309:1567–1569
Luan H, Geczy P, Lai H, Gobert J, Yang SJH, Ogata H, Baltes J, Guerra R, Li P, Tsai C-C (2020) Challenges and future directions of big data and artificial intelligence in education. Front Psychol 11
Luo C, Rodriguez-r LM, Konstantinidis KT (2014) MyTaxa: an advanced taxonomic classifier for genomic and metagenomic sequences. Nucl Acids Res 42:e73–e73
Ly T, Endo A, Brenes A, Gierlinski M, Afzal V, Pawellek A, Lamond AI (2018) Proteome-wide analysis of protein abundance and turnover remodelling during oncogenic transformation of human breast epithelial cells. Wellcome Open Res 3:51. https://doi.org/10.12688/wellcomeopenres.14392.1
MacCallum I, Przybylski D, Gnerre S, Burton J, Shlyakhter I, Gnirke A, Malek J, McKernan K, Ranade S, Shea TP, Williams L, Young S, Nusbaum C, Jaffe DB (2009) ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol 10:R103. https://doi.org/10.1186/gb-2009-10-10-r103
MacDonald NJ, Parks DH, Beiko RG (2012) Rapid identification of high-confidence taxonomic assignments for metagenomic data. Nucl Acids Res 40:e111. https://doi.org/10.1093/nar/gks335
Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, Lopez R (2019) The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucl Acids Res 47:W636–W641. https://doi.org/10.1093/nar/gkz268
Markowitz VM, Chen I-MA, Chu K, Szeto E, Palaniappan K, Pillay M, Ratner A, Huang J, Pagani I, Tringe S, Huntemann M, Billis K, Varghese N, Tennessen K, Mavromatis K, Pati A, Ivanova NN, Kyrpides NC (2014) IMG/M 4 version of the integrated metagenome comparative analysis system. Nucl Acids Res 42:D568–D573. https://doi.org/10.1093/nar/gkt919
Markowitz VM, Chen I-MA, Chu K, Szeto E, Palaniappan K, Jacob B et al (2012) IMG/M-HMP: a metagenome comparative analysis system for the human microbiome project. PLoS ONE 7:e40151. https://doi.org/10.1371/journal.pone.0040151
Marx V (2013) Biology: the big challenges of big data. Nature 498:255–260
Mashima J, Kodama Y, Fujisawa T, Katayama T, Okuda Y, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y, Takagi T (2017) DNA data bank of Japan. Nucl Acids Res 45:D25–D31. https://doi.org/10.1093/nar/gkw1001
McClatchy DB, Liao LJ, Lee JH, Park SK, Yates JR (2012) Dynamics of subcellular proteomes during brain development. J Proteome Res 11:2467–2479
McGeary SE, Lin KS, Shi CY, Pham TM, Bisaria N, Kelley GM et al (2019) The biochemical basis of microRNA targeting efficacy. Science (New York, NY) 366:eaav1741. https://doi.org/10.1126/science.aav1741
McHardy AC, Martín HG, Tsirigos A, Hugenholtz P, Rigoutsos I (2007) Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods 4:63–72. https://doi.org/10.1038/nmeth976
McWilliam H, Valentin F, Goujon M, Li W, Narayanasamy M, Martin J, Miyar T, Lopez R (2009) Web services at the European bioinformatics institute-2009. Nucleic Acids Res 37:W6–W10. https://doi.org/10.1093/nar/gkp302
Merchant CA, Healy K, Wanunu M, Ray V, Peterman N, Bartel J, Fischbein MD, Venta K, Luo Z, Johnson ATC, Drndić M (2010) DNA translocation through graphene nanopores. Nano Lett 10:2915–2921. https://doi.org/10.1021/nl101046t
Merelli I, Pérez-Sánchez H, Gesing S, D’Agostino D (2014) Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives. Biomed Res Int 2014:e134023. https://doi.org/10.1155/2014/134023
Mewes HW, Frishman D, Güldener U, Mannhaupt G, Mayer K, Mokrejs M et al (2002) MIPS: a database for genomes and protein sequences. Nucleic Acids Res 30:31–34
Meyers BC, Souret FF, Lu C, Green PJ (2006) Sweating the small stuff: microRNA discovery in plants. Curr Opin Biotechnol 17:139–146
Mi H, Lazareva-Ulitsky B, Loo R, Kejariwal A, Vandergriff J, Rabkin S et al (2005) The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res 33:D284–D288
Mikheenko A, Saveliev V, Gurevich A (2016) MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 32:1088–1090
Mir RR, Reynolds M, Pinto F, Khan MA, Bhat MA (2019) High-throughput phenotyping for crop improvement in the genomics era. Plant Sci 282:60–72. https://doi.org/10.1016/j.plantsci.2019.01.007
Mohammed MH, Ghosh TS, Singh NK, Mande SS (2011) SPHINX–an algorithm for taxonomic binning of metagenomic sequences. Bioinformatics 27:22–30. https://doi.org/10.1093/bioinformatics/btq608
Monzoorul Haque M, Ghosh TS, Komanduri D, Mande SS (2009) SOrt-ITEMS: sequence orthology based approach for improved taxonomic estimation of metagenomic sequences. Bioinformatics 25:1722–1730. https://doi.org/10.1093/bioinformatics/btp317
Moraes G, de Almeida LC (2020) Chapter 11—nutrition and functional aspects of digestion in fish. In: Baldisserotto B, Urbinati EC, Cyrino JEP (eds) Biology and physiology of freshwater neotropical fish. Academic Press, New York, pp 251–271. https://doi.org/10.1016/B978-0-12-815872-2.00011-7
Morozova O, Marra MA (2008) Applications of next-generation sequencing technologies in functional genomics. Genomics 92:255–264
Naegle KM, White FM, Lauffenburger DA, Yaffe MB (2012) Robust co-regulation of tyrosine phosphorylation sites on proteins reveals novel protein interactions. Mol Biosyst 8:2771–2782
Nieduszynski CA, Hiraga S, Ak P, Benham CJ, Donaldson AD (2007) OriDB: a DNA replication origin database. Nucleic Acids Res 35:D40–D46
Nikolskiy I, Mahieu NG, Chen Y-J et al (2013) An untargeted metabolomic workflow to improve structural characterization of metabolites. Anal Chem 85:7713–7719
O’Donoghue SI (2021) Grand challenges in bioinformatics data visualization. Front Bioinform 1
Ohtsu K et al (2007) Global gene expression analysis of the shoot apical meristem of maize (Zea mays L.). Plant J 52:391–404
O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R et al (2016) Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44:D733–D745. https://doi.org/10.1093/nar/gkv1189
Örd T, Õunap K, Stolze LK, Aherrahrou R, Nurminen V, Toropainen A, Selvarajan I, Lönnberg T, Aavik E, Ylä-Herttuala S, Civelek M, Romanoski CE, Kaikkonen MU (2021) Single-cell epigenomics and functional fine-mapping of atherosclerosis GWAS Loci. Circ Res 129:240–258. https://doi.org/10.1161/CIRCRESAHA.121.318971
Pal S, Mondal S, Das G, Khatua S, Ghosh Z (2020) Big data in biology: the hope and present-day challenges in it. Gene Rep 21:100869. https://doi.org/10.1016/j.genrep.2020.100869
Papatheodorou I, Fonseca NA, Keays M, Tang YA, Barrera E, Bazant W, Burke M, Füllgrabe A, Fuentes AM-P, George N, Huerta L, Koskinen S, Mohammed S, Geniza M, Preece J, Jaiswal P, Jarnuczak AF, Huber W, Stegle O, Vizcaino JA, Brazma A, Petryszak R (2018) Expression Atlas: gene and protein expression across multiple studies and organisms. Nucl Acids Res 46:D246–D251. https://doi.org/10.1093/nar/gkx1158
Park SK et al (2014) Census 2: isobaric labeling data analysis. Bioinformatics 30:2208–2209
Parkinson H, Sarkans U, Kolesnikov N, Abeygunawardena N, Burdett T, Dylag M, Emam I, Farne A, Hastings E, Holloway E, Kurbatova N, Lukk M, Malone J, Mani R, Pilicheva E, Rustici G, Sharma A, Williams E, Adamusiak T, Brandizi M, Sklyar N, Brazma A (2011) ArrayExpress update—an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucl Acids Res 39:D1002–D1004. https://doi.org/10.1093/nar/gkq1040
Pati A, Heath LS, Kyrpides NC, Ivanova N (2011) ClaMS: a classifier for metagenomic sequences. Stand Genomic Sci 5:248–253. https://doi.org/10.4056/sigs.2075298
Patti GJ, Yanes O, Siuzdak G (2012) Metabolomics: the apogee of the omics trilogy. Nat Rev Mol Cell Biol 13:263–269
Paysan-Lafosse T, Blum M, Chuguransky S, Grego T, Pinto BL, Salazar GA et al (2023) InterPro in 2022. Nucleic Acids Res 51:D418–D427. https://doi.org/10.1093/nar/gkac993
Peterlongo P, Chikhi R (2012) Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. BMC Bioinform 13:48. https://doi.org/10.1186/1471-2105-13-48
Pevzner PA, Tang H, Waterman MS (2001) An Eulerian path approach to DNA fragment assembly. Proc Natl Acad Sci USA 98:9748–9753. https://doi.org/10.1073/pnas.171285098
Pieper U, Webb BM, Barkan DT, Schneidman-Duhovny D, Schlessinger A, Braberg H et al (2011) ModBase, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res 39:D465–D474. https://doi.org/10.1093/nar/gkq1091
Freda PJ, Moore JH, Kranzler HR (2021) The phenomics and genetics of addictive and affective comorbidity in opioid use disorder. Drug Alcohol Depend 221:108602. https://doi.org/10.1016/j.drugalcdep.2021.108602
Pollet N, Schmidt HA, Gawantka V, Vingron M, Niehrs C (2000) Axeldb: a Xenopus laevis database focusing on gene expression. Nucl Acids Res 28:139–140. https://doi.org/10.1093/nar/28.1.139
Powell S, Forslund K, Szklarczyk D, Trachana K, Roth A, Huerta-Cepas J, Gabaldón T, Rattei T, Creevey C, Kuhn M, Jensen LJ, von Mering C, Bork P (2014) eggNOG v4.0: nested orthology inference across 3686 organisms. Nucl Acids Res 42:D231–D239. https://doi.org/10.1093/nar/gkt1253
Prestat E, David MM, Hultman J, Taş N, Lamendella R, Dvornik J, Mackelprang R, Myrold DD, Jumpponen A, Tringe SG, Holman E, Mavromatis K, Jansson JK (2014) FOAM (functional ontology assignments for metagenomes): a hidden Markov model (HMM) database with environmental focus. Nucl Acids Res 42:e145. https://doi.org/10.1093/nar/gku702
Raghavendra P, Pullaiah T (2018) Chapter 7—pathogen identification using novel sequencing methods. In: Raghavendra P, Pullaiah T (eds) Advances in cell and molecular diagnostics. Academic Press, New York, pp 161–202. https://doi.org/10.1016/B978-0-12-813679-9.00007-5
Raghupathi W, Raghupathi V (2014) Big data analytics in healthcare: promise and potential. Health Inf Sci Syst 2:3. https://doi.org/10.1186/2047-2501-2-3
Rangwala SH, Kuznetsov A, Ananiev V, Asztalos A, Borodin E, Evgeniev V et al (2021) Accessing NCBI data using the NCBI sequence viewer and genome data viewer (GDV). Genome Res 31:159–169. https://doi.org/10.1101/gr.266932.120
Renuse S, Chaerkady R, Pandey A (2011) Proteogenomics. Proteomics 11:620–630
Reuter JA, Spacek D, Snyder MP (2015) High-throughput sequencing technologies. Mol Cell 58:586–597. https://doi.org/10.1016/j.molcel.2015.05.004
Rhee J-S, Yu IT, Kim B-M, Jeong C-B, Lee K-W, Kim M-J, Lee S-J, Park GS, Lee J-S (2013) Copper induces apoptotic cell death through reactive oxygen species-triggered oxidative stress in the intertidal copepod Tigriopus japonicus. Aquat Toxicol 132–133:182–189. https://doi.org/10.1016/j.aquatox.2013.02.013
Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucl Acids Res 38:e191. https://doi.org/10.1093/nar/gkq747
Rigden DJ, Fernández XM (2022) The 2022 nucleic acids research database issue and the online molecular biology database collection. Nucl Acids Res 50:D1–D10. https://doi.org/10.1093/nar/gkab1195
Rigden DJ, Fernández XM (2021) The 2021 nucleic acids research database issue and the online molecular biology database collection. Nucl Acids Res 49:D1–D9. https://doi.org/10.1093/nar/gkaa1216
Ristevski B, Chen M (2018) Big data analytics in medicine and healthcare. J Integr Bioinform 15:20170030. https://doi.org/10.1515/jib-2017-0030
RNAcentral (2017) RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Res 45:D128–D134. https://doi.org/10.1093/nar/gkw1008
Robinson C (1994) The European Bioinformatics Institute (EBI)—open for business. Trends Biotechnol 12:391–392. https://doi.org/10.1016/0167-7799(94)90024-8
Robison K (2022) 2022: a wild year for short reads in genome sequencing? GEN Biotechnol 1:40–42
Rodríguez-Ezpeleta N, Hackenberg M, Aransay AM (eds) (2012) Bioinformatics for high throughput sequencing. Springer, New York. https://doi.org/10.1007/978-1-4614-0782-9
Rosen GL, Reichenberger ER, Rosenfeld AM (2011) NBC: the Naive Bayes classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics 27:127–129. https://doi.org/10.1093/bioinformatics/btq619
Roux KJ, Kim DI, Raida M, Burke B (2012) A promiscuous biotin ligase fusion protein identifies proximal and interacting proteins in mammalian cells. J Cell Biol 196:801–810
Ruan J, Li H, Chen Z, Coghlan A, Coin LJM, Guo Y et al (2008) TreeFam: 2008 update. Nucleic Acids Res 36:D735–D740. https://doi.org/10.1093/nar/gkm1005
Safran M, Dalah I, Alexander J, Rosen N, Iny Stein T, Shmoish M et al (2010) GeneCards version 3: the human gene integrator. Database (Oxford) 2010:baq020. https://doi.org/10.1093/database/baq020
Sai Lakshmi S, Agrawal S (2008) piRNABank: a web resource on classified and clustered Piwi-interacting RNAs. Nucleic Acids Res 36:D173–D177. https://doi.org/10.1093/nar/gkm696
Saito T, Ariizumi T, Okabe Y, Asamizu E, Hiwasa-Tanase K, Fukuda N, Mizoguchi T, Yamazaki Y, Aoki K, Ezura H (2011) TOMATOMA: a novel tomato mutant database distributing Micro-Tom mutant collections. Plant Cell Physiol 52:283–296
Salek RM, Steinbeck C, Viant MR et al (2013) The role of reporting standards for metabolite annotation and identification in metabolomic studies. Gigascience 2:1
Sallet E, Gouzy J, Schiex T (2019) EuGene: an automated integrative gene finder for eukaryotes and prokaryotes. Methods Mol Biol 1962:97–120. https://doi.org/10.1007/978-1-4939-9173-0_6
Samaras P, Schmidt T, Frejno M, Gessulat S, Reinecke M, Jarzab A, Zecha J, Mergner J, Giansanti P, Ehrlich H-C, Aiche S, Rank J, Kienegger H, Krcmar H, Kuster B, Wilhelm M (2020) ProteomicsDB: a multi-omics and multi-organism resource for life science research. Nucl Acids Res 48:D1153–D1163. https://doi.org/10.1093/nar/gkz974
Sato K, Sakakibara Y (2015) MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning. DNA Res 22:69–77
Sayers EW, Cavanaugh M, Clark K, Pruitt KD, Sherry ST, Yankie L, Karsch-Mizrachi I (2023) GenBank 2023 update. Nucl Acids Res 51:D141–D144. https://doi.org/10.1093/nar/gkac1012
Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH (2009) PID: the pathway interaction database. Nucl Acids Res 37:D674–D679
Schatz MC (2015) Biological data sciences in genome research. Genome Res 25:1417–1422. https://doi.org/10.1101/gr.191684.115
Schicho R, Shaykhutdinov R, Ngo J et al (2012) Quantitative metabolomic profiling of serum, plasma, and urine by (1)H NMR spectroscopy discriminates between patients with inflammatory bowel disease and healthy individuals. J Proteome Res 11:3344–3357
Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M (2007) CAMERA: a community resource for metagenomics. PLoS Biol 5:e75. https://doi.org/10.1371/journal.pbio.0050075
Sethupathy P, Corda B, Hatzigeorgiou AG (2006) TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA 12:192–197. https://doi.org/10.1261/rna.2239606
Sharon D, Tilgner H, Grubert F, Snyder M (2013) A single-molecule long-read survey of the human transcriptome. Nat Biotechnol 31:1009–1014
Sharon N, Ofek I (2000) Safe as mother’s milk: carbohydrates as future anti-adhesion drugs for bacterial diseases. Glycoconj J 17:659–664. https://doi.org/10.1023/a:1011091029973
Shen L, Gong J, Caldo RA, Nettleton D, Cook D, Wise RP, Dickerson JA (2005) BarleyBase–an expression profiling database for plant genomics. Nucl Acids Res 33:D614–D618. https://doi.org/10.1093/nar/gki123
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol İ (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117–1123. https://doi.org/10.1101/gr.089532.108
Slatko BE, Gardner AF, Ausubel FM (2018) Overview of next generation sequencing technologies. Curr Protoc Mol Biol 122:e59. https://doi.org/10.1002/cpmb.59
Slavin J (2013) Fiber and prebiotics: mechanisms and health benefits. Nutrients 5:1417–1435. https://doi.org/10.3390/nu5041417
Slenter DN, Kutmon M, Hanspers K, Riutta A, Windsor J, Nunes N, Mélius J, Cirillo E, Coort SL, Digles D, Ehrhart F, Giesbertz P, Kalafati M, Martens M, Miller R, Nishida K, Rieswijk L, Waagmeester A, Eijssen LMT, Evelo CT, Pico AR, Willighagen EL (2018) WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucl Acids Res 46:D661–D667. https://doi.org/10.1093/nar/gkx1064
Smigielski EM, Sirotkin K, Ward M, Sherry ST (2000) dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res 28:352–355
Sreenivasan VKA, Henck J, Spielmann M (2022) Single-cell sequencing: promises and challenges for human genetics. Med Gen 34:261–273. https://doi.org/10.1515/medgen-2022-2156
Stehr H, Duarte JM, Lappe M, Bhak J, Bolser DM (2010) PDBWiki: added value through community annotation of the protein data bank. Database (Oxford) 2010:baq009. https://doi.org/10.1093/database/baq009
Su X, Xu J, Ning K (2012) Parallel-META: efficient metagenomic data analysis based on high-performance computation. BMC Syst Biol 6(Suppl 1):S16. https://doi.org/10.1186/1752-0509-6-S1-S16
Subramanian I, Verma S, Kumar S, Jere A, Anamika K (2020) Multi-omics data integration, interpretation, and its application. Bioinform Biol Insights 14:1177932219899051. https://doi.org/10.1177/1177932219899051
Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Hsi-Yang Fritz M (2015) An integrated map of structural variation in 2,504 human genomes. Nature 526:75–81
Suhre K, Claverie J-M (2004) FusionDB: a database for in-depth analysis of prokaryotic gene fusion events. Nucleic Acids Res 32:D273–D276. https://doi.org/10.1093/nar/gkh053
Sunagawa S, Mende DR, Zeller G, Izquierdo-Carrasco F, Berger SA, Kultima JR, Coelho LP, Arumugam M, Tap J, Nielsen HB, Rasmussen S, Brunak S, Pedersen O, Guarner F, de Vos WM, Wang J, Li J, Doré J, Ehrlich SD, Stamatakis A, Bork P (2013) Metagenomic species profiling using universal phylogenetic marker genes. Nat Methods 10:1196–1199. https://doi.org/10.1038/nmeth.2693
Sunkin SM, Ng L, Lau C, Dolbeare T, Gilbert TL, Thompson CL, Hawrylycz M, Dang C (2013) Allen brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system. Nucl Acids Res 41:D996–D1008. https://doi.org/10.1093/nar/gks1042
Tanizawa Y, Fujisawa T, Kodama Y, Kosuge T, Mashima J, Tanjo T, Nakamura Y (2023) DNA Data Bank of Japan (DDBJ) update report 2022. Nucl Acids Res 51:D101–D105. https://doi.org/10.1093/nar/gkac1083
Teeling H, Waldmann J, Lombardot T, Bauer M, Glöckner FO (2004) TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinform 5:163. https://doi.org/10.1186/1471-2105-5-163
Thompson JF, Steinmann KE (2010) Single molecule sequencing with a heliscope genetic analysis system. Curr Protoc Mol Biol. https://doi.org/10.1002/0471142727.mb0710s92
Tinnikov AA, Samuels HH (2013) A novel cell lysis approach reveals that caspase 2 rapidly translocates from the nucleus to the cytoplasm in response to apoptotic stimuli. PLoS ONE 8:e61085
Tobi EW, van Zwet EW, Lumey LH, Heijmans BT (2018) Why mediation analysis trumps Mendelian randomization in population epigenomics studies of the Dutch Famine. https://doi.org/10.1101/362392
Tolani P, Gupta S, Yadav K, Aggarwal S, Yadav AK (2021) Chapter Four: Big data, integrative omics and network biology. In: Donev R, Karabencheva-Christova T (eds) Advances in protein chemistry and structural biology, proteomics and systems biology. Academic Press, New York, pp 127–160. https://doi.org/10.1016/bs.apcsb.2021.03.006
Torres TT, Metta M, Ottenwalder B, Schlotterer C (2008) Gene expression profiling by massively parallel sequencing. Genome Res 18:172–177
Toth AL et al (2007) Wasp gene expression supports an evolutionary link between maternal behavior and eusociality. Science 318:441–444
Treangen TJ, Koren S, Sommer DD, Liu B, Astrovskaya I, Ondov B, Darling AE, Phillippy AM, Pop M (2013) MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome Biol 14:R2
Tryka KA, Hao L, Sturcke A, Jin Y, Wang ZY, Ziyabari L et al (2014) NCBI’s database of genotypes and phenotypes: dbGaP. Nucleic Acids Res 42:D975–D979. https://doi.org/10.1093/nar/gkt1211
Tucker T, Marra M, Friedman JM (2009) Massively parallel sequencing: the next big thing in genetic medicine. Am J Hum Genet 85:142–154. https://doi.org/10.1016/j.ajhg.2009.06.022
Uchiyama I (2007) MBGD: a platform for microbial comparative genomics based on the automated construction of orthologous groups. Nucleic Acids Res 35:D343–D346
Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K, Forsberg M, Zwahlen M, Kampf C, Wester K, Hober S (2010) Towards a knowledge-based human protein atlas. Nat Biotechnol 28:1248–1250
Ullah S, Rahman W, Ullah F, Ahmad G, Ijaz M, Gao T (2021) DBHR: a collection of databases relevant to human research. Future Sci OA 8:FSO780. https://doi.org/10.2144/fsoa-2021-0101
Via M, Gignoux C, Burchard EG (2010) The 1000 Genomes Project: new opportunities for research and social challenges. Genome Med 2:3. https://doi.org/10.1186/gm124
Viant MR, Sommer U (2012) Mass spectrometry based environmental metabolomics: a primer and review. Metabolomics 9:144–158
Visel A, Thaller C, Eichele G (2004) GenePaint.org: an atlas of gene expression patterns in the mouse embryo. Nucl Acids Res 32:D552–D556. https://doi.org/10.1093/nar/gkh029
Vizcaíno JA et al (2014) ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotechnol 32:223–226
Volders P-J, Anckaert J, Verheggen K, Nuytens J, Martens L, Mestdagh P et al (2019) LNCipedia 5: towards a reference set of human long non-coding RNAs. Nucleic Acids Res 47:D135–D139. https://doi.org/10.1093/nar/gky1031
von Itzstein M, Moran AP (2010) Chapter 50—future potential of glycomics in microbiology and infectious diseases. In: Holst O, Brennan PJ, von Itzstein M, Moran AP (eds) Microbial glycobiology. Academic Press, San Diego, pp 981–986. https://doi.org/10.1016/B978-0-12-374546-0.00050-X
Vulimiri SV, Sonawane BR, Szabo DT (2014) Systems biology application in toxicology. In: Wexler P (ed) Encyclopedia of toxicology, 3rd edn. Academic Press, Oxford, pp 454–458. https://doi.org/10.1016/B978-0-12-386454-3.01047-2
Wang FJ et al (2010) Fractionation of phosphopeptides on strong anion-exchange capillary trap column for large-scale phosphoproteome analysis of microgram samples. J Sep Sci 33:1879–1887
Wang W, Song X, Wang L, Song L (2018) Pathogen-derived carbohydrate recognition in molluscs immune defense. Int J Mol Sci 19:721. https://doi.org/10.3390/ijms19030721
Wang X, Wang Y, Yue B, Zhang X, Liu S (2013) The complete mitochondrial genome of the Bufo tibetanus (Anura: Bufonidae). Mitochondrial DNA 24:186–188. https://doi.org/10.3109/19401736.2012.744978
Wang Y, Kung L, Wang WYC, Cegielski CG (2018) An integrated big data analytics-enabled transformation model: application to health care. Inf Manag 55:64–79. https://doi.org/10.1016/j.im.2017.04.001
Wang Y, Leung H, Yiu S, Chin F (2014) MetaCluster-TA: taxonomic annotation for metagenomic data based on assembly-assisted binning. BMC Genom 15(Suppl 1):S12. https://doi.org/10.1186/1471-2164-15-S1-S12
Ware D, Jaiswal P, Ni J, Pan X, Chang K, Clark K, Teytelman L, Schmidt S, Zhao W, Cartinhour S (2002) Gramene: a resource for comparative grass genomics. Nucl Acids Res 30:103–105
Waters M, Stasiewicz S, Alex Merrick B, Tomer K, Bushel P, Paules R et al (2007) CEBS—chemical effects in biological systems: a public data repository integrating study design and toxicity data with microarray and proteomics data. Nucleic Acids Res 36:D892–D900
Weber AP, Weber KL, Carr K, Wilkerson C, Ohlrogge JB (2007) Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol 144:32–42
Wei G, Hu R, Li Q, Lu W, Liang H, Nan H, Lu J, Li J, Zhao Q (2022) Oligonucleotide discrimination enabled by tannic acid-coordinated film-coated solid-state nanopores. Langmuir 38:6443–6453. https://doi.org/10.1021/acs.langmuir.2c00638
Wei W, Yeung ES (2000) Improvements in DNA sequencing by capillary electrophoresis at elevated temperature using poly(ethylene oxide) as a sieving matrix. J Chromatogr B Biomed Sci Appl 745:221–230. https://doi.org/10.1016/S0378-4347(00)00069-4
Wilhelm M et al (2014) Mass-spectrometry-based draft of the human proteome. Nature 509:582–587
Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, Liu Y, Djoumbou Y, Mandal R, Aziat F, Dong E (2012) HMDB 3.0—the human metabolome database in 2013. Nucl Acids Res 41:D801–D807
Wood DE, Salzberg SL (2014) Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15:R46. https://doi.org/10.1186/gb-2014-15-3-r46
Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D (2000) DIP: the database of interacting proteins. Nucleic Acids Res 28:289–291
Xu Q, Dunbrack RL (2011) The protein common interface database (ProtCID)—a comprehensive database of interactions of homologous proteins in multiple crystal forms. Nucleic Acids Res 39:D761–D770. https://doi.org/10.1093/nar/gkq1059
Yang Y, Wang D, Miao Y-R, Wu X, Luo H, Cao W et al (2023) lncRNASNP v3: an updated database for functional variants in long non-coding RNAs. Nucleic Acids Res 51:D192–D198. https://doi.org/10.1093/nar/gkac981
Yao T, Chen M-H, Lindemann SR (2020) Structurally complex carbohydrates maintain diversity in gut-derived microbial consortia under high dilution pressure. FEMS Microbiol Ecol 96:fiaa158. https://doi.org/10.1093/femsec/fiaa158
Ye Y, Tang H (2009) An ORFome assembly approach to metagenomics sequences analysis. J Bioinform Comput Biol 7:455–471. https://doi.org/10.1142/s0219720009004151
Yuan Z, Wang C, Yi X, Ni Z, Chen Y, Li T (2018) Solid-state nanopore. Nanoscale Res Lett 13:56. https://doi.org/10.1186/s11671-018-2463-z
Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829. https://doi.org/10.1101/gr.074492.107
Zhang A, Sun H, Wang X (2012) Saliva metabolomics opens door to biomarker discovery, disease diagnosis, and treatment. Appl Biochem Biotechnol 168:1718–1727
Zhang J, Li C, Wu C, Xiong L, Chen G, Zhang Q, Wang S (2006) RMD: a rice mutant database for functional analysis of the rice genome. Nucl Acids Res 34:D745–D748
Zhang Y, Lin J, Zhao L, Zeng X, Liu X (2021) A novel antibacterial peptide recognition algorithm based on BERT. Brief Bioinform 22:bbab200. https://doi.org/10.1093/bib/bbab200
Zhao J, Klyne G, Benson E, Gudmannsdottir E, White-Cooper H, Shotton D (2010) FlyTED: the Drosophila testis gene expression database. Nucl Acids Res 38:D710–D715. https://doi.org/10.1093/nar/gkp1006
Zhao L, Deng L, Li G, Jin H, Cai J, Shang H, Li Y, Wu H, Xu W, Zeng L, Zhang R, Zhao H, Wu P, Zhou Z, Zheng J, Ezanno P, Yang AX, Yan Q, Deem MW, He J (2017) Single molecule sequencing of the M13 virus genome without amplification. PLoS ONE 12:e0188181. https://doi.org/10.1371/journal.pone.0188181
Zheng H, Wu H (2010) Short prokaryotic DNA fragment binning using a hierarchical classifier based on linear discriminant analysis and principal component analysis. J Bioinform Comput Biol 8:995–1011. https://doi.org/10.1142/s0219720010005051
Zhou B, Xiao JF, Tuli L, Ressom HW (2012) LC-MS-based metabolomics. Mol BioSyst 8:470–481
Zou D, Ma L, Yu J, Zhang Z (2015) Biological databases for human research. Genom Proteom Bioinform 13:55–63. https://doi.org/10.1016/j.gpb.2015.01.006

Acknowledgements

The authors are grateful to Professor Anil Kumar, Director of Education, Rani Lakshmi Bai Central Agricultural University, Jhansi, for guidance and valuable suggestions.

Author information

Authors and Affiliations

Department of Bioinformatics, Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, 462003, India
Jyoti Kant Chaudhari
Electrochemical Process Engineering Division, CSIR-Central Electrochemical Research Institute, Karaikudi, Tamilnadu, 630003, India
Shubham Pant
Department of Molecular Biology and Genetic Engineering, G. B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, 263145, India
Richa Jha
Department of Animal Science and Technology, Chung-Ang University, Anseong-Si, Gyeonggi-Do, 17546, Republic of Korea
Rajesh Kumar Pathak
Department of Biotechnology, Siddharth University, Kapilvastu, Siddharth Nagar, Uttar Pradesh, 272207, India
Dev Bukhsh Singh

Contributions

DBS and RKP developed the idea for this review article and its coverage. JKC, SP, and RJ drafted the article. RKP provided valuable input for the improvement of the article. DBS critically revised the work and updated the manuscript for publication. All authors have read the final manuscript and approved the submission.

Corresponding author

Correspondence to Dev Bukhsh Singh.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chaudhari, J.K., Pant, S., Jha, R. et al. Biological big-data sources, problems of storage, computational issues, and applications: a comprehensive review. Knowl Inf Syst 66, 3159–3209 (2024). https://doi.org/10.1007/s10115-023-02049-4

Received: 18 July 2023
Revised: 12 September 2023
Accepted: 11 December 2023
Published: 27 January 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s10115-023-02049-4
