Big Data Analytics in Bio-informatics

  • C.S.R. Prabhu
  • Aneesh Sreevallabh Chivukula
  • Aditya Mogadala
  • Rohit Ghosh
  • L.M. Jenila Livingston


Bio-informatics is an interdisciplinary science that provides solutions to problems in biology and health care by combining tools from disciplines such as computer science and statistics with the storage, retrieval and processing of biological data. It can provide inputs to diverse sectors such as medicine, health, food and agriculture.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • C.S.R. Prabhu (1)
  • Aneesh Sreevallabh Chivukula (2)
  • Aditya Mogadala (3)
  • Rohit Ghosh (4)
  • L.M. Jenila Livingston (5)

  1. National Informatics Centre, New Delhi, India
  2. Advanced Analytics Institute, University of Technology, Sydney, Ultimo, Australia
  3. Saarland University, Saarbrücken, Germany
  4. Qure.ai, Goregaon East, Mumbai, India
  5. School of Computing Science and Engineering, Vellore Institute of Technology, Chennai, India
