BaaS - Bioinformatics as a Service

  • Ritesh Krishna
  • Vadim Elisseev
  • Samuel Antao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11339)


Genomics and related technologies, collectively known as omics, have transformed life sciences research. These technologies produce mountains of data that need to be managed and analysed. Rapid developments in next-generation sequencing (NGS) technologies have helped genomics become mainstream, but the compute systems meant to support genomics have lagged behind. As genomics makes inroads into personalised healthcare and clinical settings, it is paramount that a robust compute infrastructure be designed to meet the growing needs of the field. Designing infrastructure for omics datasets is an active area of research, and a critical one if omics is to be adopted in industrial healthcare and clinical settings. In this paper, we propose a blueprint for an as-a-service compute infrastructure for fast and scalable processing of omics datasets. We explain our approach with the help of a well-known bioinformatics workflow and a compute environment that can be tailored to achieve portability, reproducibility and scalability on modern High Performance Computing (HPC) systems.


Keywords: Bioinformatics, HPC, Containers, Genomics, Workflows



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. IBM Research, SciTech Daresbury, Warrington, UK
