GENESIS—Cloud-Based System for Next Generation Sequencing Analysis: A Proof of Concept

  • Maider Alberich
  • Arkaitz Artetxe
  • Eduardo Santamaría-Navarro
  • Alfons Nonell-Canals
  • Grégory Maclair
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 60)


Advances in sequencing technology have made DNA sequencing cheaper and faster. Next-Generation Sequencing (NGS) platforms provide new opportunities to address biological and medical questions. However, because they produce massive amounts of data, they also pose new challenges for storage, handling and processing. Working with the analysis tools requires powerful computational infrastructure, new bioinformatics software and people skilled in programming. This project aims to design and develop an intelligent system that analyses high-throughput datasets, with the purpose of improving effectiveness in biological and medical research. The goal is a user-friendly tool that lets the user design the desired analysis workflow either automatically or manually. The technological challenges are therefore: (i) an interface between the clinician's language and the bioinformatics language, (ii) an intelligent tool that selects the appropriate analysis workflow and (iii) a solution that can handle, store and manage big datasets at a reasonable price. To tackle these bottlenecks, a cloud-based prototype with a user-friendly graphical interface has been implemented using Amazon Web Services.


Next generation sequencing (NGS); High-throughput sequencing; Automatized workflow; Cloud-computing; Amazon web services (AWS)



This work was supported by the Provincial Council of Gipuzkoa. The authors would like to express their gratitude to the researchers of the Multiple Sclerosis group of BioDonostia Health Institute for their cooperation.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Maider Alberich (1, 2; email author)
  • Arkaitz Artetxe (1, 2)
  • Eduardo Santamaría-Navarro (3)
  • Alfons Nonell-Canals (3)
  • Grégory Maclair (1, 2)

  1. Vicomtech-IK4, San Sebastian, Spain
  2. Biodonostia Health Research Institute, San Sebastian, Spain
  3. Mind the Byte, Barcelona, Spain
