Visualizing Next-Generation Sequencing Cancer Data Sets with Cloud Computing

  • Paul Walsh (corresponding author)
  • Brendan Lawlor
  • Brian Kelly
  • Timmy Manning
  • Timm Heuss
  • Markus Leopold
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10084)

Abstract

With the advent of next-generation sequencing technology, clinical data sets now contain enormous amounts of valuable genomic information related to a wide range of diseases such as cancer. These data need to be analysed, managed, stored, visualized and integrated in order to be clinically useful. However, many clinicians and researchers who need to interpret these data sets are non-specialists in the information technology domain and so need systems that are effective and easy to use. Herein, we present an overview of a novel cloud-computing-based next-generation sequencing research management software system which has simplicity, scalability, speed and reproducibility at its core. A prototype that enables rapid visualization of big cancer data sets is described. We present preliminary results from a bioinformatics pipeline for the Sage Care project, a European Union funded cancer research project, for comprehensive genome mapping analysis and visualization, and outline the benefits of integrating this pipeline into a graphical user interface platform such as Simplicity.

Keywords

Cloud Computing; Differential Expression Analysis; Public Cloud; Private Cloud; Cloud Technology
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

Paul Walsh, Brian Kelly, Timm Heuss and Brendan Lawlor are investigators on Sage Care, an H2020 MSCA-funded project, grant number 644186.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Paul Walsh (1), email author
  • Brendan Lawlor (1)
  • Brian Kelly (1)
  • Timmy Manning (1)
  • Timm Heuss (2)
  • Markus Leopold (2)

  1. NSilico Lifescience, Rubicon Centre, Cork, Ireland
  2. University of Applied Sciences Darmstadt, Darmstadt, Germany
