Visualizing Next-Generation Sequencing Cancer Data Sets with Cloud Computing
With the advent of next-generation sequencing technology, clinical data sets now contain enormous amounts of valuable genomic information related to a wide range of diseases such as cancer. These data need to be analysed, managed, stored, visualized and integrated in order to be clinically useful. However, many of the clinicians and researchers who need to interpret these data sets are not specialists in information technology, and so need systems that are effective and easy to use. Herein, we present an overview of a novel cloud-computing-based next-generation sequencing research management software system which has simplicity, scalability, speed and reproducibility at its core. A prototype that enables rapid visualization of large cancer data sets is described. We present preliminary results from a bioinformatics pipeline for comprehensive genome mapping analysis and visualization in the Sage Care project, a European Union funded cancer research project, and outline the benefits of integrating it into a graphical user interface platform such as Simplicity.
Keywords: Cloud Computing; Differential Expression Analysis; Public Cloud; Private Cloud; Cloud Technology
Paul Walsh, Brian Kelly, Timm Heuss and Brendan Lawlor are investigators on Sage Care, an H2020 MSCA-funded project, grant number 644186.