Abstract
Computer science plays a key role in today’s genetic research. Next-generation sequencing technologies produce enormous amounts of data, which push genetic laboratories to the limits of their data storage and computational capacity. New approaches are therefore needed to overcome these limitations and to make existing bioinformatics algorithms easier to use. Cloud computing is a promising starting point because it provides linked computer systems and services on demand. With it, large amounts of data can be analysed much faster and more efficiently than on a single computer system. This chapter gives an overview of cloud computing, discusses its challenges and opportunities, and presents existing solutions in the field of genetics that readers can use to gain hands-on experience.
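The abstract's central claim is that an analysis split across linked machines finishes faster than the same analysis on one machine. The sketch below illustrates that divide-and-merge idea in the MapReduce style used by frameworks such as Apache Hadoop. It makes several assumptions that do not come from the chapter: the task is a toy one (counting nucleotides across sequencing reads), threads on one machine stand in for cloud nodes, and the function names `split_into_chunks`, `map_count_bases`, `reduce_counts` and `count_bases_parallel` are hypothetical. It is not the authors' method.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Iterable, List


def split_into_chunks(reads: List[str], n_chunks: int) -> List[List[str]]:
    """Partition the reads into roughly equal chunks, one per worker (hypothetical helper)."""
    size = max(1, -(-len(reads) // max(1, n_chunks)))  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]


def map_count_bases(chunk: Iterable[str]) -> Counter:
    """Map step: count nucleotides in one chunk, independently of all other chunks."""
    counts: Counter = Counter()
    for read in chunk:
        counts.update(read.upper())
    return counts


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge the partial counts produced by two workers."""
    return a + b


def count_bases_parallel(reads: List[str], n_workers: int = 4) -> Counter:
    """Split the reads, run the map step on each chunk concurrently, then reduce."""
    chunks = split_into_chunks(reads, n_workers)
    # Assumption: threads stand in for cloud nodes. In a real cluster each map
    # task would run on a separate machine close to its share of the data.
    with ThreadPoolExecutor(max_workers=max(1, n_workers)) as pool:
        partials = list(pool.map(map_count_bases, chunks))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    reads = ["ACGT", "AACC", "GGTT", "acgn"]
    print(count_bases_parallel(reads, n_workers=2))
```

The map step never needs to see another chunk, so adding more workers (or more cloud instances) splits the input further without changing the result. Only the small partial counts have to be combined at the end.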
Lukas Forer, Sebastian Schönherr and Hansi Weißensteiner contributed equally to this work.
Notes
1. Using Illumina’s 1G platform, including all image data.
2. IBM: http://www.ibm.com.
3. XEN: http://www.xen.org.
4. VMware: http://www.vmware.com.
5. Google App Engine: https://developers.google.com/appengine/.
6. Microsoft Azure: http://www.windowsazure.com.
7. Ensembl Genome Browser: http://www.ensembl.org.
8. UCSC Genome Browser: http://genome.ucsc.edu.
9. A complete physical server can be provided as well.
10. OpenAM: http://forgerock.com/openam.html.
11. Apache Hadoop framework: http://hadoop.apache.org.
12. Apache Whirr project: http://whirr.apache.org/.
13. Amazon AWS Management Console: https://console.aws.amazon.com.
14. Amazon pricing list: http://aws.amazon.com/ec2/instance-types/.
15. Cold Spring Harbor Laboratory in New York.
16. Johns Hopkins Bloomberg School of Public Health.
17. Cloudgene: http://cloudgene.uibk.ac.at.
18. CloudBioLinux: http://cloudbiolinux.org.
19. Public Datasets on Amazon: http://aws.amazon.com/publicdatasets.
20. GenBank: http://www.ncbi.nlm.nih.gov/genbank/.
21. HapMap: www.hapmap.org.
22. UniGene: http://www.ncbi.nlm.nih.gov/unigene.
Copyright information
© 2012 Springer-Verlag Wien
About this chapter
Cite this chapter
Forer, L., Schönherr, S., Weißensteiner, H., Specht, G., Kronenberg, F., Kloss-Brandstätter, A. (2012). Cloud Computing. In: Trajanoski, Z. (ed.) Computational Medicine. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0947-2_2
DOI: https://doi.org/10.1007/978-3-7091-0947-2_2
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-0946-5
Online ISBN: 978-3-7091-0947-2
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)