Bioinformatics and Microarray Data Analysis on the Cloud

  • Barbara Calabrese
  • Mario Cannataro
Part of the Methods in Molecular Biology book series (MIMB, volume 1375)


High-throughput platforms such as microarrays, mass spectrometry, and next-generation sequencing are producing an increasing volume of omics data that requires large data storage and computing power. Cloud computing offers massively scalable computing and storage, data sharing, and on-demand access to resources and applications anytime and anywhere, and thus may be a key technology for addressing these needs. In recent years it has been adopted for the deployment of various bioinformatics solutions and services in both academia and industry. Nevertheless, cloud computing raises several issues regarding the security and privacy of data, which are particularly important when analyzing patient data, as in personalized medicine. This chapter reviews the main academic and industrial cloud-based bioinformatics solutions, with a special focus on microarray data analysis, and highlights the main issues and problems related to the use of such platforms for the storage and analysis of patient data.


Keywords: Cloud computing, Bioinformatics, Microarray data analysis



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
