Parallel Software Architecture for Experimental Workflows in Computational Biology on Clouds
Cloud computing opens new possibilities for computational biologists. Given the pay-as-you-go pricing model and the commodity hardware base, new tools for extensive parallelism are needed to make experimentation in the cloud an attractive option. In this paper, we present EasyProt, a parallel message-passing architecture designed for developing experimental workflows in computational biology while harnessing the power of cloud resources. The system exploits parallelism in two ways: by multithreading modular components on virtual machines while respecting data dependencies, and by expanding across multiple virtual machines. Components of the system, called elements, are easy to configure, so workflows can be modified and tested efficiently as experiments evolve. Although EasyProt, as an abstract cloud programming model, can be extended beyond computational biology, current development targets experimenters in this important discipline, who face unprecedented data-processing challenges. It provides a type system designed for proteomics, interactomics and comparative genomics data, together with a suite of elements that perform useful analysis tasks on biological data using cloud resources.
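To make the first form of parallelism more concrete, here is a minimal sketch, assuming a simple dataflow model in which each element is a thread that fires once it has received one message on every input queue. The `Element` class, the `run_workflow` function and the hydrophobic-residue set are hypothetical illustrations, not EasyProt's actual API or type system.

```python
import queue
import threading

# Hypothetical sketch of message-passing "elements"; not EasyProt's real API.
SENTINEL = object()  # marks the end of a message stream

# Hypothetical choice of hydrophobic amino-acid codes, for illustration only.
HYDROPHOBIC = set("AILMFVW")


class Element(threading.Thread):
    """A worker thread that reads one message from each input queue,
    applies func, and sends the result to every output queue."""

    def __init__(self, name, func, inputs, outputs):
        super().__init__(name=name, daemon=True)
        self.func = func
        self.inputs = inputs    # list of queue.Queue
        self.outputs = outputs  # list of queue.Queue

    def run(self):
        while True:
            # Data dependency: block until every input has a message.
            args = [q.get() for q in self.inputs]
            if any(a is SENTINEL for a in args):
                for q in self.outputs:
                    q.put(SENTINEL)  # propagate end-of-stream downstream
                return
            result = self.func(*args)
            for q in self.outputs:
                q.put(result)


def count_hydrophobic(seq):
    return sum(1 for residue in seq if residue in HYDROPHOBIC)


def run_workflow(sequences):
    """Compute the hydrophobic fraction of each protein sequence."""
    a_in, b_in = queue.Queue(), queue.Queue()
    a_out, b_out = queue.Queue(), queue.Queue()
    sink = queue.Queue()
    elements = [
        # 'length' and 'hydrophobic' are independent and run concurrently.
        Element("length", len, [a_in], [a_out]),
        Element("hydrophobic", count_hydrophobic, [b_in], [b_out]),
        # 'fraction' depends on both upstream elements.
        Element("fraction", lambda n, h: h / n if n else 0.0,
                [a_out, b_out], [sink]),
    ]
    for e in elements:
        e.start()
    for s in sequences:  # fan the input out to both branches
        a_in.put(s)
        b_in.put(s)
    a_in.put(SENTINEL)
    b_in.put(SENTINEL)
    results = []
    while (item := sink.get()) is not SENTINEL:
        results.append(item)
    for e in elements:
        e.join()
    return results


print(run_workflow(["AILM", "KKDE", "AK"]))  # [1.0, 0.0, 0.5]
```

Because each element is a single thread reading FIFO queues, message order is preserved, and `fraction` respects its data dependency while `length` and `hydrophobic` run concurrently. Expanding across virtual machines, the second form of parallelism, would require replacing the in-process queues with a network transport, which this sketch does not attempt.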
Availability: EasyProt is available under an open-source license as a public Amazon Machine Image (AMI) on the Amazon EC2 cloud service, registered with manifest easyprot-ami/easyprot.img.manifest.xml.
Keywords: parallel architectures, scientific workflows, cloud computing