Abstract
Computational grids combine computers across the Internet for distributed data processing and are an attractive platform for the data-intensive applications of bioinformatics. We present extensible genome-processing software for the grid and evaluate its performance. Our software was able to discover previously unknown circular permutations (CPs) in the ProDom database, which contains more than 70 MB of protein data. A specific feature of our software is its design as a component: the Alignment HOC, a Higher-Order Component that uses the latest Globus toolkit as grid middleware. Besides genome data, the Alignment HOC accepts as input plugin code for processing this data, and it contains all the configuration required to run the component on top of Globus, thus freeing the non-grid-expert user from dealing with grid middleware. Instead of writing data distribution procedures and configuring the middleware for every new algorithm, Alignment HOC users reuse the existing component and write only application-specific plugins. To maintain plugins persistently in a reusable manner, we built a web-accessible plugin database with a convenient administration GUI. The flexible component-based implementation makes it easy to study CPs in other databases (e.g., UniProt/Swiss-Prot) or to use an alignment algorithm other than the standard Needleman-Wunsch. For the efficient distribution of workload, we developed a library of group communication operations for HOCs.
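The separation between a reusable component and an application-specific plugin can be illustrated with a minimal sketch. It is not the Alignment HOC's actual interface: the names `AlignmentPlugin`, `needleman_wunsch_score`, and `align_all_pairs` are hypothetical, the scoring values are arbitrary, and the "component" here simply loops over sequence pairs on one machine instead of distributing them over a grid.

```python
from typing import Callable, List, Tuple

# Hypothetical plugin type: takes two sequences and returns a similarity score.
AlignmentPlugin = Callable[[str, str], int]


def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not those of the chapter.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def align_all_pairs(sequences: List[str],
                    plugin: AlignmentPlugin) -> List[Tuple[int, int, int]]:
    """Stand-in for the component: applies the plugin to every sequence pair.

    A grid component would distribute these pairs across workers; this
    sketch evaluates them sequentially.
    """
    return [(i, j, plugin(sequences[i], sequences[j]))
            for i in range(len(sequences))
            for j in range(i + 1, len(sequences))]
```

Swapping in a different alignment algorithm then amounts to passing another function with the same signature to `align_all_pairs`, which mirrors the idea that users write only the plugin and reuse the rest.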
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Ludeking, P., Dunnweber, J., Gorlatch, S. (2008). A Higher-Order Component for Efficient Genome Processing in the Grid. In: Making Grids Work. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-78448-9_28
DOI: https://doi.org/10.1007/978-0-387-78448-9_28
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-78447-2
Online ISBN: 978-0-387-78448-9
eBook Packages: Computer Science, Computer Science (R0)