A Higher-Order Component for Efficient Genome Processing in the Grid

Lüdeking, Philipp; Dünnweber, Jan; Gorlatch, Sergei

doi:10.1007/978-0-387-78448-9_28

Abstract

Computational grids combine computers on the Internet for distributed data processing and are an attractive platform for the data-intensive applications of bioinformatics. We present extensible genome-processing software for the grid and evaluate its performance. Our software was able to discover previously unknown circular permutations (CPs) in the ProDom database, which contains more than 70 MB of protein data. A specific feature of our software is its design as a component: the Alignment HOC, a Higher-Order Component that uses the latest Globus Toolkit as grid middleware. Besides genome data, the Alignment HOC accepts as input the plugin code for processing this data, and it contains all the configuration required to run the component on top of Globus, thus freeing the non-grid-expert user from dealing with grid middleware. Instead of writing data distribution procedures and configuring the middleware for every new algorithm, Alignment HOC users reuse the existing component and write only application-specific plugins. To store plugins persistently and make them reusable, we built a web-accessible plugin database with a convenient administration GUI. The flexible component-based implementation makes it easy to study CPs in other databases (e.g. UniProt/Swiss-Prot) or to use an alignment algorithm other than the standard Needleman-Wunsch. For the efficient distribution of workload, we developed a library of group communication operations for HOCs.
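
The division of labour described in the abstract, a reusable component that takes both data and plugin code, can be illustrated with a minimal Python sketch. This is not the authors' implementation and it involves no Globus middleware: the names `alignment_hoc`, `AlignmentPlugin`, and `needleman_wunsch_score`, as well as the scoring parameters, are assumptions made for illustration, and the pairwise loop runs locally where a grid deployment would distribute the work.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple

# Hypothetical plugin type: any function that scores two sequences.
AlignmentPlugin = Callable[[str, str], int]


def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score via the Needleman-Wunsch recurrence.

    The scoring values are illustrative defaults, not taken from the paper.
    """
    # prev holds row i-1 of the DP table; row 0 aligns prefixes of b to gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0 aligns a prefix of a to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def alignment_hoc(sequences: Dict[str, str],
                  plugin: AlignmentPlugin) -> Dict[Tuple[str, str], int]:
    """Apply the plugin to every unordered pair of named sequences.

    Assumption: a grid version would hand these pairs to remote workers;
    this sketch simply evaluates them in a local loop.
    """
    return {
        (x, y): plugin(sequences[x], sequences[y])
        for x, y in combinations(sorted(sequences), 2)
    }


if __name__ == "__main__":
    data = {"p1": "AB", "p2": "A", "p3": "AB"}
    # The component stays the same; only the plugin changes.
    print(alignment_hoc(data, needleman_wunsch_score))
    print(alignment_hoc(data, lambda s, t: abs(len(s) - len(t))))
```

Swapping the alignment algorithm then means passing a different function as `plugin`, while the pairing and distribution logic inside the component is left untouched.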

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Münster, Münster, Germany
Philipp Lüdeking, Jan Dünnweber & Sergei Gorlatch

About this chapter

Cite this chapter

Lüdeking, P., Dünnweber, J., Gorlatch, S. (2008). A Higher-Order Component for Efficient Genome Processing in the Grid. In: Making Grids Work. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-78448-9_28

DOI: https://doi.org/10.1007/978-0-387-78448-9_28
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-78447-2
Online ISBN: 978-0-387-78448-9
eBook Packages: Computer Science, Computer Science (R0)