The encyclopedia of life project: Grid software and deployment

Li, Wilfred W.; Byrnes, Robert W.; Hayes, Jim; Birnbaum, Adam; Reyes, Vicente M.; Shahab, Atif; Mosley, Coleman; Pekurovsky, Dmitry; Quinn, Greg B.; Shindyalov, Ilya N.; Casanova, Henri; Ang, Larry; Berman, Fran; Arzberger, Peter W.; Miller, Mark A.; Bourne, Philip E.

doi:10.1007/BF03040951

Special Feature
Published: June 2004

Volume 22, pages 127–136 (2004)
New Generation Computing

Wilfred W. Li¹,
Robert W. Byrnes¹,
Jim Hayes¹,³,
Adam Birnbaum¹,
Vicente M. Reyes¹,
Atif Shahab⁵,
Coleman Mosley⁴,
Dmitry Pekurovsky¹,
Greg B. Quinn¹,
Ilya N. Shindyalov¹,
Henri Casanova¹,³,
Larry Ang⁵,
Fran Berman¹,³,
Peter W. Arzberger¹,
Mark A. Miller¹ &
Philip E. Bourne¹,²

Abstract

The ongoing global effort of genome sequencing is making large-scale comparative proteomic analysis an intriguing task. The Encyclopedia of Life (EOL; http://eol.sdsc.edu) project aims to provide current functional and structural annotations for all available proteomes, a computational challenge never seen before in biology. Using an integrative genome annotation pipeline (iGAP), we have produced 3D models and functional annotations for more than 100 proteomes thus far. This process is greatly facilitated by grid compute resources, and especially by the development of a grid application execution environment. The AppLeS (Application-Level Scheduling) Parameter Sweep Template (APST) has been adopted by the EOL project as a mediator to grid middleware. APST has made the annotation process much more efficient, highly automated and scalable. Currently we are building a domain-specific bioinformatics workflow management system (BWMS) on top of APST, which further streamlines grid deployment of life science applications. With these developments in mind, we discuss some common problems and expectations of grid computing for high-throughput proteomics.

Author information

Authors and Affiliations

Integrative Biosciences Program, San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
Wilfred W. Li, Robert W. Byrnes, Jim Hayes, Adam Birnbaum, Vicente M. Reyes, Dmitry Pekurovsky, Greg B. Quinn, Ilya N. Shindyalov, Henri Casanova, Fran Berman, Peter W. Arzberger, Mark A. Miller & Philip E. Bourne
Department of Pharmacology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
Philip E. Bourne
Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
Jim Hayes, Henri Casanova & Fran Berman
Bioinformatics Program, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
Coleman Mosley
Bioinformatics Institute, 21 Heng Mui Keng Terrace I2R, Level 3, Singapore 119612
Atif Shahab & Larry Ang

Additional information

Henri Casanova, Ph.D.: He is an adjunct Professor of Computer Science and Engineering at the University of California, San Diego (UCSD), a Research Scientist at the San Diego Supercomputer Center, and the founder and director of the Grid Research and Development Laboratory (GRAIL) at UCSD. His research interests are in the area of parallel, distributed, Grid and Internet computing. He obtained his B.S. from the Ecole Nationale Supérieure d’Electronique, d’Electrotechnique, d’Informatique et d’Hydraulique de Toulouse, France in 1993, his M.S. from the Université Paul Sabatier, Toulouse, France in 1994, and his Ph.D. from the University of Tennessee, Knoxville in 1998.

Francine Berman, Ph.D.: She is a Professor and High Performance Computing Endowed Chair at U.C. San Diego, Director of the San Diego Supercomputer Center, and a Fellow of the ACM. Her research over two decades has focused on High Performance and Grid Computing, in particular in the areas of programming environments, adaptive middleware, scheduling and performance prediction. She has served on numerous editorial boards, steering committees, and program and conference committees in the areas of Parallel and Grid computing. She is one of the Principal Investigators of the NSF-supported TeraGrid, and directs NSF’s National Partnership for Advanced Computational Infrastructure (NPACI).

Peter Arzberger, Ph.D.: He is the Director of Life Sciences Initiatives, University of California San Diego; Director of the National Biomedical Computation Resource (http://nbcr.ucsd.edu), funded by the National Center for Research Resources of NIH; and the Chair of the Pacific Rim Application and Grid Middleware Assembly (http://www.pragma-grid.edu), an organization of 20 institutions around the Pacific Rim whose mission is to establish sustained collaborations and to advance the use of grid technologies in applications. He serves on the US National CODATA Committee and the National Advisory Board of the US Long Term Ecological Research. His hobby is working on Lloyds.

Mark A. Miller, Ph.D.: He is Program Coordinator for the Integrative BioSciences Program at San Diego Supercomputer Center. He received his Ph.D. in Biochemistry from Purdue University in 1984. His research interests have gradually moved towards computer-driven analyses and quantitative biology, and culminated in managing the BioInformatics Core of the Joint Center for Structural Biology, where he helped to plan and implement the informatics solutions for high-throughput crystallography. He is currently working on the specification, design and deployment of tools to enable biology research.

Philip Bourne, Ph.D.: He is a Professor of Pharmacology at the University of California, San Diego and co-director of the Protein Data Bank (PDB). He is the immediate past President of the International Society for Computational Biology, an Associate Editor of Bioinformatics, and a member of the Editorial Board of several other journals. He received his B.Sc. and Ph.D. in chemistry at Flinders University, South Australia. His research interests include bioinformatics, particularly structural bioinformatics. These interests cover algorithms, metalanguages, biological databases, biological query languages and visualization, with special interest in cell signaling and apoptosis. Major ongoing projects in the Bourne Lab include the PDB, Encyclopedia of Life (EOL), Systematic Protein Annotation and Modeling (SPAM), and the Tree of Life. Bourne’s personal interests include fishing, tennis, squash, walking, skiing, sports cars, motor bikes and writing.

About this article

Cite this article

Li, W.W., Byrnes, R.W., Hayes, J. et al. The encyclopedia of life project: Grid software and deployment. New Gener Comput 22, 127–136 (2004). https://doi.org/10.1007/BF03040951

Received: 15 June 2003
Revised: 07 November 2003
Issue Date: June 2004
DOI: https://doi.org/10.1007/BF03040951
