The encyclopedia of life project: Grid software and deployment

  • Special Feature
  • New Generation Computing

Abstract

The ongoing global effort of genome sequencing is making large-scale comparative proteomic analysis an intriguing task. The Encyclopedia of Life (EOL; http://eol.sdsc.edu) project aims to provide current functional and structural annotations for all available proteomes, a computational challenge never seen before in biology. Using an integrative genome annotation pipeline (iGAP), we have produced 3D models and functional annotations for more than 100 proteomes thus far. This process is greatly facilitated by grid compute resources, and especially by the development of grid application execution environments. The AppLeS (Application-Level Scheduling) Parameter Sweep Template (APST) has been adopted by the EOL project as a mediator to grid middleware. APST has made the annotation process much more efficient, highly automated and scalable. We are currently building a domain-specific bioinformatics workflow management system (BWMS) on top of APST, which further streamlines grid deployment of life science applications. With these developments in mind, we discuss some common problems and expectations of grid computing for high-throughput proteomics.

Author information

Henri Casanova, Ph.D.: He is an adjunct Professor of Computer Science and Engineering at the University of California, San Diego (UCSD), a Research Scientist at the San Diego Supercomputer Center, and the founder and director of the Grid Research and Development Laboratory (GRAIL) at UCSD. His research interests are in the area of parallel, distributed, Grid and Internet computing. He obtained his B.S. from the Ecole Nationale Supérieure d’Electronique, d’Electrotechnique, d’Informatique et d’Hydraulique de Toulouse, France in 1993, his M.S. from the Université Paul Sabatier, Toulouse, France in 1994, and his Ph.D. from the University of Tennessee, Knoxville in 1998.

Francine Berman, Ph.D.: She is a Professor and High Performance Computing Endowed Chair at U.C. San Diego, Director of the San Diego Supercomputer Center and a Fellow of the ACM. Her research over two decades has focused on High Performance and Grid Computing, in particular in the areas of programming environments, adaptive middleware, scheduling and performance prediction. She has served on numerous editorial boards, steering committees, and program and conference committees in the areas of Parallel and Grid computing. She is one of the Principal Investigators of the NSF-supported TeraGrid, and directs NSF’s National Partnership for Advanced Computing Infrastructure (NPACI).

Peter Arzberger, Ph.D.: He is the Director of Life Sciences Initiatives at the University of California, San Diego; Director of the National Biomedical Computation Resource (http://nbcr.ucsd.edu), funded by the National Center for Research Resources of NIH; and the Chair of the Pacific Rim Application and Grid Middleware Assembly (http://www.pragma-grid.edu), an organization of 20 institutions around the Pacific Rim whose mission is to establish sustained collaborations and to advance the use of grid technologies in applications. He serves on the US National CODATA Committee and the National Advisory Board of the US Long Term Ecological Research. His hobby is working on Lloyds.

Mark A. Miller, Ph.D.: He is Program Coordinator for the Integrative BioSciences Program at the San Diego Supercomputer Center. He received his Ph.D. in Biochemistry from Purdue University in 1984. His research interests have gradually moved towards computer-driven analyses and quantitative biology, and culminated in managing the BioInformatics Core of the Joint Center for Structural Biology, where he helped to plan and implement the informatics solutions for high-throughput crystallography. He is currently working on the specification, design and deployment of tools to enable biology research.

Philip Bourne, Ph.D.: He is a Professor of Pharmacology at the University of California, San Diego and co-director of the Protein Data Bank (PDB). He is immediate past President of the International Society for Computational Biology, an Associate Editor of Bioinformatics, and a member of the Editorial Board of several other journals. He received his B.Sc. and Ph.D. in chemistry from Flinders University, South Australia. His research interests include bioinformatics, particularly structural bioinformatics. This encompasses algorithms, metalanguages, biological databases, biological query languages and visualization, with special interest in cell signaling and apoptosis. Major ongoing projects in the Bourne Lab include the PDB, Encyclopedia of Life (EOL), Systematic Protein Annotation and Modeling (SPAM), and the Tree of Life. Bourne’s personal interests include fishing, tennis, squash, walking, skiing, sports cars, motor bikes and writing.

About this article

Cite this article

Li, W.W., Byrnes, R.W., Hayes, J. et al. The encyclopedia of life project: Grid software and deployment. New Gener Comput 22, 127–136 (2004). https://doi.org/10.1007/BF03040951

  • DOI: https://doi.org/10.1007/BF03040951
