New Generation Computing, Volume 22, Issue 2, pp 127–136

The encyclopedia of life project: Grid software and deployment

  • Wilfred W. Li
  • Robert W. Byrnes
  • Jim Hayes
  • Adam Birnbaum
  • Vicente M. Reyes
  • Atif Shahab
  • Coleman Mosley
  • Dmitry Pekurovsky
  • Greg B. Quinn
  • Ilya N. Shindyalov
  • Henri Casanova
  • Larry Ang
  • Fran Berman
  • Peter W. Arzberger
  • Mark A. Miller
  • Philip E. Bourne
Special Feature

Abstract

The ongoing global genome sequencing effort is making large-scale comparative proteomic analysis an intriguing task. The Encyclopedia of Life (EOL; http://eol.sdsc.edu) project aims to provide current functional and structural annotations for all available proteomes, a computational challenge unprecedented in biology. Using an integrative genome annotation pipeline (iGAP), we have produced 3D models and functional annotations for more than 100 proteomes thus far. This process is greatly facilitated by grid computing resources, and especially by the development of a grid application execution environment. The AppLeS (Application-Level Scheduling) Parameter Sweep Template (APST) has been adopted by the EOL project as a mediator to grid middleware. APST has made the annotation process much more efficient, highly automated and scalable. We are currently building a domain-specific bioinformatics workflow management system (BWMS) on top of APST, which further streamlines grid deployment of life science applications. With these developments in mind, we discuss some common problems and expectations of grid computing for high-throughput proteomics.

Keywords

  • Biology on the Grid
  • Integrative Genome Annotation Pipeline
  • Encyclopedia of Life
  • AppLeS Parameter Sweep Template
  • Bioinformatics

Copyright information

© Ohmsha, Ltd. and Springer 2004

Authors and Affiliations

  • Wilfred W. Li (1)
  • Robert W. Byrnes (1)
  • Jim Hayes (1, 3)
  • Adam Birnbaum (1)
  • Vicente M. Reyes (1)
  • Atif Shahab (5)
  • Coleman Mosley (4)
  • Dmitry Pekurovsky (1)
  • Greg B. Quinn (1)
  • Ilya N. Shindyalov (1)
  • Henri Casanova (1, 3)
  • Larry Ang (5)
  • Fran Berman (1, 3)
  • Peter W. Arzberger (1)
  • Mark A. Miller (1)
  • Philip E. Bourne (1, 2)

  1. Integrative Biosciences Program, San Diego Supercomputer Center, University of California, San Diego, La Jolla, USA
  2. Department of Pharmacology, University of California, San Diego, La Jolla, USA
  3. Department of Computer Science and Engineering, University of California, San Diego, La Jolla, USA
  4. Bioinformatics Program, University of California, San Diego, La Jolla, USA
  5. Bioinformatics Institute, Singapore
