Grid Workflow Software for a High-Throughput Proteome Annotation Pipeline

  • Adam Birnbaum
  • James Hayes
  • Wilfred W. Li
  • Mark A. Miller
  • Peter W. Arzberger
  • Philip E. Bourne
  • Henri Casanova
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3370)


The goal of the Encyclopedia of Life (EOL) Project is to predict structural information for all proteins in all organisms. This calculation presents challenges both in the scale of the computational resources required (approximately 1.8 million CPU hours) and in data and workflow management. While available tools solve some subsets of these problems, we had to build software to integrate and manage the overall Grid application execution. In this paper, we present this workflow system, detail its components, and report on the performance of our initial prototype implementation in runs over a large-scale Grid platform during the SC’03 conference.


Keywords: Grid Resource, Grid Service, Grid Application, Application Task, Storage Resource Broker





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Adam Birnbaum (1)
  • James Hayes (1)
  • Wilfred W. Li (1)
  • Mark A. Miller (1)
  • Peter W. Arzberger (2)
  • Philip E. Bourne (1, 2)
  • Henri Casanova (1, 2)
  1. San Diego Supercomputer Center, La Jolla, USA
  2. University of California, San Diego, La Jolla, USA
