Journal of Grid Computing, Volume 1, Issue 1, pp 53–62

Simulation Studies of Computation and Data Scheduling Algorithms for Data Grids

  • Kavitha Ranganathan
  • Ian Foster

Abstract

Data Grids seek to harness geographically distributed resources for large-scale data-intensive problems. Such problems, involving loosely coupled jobs and large data-sets, are found in fields like high-energy physics, astronomy and bioinformatics. A variety of factors need to be considered for effective scheduling of resources in such environments, e.g., resource utilization, response time, global and local allocation policies, and scalability. We propose a general and extensible scheduling architecture that addresses these issues. Within this architecture we develop a suite of job scheduling and data replication algorithms, which we evaluate using simulations for a wide range of parameters. Our results show that it is important to evaluate the combined effectiveness of replication and scheduling strategies, rather than study them separately. More specifically, we find that scheduling jobs to locations that contain the data they need, combined with asynchronously replicating popular data-sets to remote sites, works rather well.
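The combination highlighted in the abstract's last sentence can be illustrated with a toy sketch. This is not the authors' simulator or any of their algorithms: the site names, the access-count threshold, the least-loaded tie-breaking, and the function names `schedule_job` and `replicate_popular` are all assumptions made for illustration. Jobs are sent to a site that already holds their data-set, and a separate replication step copies data-sets that have become popular to another site. The demo calls both steps in one loop only for brevity; in a real system the replication step would run independently of job placement.

```python
from collections import Counter


def schedule_job(dataset, replicas, load):
    """Send a job to a site that already holds `dataset` (least loaded wins)."""
    candidates = sorted(replicas[dataset])  # sorted for deterministic ties
    site = min(candidates, key=lambda s: load[s])
    load[site] += 1
    return site


def replicate_popular(access_counts, replicas, load, threshold):
    """Copy each data-set accessed at least `threshold` times to one more site.

    The target is the least-loaded site that lacks a copy (an assumed policy).
    The counter is reset after each replication decision.
    """
    created = []
    for dataset, count in sorted(access_counts.items()):
        if count < threshold:
            continue
        missing = sorted(s for s in load if s not in replicas[dataset])
        if missing:
            target = min(missing, key=lambda s: load[s])
            replicas[dataset].add(target)
            created.append((dataset, target))
        access_counts[dataset] = 0
    return created


def run_demo():
    # Hypothetical three-site grid; "d1" is the popular data-set.
    load = {"site_a": 0, "site_b": 0, "site_c": 0}
    replicas = {"d1": {"site_a"}, "d2": {"site_b"}}
    access_counts = Counter()
    placements = []
    for dataset in ["d1", "d1", "d1", "d2", "d1", "d1"]:
        placements.append(schedule_job(dataset, replicas, load))
        access_counts[dataset] += 1
        replicate_popular(access_counts, replicas, load, threshold=3)
    return placements, replicas


if __name__ == "__main__":
    placements, replicas = run_demo()
    print(placements)
    print(replicas)
```

In this run the first three "d1" jobs all land on the only site holding the data; once the data-set crosses the threshold it is copied to a second site, and later "d1" jobs shift there because that site is less loaded.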

Keywords: data replication, distributed computing, grid computing, scheduling, simulation

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Kavitha Ranganathan (1)
  • Ian Foster (2)
  1. Department of Computer Science, University of Chicago, Chicago, USA
  2. Math. and Computer Science Division, Argonne National Laboratory, Argonne, USA
