Simulation Studies of Computation and Data Scheduling Algorithms for Data Grids
Data Grids seek to harness geographically distributed resources for large-scale data-intensive problems. Such problems, involving loosely coupled jobs and large data-sets, are found in fields like high-energy physics, astronomy and bioinformatics. A variety of factors need to be considered for effective scheduling of resources in such environments, including resource utilization, response time, global and local allocation policies, and scalability. We propose a general and extensible scheduling architecture that addresses these issues. Within this architecture we develop a suite of job scheduling and data replication algorithms that we evaluate using simulations for a wide range of parameters. Our results show that it is important to evaluate the combined effectiveness of replication and scheduling strategies, rather than study them separately. More specifically, we find that scheduling jobs to locations that contain the data they need, combined with asynchronously replicating popular data-sets to remote sites, works rather well.
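The combination highlighted in the last sentence can be illustrated with a minimal sketch. It is not the authors' simulator or their algorithms: the `Site` class, the function names, the use of queue length as the load measure, and the popularity `threshold` are all assumptions made for illustration. Jobs go to the least-loaded site that already holds their data-set, and a separate replication step, run independently of job placement, copies popular data-sets to a site that lacks them.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Site:
    # Hypothetical model of a grid site: the data-sets it stores and its job queue.
    name: str
    datasets: set = field(default_factory=set)
    queue: list = field(default_factory=list)


def schedule_job(job_id, dataset, sites):
    """Send the job to the least-loaded site that already holds its data-set."""
    holders = [s for s in sites if dataset in s.datasets]
    candidates = holders or sites  # fall back to any site if no replica exists
    target = min(candidates, key=lambda s: len(s.queue))
    target.queue.append((job_id, dataset))
    return target


def replicate_popular(sites, access_counts, threshold):
    """Copy each data-set requested at least `threshold` times to the
    least-loaded site that does not yet hold it (assumed policy)."""
    created = []
    for dataset, count in access_counts.items():
        if count < threshold:
            continue
        missing = [s for s in sites if dataset not in s.datasets]
        if missing:
            target = min(missing, key=lambda s: len(s.queue))
            target.datasets.add(dataset)
            created.append((dataset, target.name))
    return created


def run_demo():
    sites = [Site("A", {"d1"}), Site("B", {"d2"}), Site("C")]
    access_counts = Counter()
    placements = []
    # Three jobs need d1 (held only at A); one job needs d2 (held only at B).
    for job_id, dataset in enumerate(["d1", "d1", "d1", "d2"]):
        access_counts[dataset] += 1
        placements.append(schedule_job(job_id, dataset, sites).name)
    # Replication runs as its own step, decoupled from job placement.
    new_replicas = replicate_popular(sites, access_counts, threshold=3)
    # A later d1 job can now run at the new, idle replica site.
    placements.append(schedule_job(4, "d1", sites).name)
    return placements, new_replicas


if __name__ == "__main__":
    print(run_demo())
```

In this toy run the first three jobs pile up at the only site holding `d1`; once the replication step copies `d1` to the idle site, the next job for that data-set is placed there instead. The effect of either policy depends on the other, which is why the abstract argues for evaluating them together.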
Journal of Grid Computing
Volume 1, Issue 1, pp 53-62
- Kluwer Academic Publishers
- data replication
- distributed computing
- grid computing