Volume 1, Issue 1, pp 53-62

Simulation Studies of Computation and Data Scheduling Algorithms for Data Grids

Abstract

Data Grids seek to harness geographically distributed resources for large-scale data-intensive problems. Such problems, involving loosely coupled jobs and large data-sets, are found in fields like high-energy physics, astronomy and bioinformatics. Effective scheduling of resources in such environments must take a variety of factors into account, including resource utilization, response time, global and local allocation policies, and scalability. We propose a general and extensible scheduling architecture that addresses these issues. Within this architecture we develop a suite of job scheduling and data replication algorithms, which we evaluate by simulation over a wide range of parameters. Our results show that it is important to evaluate the combined effectiveness of replication and scheduling strategies rather than study them separately. More specifically, we find that scheduling jobs to locations that contain the data they need, combined with asynchronously replicating popular data-sets to remote sites, works rather well.
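The combination described in the last sentence can be illustrated with a minimal sketch. This is not the paper's implementation or simulator: the `Site` class, the function names, the least-loaded tie-breaking rule and the popularity `threshold` are all assumptions introduced here for illustration. Scheduling and replication are kept as two independent functions to reflect that replication runs asynchronously, decoupled from job placement.

```python
from collections import Counter


class Site:
    """A hypothetical grid site: a set of dataset names plus a job queue."""

    def __init__(self, name, datasets=()):
        self.name = name
        self.datasets = set(datasets)
        self.queue = []


def schedule_job(job_id, dataset, sites, popularity):
    """Send the job to the least-loaded site that already holds its dataset."""
    popularity[dataset] += 1  # record the request for the replication step
    holders = [s for s in sites if dataset in s.datasets]
    # Assumption: if no site holds the data, fall back to all sites.
    candidates = holders or sites
    target = min(candidates, key=lambda s: len(s.queue))
    target.queue.append(job_id)
    return target


def replicate_popular(sites, popularity, threshold):
    """Copy each dataset requested at least `threshold` times to one more site."""
    copies = []
    for dataset, count in popularity.items():
        if count < threshold:
            continue
        missing = [s for s in sites if dataset not in s.datasets]
        if missing:
            # Assumption: place the new replica on the least-loaded site.
            dest = min(missing, key=lambda s: len(s.queue))
            dest.datasets.add(dataset)
            copies.append((dataset, dest.name))
    return copies
```

In this sketch a caller would invoke `schedule_job` for each arriving job with a shared `Counter`, and run `replicate_popular` periodically on its own schedule, so that later jobs for a popular data-set can be spread across the new replicas.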