New Worker-Centric Scheduling Strategies for Data-Intensive Grid Applications

  • Steven Y. Ko
  • Ramsés Morales
  • Indranil Gupta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4834)


Distributed computations that deal with large amounts of data are scheduled in Grid clusters today using either a task-centric or a worker-centric mechanism. Because the data sets are large, execution time is bounded by the cost of data transfer. In this paper, we introduce new worker-centric scheduling strategies whose novelty is that they implicitly exploit locality of interest to reduce the cost of data transfer. Many Grid applications exhibit such locality of interest: a file is often accessed by multiple tasks and, more importantly, a set of files accessed by one task is also likely to be accessed together by other tasks. Our new deterministic and probabilistic scheduling algorithms implicitly exploit this property to improve running time. Our experiments, which use traces of a real Grid application (Coadd), show that our algorithms achieve utilization of over 90% while significantly reducing makespan compared to task-centric approaches.
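The abstract does not spell out the selection rules, so the following is only a minimal sketch of the worker-centric idea, assuming each task is described solely by the set of input files it reads and each worker keeps a cache of files it has already fetched. An idle worker chooses among pending tasks and prefers those whose files it already holds, which exploits locality of interest without any explicit grouping of files. The names `Task`, `Worker`, `transfer_cost`, `pick_deterministic`, `pick_probabilistic`, and `on_worker_idle` are hypothetical, and both scoring rules (fewest missing files; weights of 1 plus the number of cached files) are illustrative stand-ins, not the metrics defined in the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    files: frozenset  # input files the task reads (hypothetical model)


@dataclass
class Worker:
    name: str
    cache: set = field(default_factory=set)  # files already stored locally


def transfer_cost(task, worker):
    """Number of input files the worker would still have to fetch."""
    return len(task.files - worker.cache)


def pick_deterministic(worker, pending):
    """Illustrative deterministic rule: pick the task needing the fewest transfers.

    min() returns the first minimum, so ties go to the earliest task in the list.
    """
    return min(pending, key=lambda t: transfer_cost(t, worker))


def pick_probabilistic(worker, pending, rng=random):
    """Illustrative probabilistic rule: weight each task by 1 + cached input files.

    The +1 keeps tasks with no cached files selectable.
    """
    weights = [1 + len(t.files & worker.cache) for t in pending]
    return rng.choices(pending, weights=weights, k=1)[0]


def on_worker_idle(worker, pending, picker):
    """Worker-centric step: the idle worker selects a task and fetches its files."""
    task = picker(worker, pending)
    pending.remove(task)
    worker.cache |= task.files  # fetched files stay cached for later tasks
    return task
```

For example, a worker caching files `a` and `b` would pick a task reading `{a, b, c}` (one transfer) over one reading `{c, d}` (two transfers). A task-centric scheduler would instead iterate over tasks and assign a worker to each; the worker-centric loop inverts that, so locality is exploited implicitly through each worker's cache.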


Keywords: worker-centric scheduling · task-centric scheduling · data-intensive applications · Grid environments



Copyright information

© IFIP International Federation for Information Processing 2007

Authors and Affiliations

  • Steven Y. Ko (1)
  • Ramsés Morales (1)
  • Indranil Gupta (1)
  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
