Journal of Grid Computing, Volume 5, Issue 1, pp 43–64

Data Intensive and Network Aware (DIANA) Grid Scheduling

  • Richard McClatchey
  • Ashiq Anjum
  • Heinz Stockinger
  • Arshad Ali
  • Ian Willers
  • Michael Thomas

Abstract

In Grids, scheduling decisions are often made on the basis of whether jobs are data intensive or computation intensive: in data-intensive situations jobs may be pushed to the data, and in computation-intensive situations data may be pulled to the jobs. This kind of scheduling, which does not consider network characteristics, can degrade performance in a Grid environment and may result in large processing queues and job execution delays due to site overloads. In this paper we describe a Data Intensive and Network Aware (DIANA) meta-scheduling approach, which takes into account data, processing power and network characteristics when making scheduling decisions across multiple sites. Through a practical implementation on a Grid testbed, we demonstrate that queue and execution times of data-intensive jobs can be significantly improved by introducing the proposed DIANA scheduler. The basic scheduling decisions are dictated by a weighting factor for each potential target location, calculated as a function of network characteristics, processing cycles, and data location and size. The job scheduler provides a global ranking of the computing resources and then selects an optimal one on the basis of this overall access and execution cost. The DIANA approach considers the Grid as a combination of active network elements and takes network characteristics as a first-class criterion in the scheduling decision matrix, along with computations and data. The scheduler can then make informed decisions by taking into account the changing state of the network, the locality and size of the data, and the pool of available processing cycles.
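
The ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual DIANA cost function, so everything below is an assumption made for illustration: the `Site` fields, the `total_cost` formula (estimated transfer time plus a crude queue-over-capacity term), and the weights `w_net` and `w_comp` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical snapshot of one candidate execution site."""
    name: str
    bandwidth_mbps: float  # assumed measured throughput from the data to this site
    queue_length: int      # jobs already waiting at the site
    cpu_power: float       # relative processing capacity (1.0 = baseline)
    data_is_local: bool    # True if the input data already resides at the site


def total_cost(site: Site, data_size_mb: float,
               w_net: float = 1.0, w_comp: float = 1.0) -> float:
    """Illustrative combined cost; lower is better.

    Network term: estimated transfer time in seconds (zero if data is local).
    Compute term: queued jobs plus this one, scaled by site capacity.
    The weighted sum is an assumed form, not the DIANA formula.
    """
    transfer_mb = 0.0 if site.data_is_local else data_size_mb
    network_cost = transfer_mb * 8 / site.bandwidth_mbps
    compute_cost = (site.queue_length + 1) / site.cpu_power
    return w_net * network_cost + w_comp * compute_cost


def rank_sites(sites: list[Site], data_size_mb: float) -> list[Site]:
    """Return sites ordered from lowest to highest combined cost."""
    return sorted(sites, key=lambda s: total_cost(s, data_size_mb))


if __name__ == "__main__":
    sites = [
        Site("A", bandwidth_mbps=100.0, queue_length=50, cpu_power=1.0, data_is_local=True),
        Site("B", bandwidth_mbps=100.0, queue_length=2, cpu_power=1.0, data_is_local=False),
        Site("C", bandwidth_mbps=1000.0, queue_length=2, cpu_power=2.0, data_is_local=False),
    ]
    for size in (1_000, 100_000):
        ranking = [s.name for s in rank_sites(sites, size)]
        print(f"{size} MB input: {ranking}")
```

With these made-up numbers, a small input favours the fast, lightly loaded remote site, while a large input favours the site that already holds the data despite its long queue. This is the kind of trade-off a scheduler that ignores the network cannot see.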

Key words

meta scheduling · network awareness · peer-to-peer architectures · data intensive · scheduling algorithm

Copyright information

© Springer Science + Business Media B.V. 2007

Authors and Affiliations

  • Richard McClatchey (1), corresponding author
  • Ashiq Anjum (1, 3)
  • Heinz Stockinger (2)
  • Arshad Ali (3)
  • Ian Willers (4)
  • Michael Thomas (5)

  1. CCS Research Centre, University of the West of England, Bristol, UK
  2. Swiss Institute of Bioinformatics, Lausanne, Switzerland
  3. National University of Sciences and Technology, Rawalpindi, Pakistan
  4. CERN, European Organization for Nuclear Research, Geneva, Switzerland
  5. California Institute of Technology, Pasadena, USA