Journal of Grid Computing, Volume 5, Issue 4, pp. 379–405

Scheduling Task Parallel Applications for Rapid Turnaround on Enterprise Desktop Grids

  • Derrick Kondo
  • Andrew A. Chien
  • Henri Casanova

Abstract

Desktop Grids are popular platforms for high-throughput applications, but their inherent resource volatility makes it difficult to exploit them for applications that require rapid turnaround. Efficient desktop Grid execution of short-lived applications is an attractive proposition, and we claim that it is achievable via intelligent resource selection. We propose three general techniques for resource selection: resource prioritization, resource exclusion, and task replication. We use these techniques to instantiate several scheduling heuristics, which we evaluate through trace-driven simulations of four representative desktop Grid configurations. We find that ranking desktop resources by clock rate, without taking their availability history into account, is surprisingly effective in practice. Our main result is that a heuristic that uses the appropriate combination of resource prioritization, resource exclusion, and task replication can achieve performance within a factor of 1.7 of optimal in practice.
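The three techniques named in the abstract can be combined into a single scheduling heuristic. The sketch below is a minimal illustration under stated assumptions, not the authors' heuristics: hosts are described only by a hypothetical `clock_mhz` field, exclusion uses a hypothetical fixed fraction of the fastest clock rate (`min_ratio`), and replication simply places extra copies of tasks on hosts left idle (`max_copies`). All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A desktop host; clock_mhz is a hypothetical CPU clock-rate field."""
    name: str
    clock_mhz: float


def prioritize(hosts):
    """Resource prioritization: rank hosts by clock rate, fastest first.
    Availability history is deliberately ignored."""
    return sorted(hosts, key=lambda h: h.clock_mhz, reverse=True)


def exclude(hosts, min_ratio):
    """Resource exclusion: drop hosts slower than min_ratio * fastest clock.
    This fixed-ratio threshold is a hypothetical criterion for illustration."""
    if not hosts:
        return []
    fastest = max(h.clock_mhz for h in hosts)
    return [h for h in hosts if h.clock_mhz >= min_ratio * fastest]


def schedule(tasks, hosts, min_ratio=0.5, max_copies=2):
    """Map each task to a list of host names (primary copy first)."""
    free = prioritize(exclude(hosts, min_ratio))
    assignment = {t: [] for t in tasks}

    # Primary copies: the best-ranked hosts receive the first tasks.
    for task in tasks:
        if not free:
            break
        assignment[task].append(free.pop(0).name)

    # Task replication: leftover idle hosts receive extra copies, starting
    # with tasks whose primary landed on the slowest admitted hosts.
    for task in reversed(tasks):
        while free and len(assignment[task]) < max_copies:
            assignment[task].append(free.pop(0).name)

    return assignment


if __name__ == "__main__":
    hosts = [Host("a", 3000), Host("b", 1000), Host("c", 2400),
             Host("d", 2000), Host("e", 1800)]
    print(schedule(["t1", "t2"], hosts))
    # {'t1': ['a', 'e'], 't2': ['c', 'd']}
```

In the usage example, host `b` is excluded because its clock rate falls below half of the fastest host's, and the two idle hosts left after primary placement become replicas. A real desktop Grid scheduler would also need to cancel the remaining copies once any copy of a task completes; this sketch leaves that out.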

Key words

Desktop Grids · Network of workstations · Resource management · Scheduling

Copyright information

© Springer Science + Business Media B.V. 2007

Authors and Affiliations

  • Derrick Kondo (1)
  • Andrew A. Chien (2)
  • Henri Casanova (3)

  1. Laboratoire de Recherche en Informatique/INRIA Futurs, Orsay, France
  2. Department of Computer Science and Engineering, University of California, San Diego, USA
  3. Department of Information and Computer Sciences, University of Hawai‘i, Manoa, Hawai‘i

Personalised recommendations