Towards quality of service for parallel computing: An overview of the MILAN project

  • Holger Karl
Workshop: Distributed Computing and Metacomputing
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1593)

Abstract

Parallel computing faces many practical difficulties, e.g., the gap between simple, high-level programming models and complex real execution environments such as a cluster of workstations, and the unpredictability of program execution. The MILAN project addresses these problems and aims to increase the Quality of Service (QoS) of parallel programs. This paper presents an overview of MILAN's current research results. For clusters of workstations, the Calypso system provides a simple programming environment that leverages theoretical results on the fault-tolerant execution of parallel programs. A resource management system implements the resource contracts needed for QoS, particularly for parallel applications. The concepts of Calypso have been applied to Web-based computing with the Charlotte system.
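Calypso's published work describes its fault tolerance in terms of *eager scheduling*. Within a parallel step, any idle worker is handed an unfinished task, even one another worker is still running. The first result for each task is kept and later duplicates are discarded, so slow or crashed workers are masked by the remaining ones. The sketch below is a minimal Python simulation of that idea. It assumes a single parallel step, idempotent zero-argument tasks, a simulated clock instead of real machines, and workers that "crash" by silently never returning. The names `Worker`, `eager_schedule`, `time_per_task` and `crash_at` are hypothetical illustrations, not Calypso's API.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Worker:
    name: str
    time_per_task: int              # simulated time units one task takes on this worker
    crash_at: Optional[int] = None  # simulated time at which the worker dies (None = never)


def eager_schedule(tasks: Dict[int, Callable[[], int]],
                   workers: List[Worker]) -> Dict[int, int]:
    """Simulate one parallel step under eager scheduling (hypothetical sketch).

    Every idle worker receives the unfinished task that has been assigned the
    fewest times so far, even if another worker is already running it. The
    first result for each task is kept and later duplicates are discarded,
    which is safe only because the tasks are assumed to be idempotent.
    """
    results: Dict[int, int] = {}
    times_assigned = {tid: 0 for tid in tasks}
    running: List[Tuple[int, int, int]] = []  # min-heap of (finish_time, worker_index, task_id)

    def assign(w_idx: int, now: int) -> None:
        pending = [t for t in tasks if t not in results]
        if not pending:
            return
        # Least-often-assigned unfinished task; ties broken by lowest task id.
        tid = min(pending, key=lambda t: (times_assigned[t], t))
        times_assigned[tid] += 1
        finish = now + workers[w_idx].time_per_task
        crash = workers[w_idx].crash_at
        if crash is not None and crash < finish:
            return  # worker dies mid-task: this copy is lost and the worker is never reused
        heapq.heappush(running, (finish, w_idx, tid))

    for w_idx in range(len(workers)):
        assign(w_idx, 0)
    while running and len(results) < len(tasks):
        now, w_idx, tid = heapq.heappop(running)
        if tid not in results:          # first finisher wins; duplicates are ignored
            results[tid] = tasks[tid]()
        assign(w_idx, now)              # the worker is idle again and gets more work
    return results


if __name__ == "__main__":
    tasks = {i: (lambda i=i: i * i) for i in range(6)}
    workers = [Worker("fast", 1), Worker("slow", 3), Worker("flaky", 2, crash_at=3)]
    # Task 5 is lost when "flaky" crashes and is re-executed by the surviving workers.
    print(sorted(eager_schedule(tasks, workers).items()))
    # [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
```

The "fewest assignments first" rule spreads redundant copies evenly, so a lost task is retried as soon as every other pending task has at least one copy in flight. If every worker crashes, this sketch simply returns the results that completed.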

Copyright information

© Springer-Verlag 1999

Authors and Affiliations

  • Holger Karl
    • Institut für Informatik, Humboldt-University of Berlin, Berlin, Germany

Personalised recommendations