Design principles of operating systems for large scale multicomputers

  • Amnon Barak
  • Yoram Kornatzky
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 309)


Future multicomputer systems are expected to consist of thousands of interconnected computers. To simplify the use of these systems, multicomputer operating systems must be developed to integrate a cluster of computers into a unified and coherent environment. Existing multicomputer operating systems are inappropriate for this purpose, because many commonly used techniques become congested once the system grows beyond a certain size. This paper deals with the issues involved in designing an operating system for a large scale multicomputer. We identify the difficulties of using existing operating systems in large multicomputer configurations. Then, based on insight gained in the design of several algorithms, we present eight principles which should serve as guidelines for the designer of such systems. These principles include symmetry, customer-server protocols, and partiality. Another component of our approach is the use of randomness in the system's control. We present probabilistic algorithms for information scattering and load estimation. Tolerating node failures, and collecting the garbage they leave behind, are part of the routine operations of a distributed operating system. We present a robust algorithm for locating processes and an efficient algorithm for garbage collection in a large scale system, both of which are in line with our principles.
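The abstract names partiality and randomness as ingredients of probabilistic information scattering and load estimation, but does not spell out the algorithms. The following is only a minimal sketch of that general idea, not the authors' method: every class name, parameter (`table_size`, `rounds`), and the aging rule are assumptions introduced for illustration. Each node keeps a small, fixed-size table of load reports, sends a random half of it to one randomly chosen node per round, and estimates the average load from whatever partial view it holds.

```python
import random


class Node:
    """Hypothetical node holding only a partial, bounded view of system load."""

    def __init__(self, node_id, load, table_size=8):
        self.node_id = node_id
        self.load = load
        self.table_size = table_size          # assumed bound on stored reports
        self.table = {node_id: (load, 0)}     # node_id -> (load, age in rounds)

    def tick(self):
        # Age every report by one round, then refresh this node's own report.
        self.table = {n: (l, a + 1) for n, (l, a) in self.table.items()}
        self.table[self.node_id] = (self.load, 0)

    def pick_message(self, rng):
        # Send a random half of the table (at least one report).
        entries = list(self.table.items())
        return rng.sample(entries, max(1, len(entries) // 2))

    def merge(self, message):
        # Keep the fresher report for each node mentioned in the message.
        for n, (l, a) in message:
            current = self.table.get(n)
            if current is None or a < current[1]:
                self.table[n] = (l, a)
        # Truncate to the freshest reports; ties favour the node's own entry.
        freshest = sorted(
            self.table.items(),
            key=lambda kv: (kv[1][1], kv[0] != self.node_id),
        )
        self.table = dict(freshest[: self.table_size])

    def estimate_average_load(self):
        # Estimate from the partial view only; no global state is consulted.
        return sum(l for l, _ in self.table.values()) / len(self.table)


def simulate(num_nodes=64, rounds=20, seed=0):
    rng = random.Random(seed)
    nodes = [Node(i, rng.randint(0, 10)) for i in range(num_nodes)]
    for _ in range(rounds):
        for node in nodes:
            node.tick()
        for node in nodes:
            target = rng.choice(nodes)        # random destination, no central server
            if target is not node:
                target.merge(node.pick_message(rng))
    return nodes


if __name__ == "__main__":
    nodes = simulate()
    true_average = sum(n.load for n in nodes) / len(nodes)
    print("true average load:", true_average)
    print("node 0 estimate:  ", nodes[0].estimate_average_load())
```

In this sketch the per-node state and per-round traffic stay constant as `num_nodes` grows, which is the scaling property the abstract argues for; the accuracy of the estimate depends on the assumed table size and number of rounds.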


Hash Function, Single Machine, Garbage Collection, Probabilistic Algorithm, Colored Node


Copyright information

© Springer-Verlag Berlin Heidelberg 1988

Authors and Affiliations

  • Amnon Barak (1)
  • Yoram Kornatzky (1)
  1. Department of Computer Science, The Hebrew University of Jerusalem, Jerusalem, Israel
