Two fundamental issues in multiprocessing

  • Arvind
  • Robert A. Iannucci
Part II - Parallel Computer Architectures
Part of the Lecture Notes in Computer Science book series (LNCS, volume 295)

Abstract

A general-purpose multiprocessor should be scalable, i.e., show higher performance when more hardware resources are added to the machine. Architects of such multiprocessors must address the loss in processor efficiency due to two fundamental issues: long memory latencies and waits due to synchronization events. It is argued that a well-designed processor can overcome these losses provided there is sufficient parallelism in the program being executed. The detrimental effect of long latency can be reduced by instruction pipelining; however, the restriction to a single thread of computation in von Neumann processors severely limits their ability to have more than a few instructions in the pipeline. Furthermore, techniques to reduce the memory latency tend to increase the cost of task switching. The cost of synchronization events in von Neumann machines makes decomposing a program into very small tasks counter-productive. Dataflow machines, on the other hand, treat each instruction as a task, and by paying a small synchronization cost for each instruction executed, offer the ultimate flexibility in scheduling instructions to reduce processor idle time.
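The latency argument in the abstract can be illustrated with a toy analytical model (this sketch is not from the paper; the function name, parameters, and the simple utilization formula are illustrative assumptions): a thread computes for some number of cycles, then stalls on a memory access, and interleaving enough independent threads hides the stall.

```python
def utilization(run_length, latency, threads=1):
    """Fraction of cycles spent on useful work when each thread
    computes for `run_length` cycles and then waits `latency`
    cycles on memory; extra threads fill the waiting cycles."""
    busy = threads * run_length        # useful cycles available per period
    period = run_length + latency      # one thread's compute-then-wait cycle
    return min(1.0, busy / period)

# A single thread idles through most of a long-latency access,
# while interleaved independent threads recover full efficiency.
print(utilization(run_length=10, latency=90))              # 0.1
print(utilization(run_length=10, latency=90, threads=10))  # 1.0
```

In this simplified view, a dataflow machine is the limiting case in which every instruction is a schedulable "thread," so there is always ready work to overlap with outstanding memory requests.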

Key words and phrases

caches, cache coherence, dataflow architectures, hazard resolution, instruction pipelining, LOAD/STORE architectures, memory latency, multiprocessors, multi-thread architectures, semaphores, synchronization, von Neumann architecture

Copyright information

© Springer-Verlag Berlin Heidelberg 1988

Authors and Affiliations

  • Arvind (1)
  • Robert A. Iannucci (1)

  1. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, USA