Multithreading: Fundamental Limits, Potential Gains, and Alternatives

  • David E. Culler
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 281)


Multithreading as a means of tolerating latency, enabling powerful parallel languages, and exposing parallelism is critically examined in order to identify its fundamental limits and potential gains. A simple analytical model shows how the performance gain due to multithreading is related to switch cost, remote reference frequency, and outstanding message capacity. Examination of current networks shows that they support only limited multithreading, due to overhead, channel, and volumetric constraints. Compiler-controlled multithreading is proposed as an alternative to hardware multithreading to make effective use of the processor with a limited number of communication threads. The approach is illustrated by a simple parallel language, Split-C, with split-phase remote references and a novel compilation methodology, TAM, for powerful parallel languages which require dynamic scheduling of a large number of threads.
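The two-regime behavior of such an analytical model can be sketched as follows. This is a minimal illustration in the spirit of the Saavedra-Barrera/Culler formulation, not necessarily the chapter's exact model: a thread runs for `run_length` cycles between remote references, pays `switch_cost` cycles per context switch, and each remote reference takes `latency` cycles; the `max_outstanding` cap on in-flight messages is an assumed way of modeling limited outstanding message capacity.

```python
def efficiency(n_threads, run_length, switch_cost, latency, max_outstanding=None):
    """Processor efficiency under a simple two-regime multithreading model.

    In the linear regime (too few threads to hide the latency), efficiency
    grows with the number of threads; once enough threads are available,
    the processor saturates and efficiency is limited only by switch cost.
    """
    if max_outstanding is not None:
        # One thread runs while the others each hold one outstanding
        # remote reference, so message capacity bounds the useful threads.
        n_threads = min(n_threads, max_outstanding + 1)

    # Threads needed to fully overlap latency with useful work + switching.
    saturation_point = 1 + latency / (run_length + switch_cost)

    if n_threads >= saturation_point:
        # Saturation regime: latency fully hidden; switch cost is the
        # only remaining overhead per run of useful work.
        return run_length / (run_length + switch_cost)

    # Linear regime: each thread contributes one run of useful work per
    # (run + latency + switch) cycle of elapsed time.
    return n_threads * run_length / (run_length + latency + switch_cost)
```

For example, with `run_length=10`, `switch_cost=2`, and `latency=48`, a single thread achieves only 1/6 efficiency, five threads saturate the processor at 10/12, and capping outstanding messages at two pins efficiency at 1/2 no matter how many threads exist. This makes concrete the chapter's point that switch cost bounds the achievable gain and that limited message capacity can prevent ever reaching saturation.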


Keywords: Switch Cost, Fundamental Limit, Remote Memory, Active Message, Virtual Processor





Copyright information

© Springer Science+Business Media New York 1994

Authors and Affiliations

  • David E. Culler, Computer Science Division, Department of EECS, University of California, Berkeley, USA
