Tolerating Communication Latency through Dynamic Thread Invocation in a Multithreaded Architecture

  • Andrew Sohn
  • Yuetsu Kodama
  • Jui-Yuan Ku
  • Mitsuhisa Sato
  • Yoshinori Yamaguchi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1808)


Communication latency is a key parameter affecting the performance of distributed-memory multiprocessors. Instruction-level multithreading attempts to tolerate latency by overlapping communication with computation. This chapter examines the multithreading capabilities of the EM-X distributed-memory multiprocessor through empirical studies. The EM-X provides hardware support for dynamic function spawning and instruction-level multithreading. This support includes a by-passing mechanism for direct remote reads and writes, hardware FIFO thread scheduling, and dedicated instructions for generating fixed-size communication packets based on one-sided communication. Two problems, bitonic sorting and the Fast Fourier Transform (FFT), are selected for the experiments. Parameters that characterize the performance of multithreading are investigated, including the number of threads, the number of thread switches, the run length, and the number of remote reads. Experimental results indicate that the best communication performance occurs with two to four threads. Using more than eight threads is found to be inefficient and adversely affects overall performance. FFT yielded over 95% overlapping owing to a large amount of computation and communication parallelism across threads. Even in the absence of thread computation parallelism, multithreading helps overlap over 35% of the communication time for bitonic sorting.
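
As an illustration of why bitonic sorting stresses remote reads, the following is a minimal sequential sketch of Batcher's bitonic sorting network in Python. It is not the authors' EM-X code. The helper `remote_steps` and the block distribution of elements over processors are assumptions introduced here for illustration only; the abstract does not state how data were distributed.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compare-exchange partners
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks with (i & k) == 0 are sorted ascending.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def remote_steps(n, processors):
    """Hypothetical helper: count compare-exchange steps whose partner lies
    on another processor, assuming n elements are split into equal
    contiguous blocks (n and processors both powers of two)."""
    block = n // processors
    count = 0
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            if j >= block:     # partner index i ^ j falls outside the block
                count += 1
            j //= 2
        k *= 2
    return count
```

Under this assumed distribution, every step counted by `remote_steps` would need each element's partner fetched from another processor, which is the kind of remote read that multithreading tries to overlap with computation.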


Keywords: Fast Fourier Transform; Switching Cost; Communication Time; Direct Memory Access; Remote Memory
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Andrew Sohn (1)
  • Yuetsu Kodama (2)
  • Jui-Yuan Ku (2)
  • Mitsuhisa Sato (3)
  • Yoshinori Yamaguchi (3)
  1. Computer Information Science Dept., New Jersey Institute of Technology, Newark
  2. Computer Architecture Section, Electrotechnical Laboratory, Tsukuba, Ibaraki, Japan
  3. Real World Computing Partnership3 Tsukuba Research Center, Tsukuba, Ibaraki, Japan
