Speculatively Multithreaded Architectures

  • Gurindar S. Sohi
  • T. N. Vijaykumar
Chapter
Part of the Integrated Circuits and Systems book series (ICIR)

Abstract

Using the growing number of transistors to build ever-larger dynamic-issue superscalar processors in order to expose more parallelism has run into diminishing returns, high design complexity, and high power dissipation. Chip multiprocessors (CMPs) alleviate these problems by spending the available transistors on multiple smaller, power-efficient cores, but CMPs require parallel programming, which is significantly harder than sequential programming. Speculatively multithreaded architectures address both the programmability issues of CMPs and the power–complexity–performance problems of superscalar processors. They partition a sequential program into contiguous program fragments called tasks and execute the tasks in parallel on multiple cores, speculating that the tasks are independent even though independence is not guaranteed. Hardware support detects dependencies and rolls back misspeculations. This chapter addresses two key questions: how programs are partitioned into tasks so as to maximize inter-task parallelism, and how inter-task control-flow and data dependencies (register and memory dependencies) are maintained, especially in the distributed multicore organization that speculatively multithreaded architectures employ.
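
The execution model described in the abstract can be illustrated with a small software sketch. This is only a toy model under stated assumptions, not the hardware mechanism the chapter presents: the names `SpecView` and `run_speculatively` are hypothetical, memory is a plain dictionary, "parallel" execution is simulated by running every task against the same starting snapshot, and a task is conservatively squashed whenever it read any address that an earlier task wrote.

```python
from typing import Callable, Dict, List, Set

Memory = Dict[str, int]


class SpecView:
    """Hypothetical per-task view of memory: buffers stores, records loads."""

    def __init__(self, base: Memory) -> None:
        self._base = base
        self.writes: Memory = {}     # speculative stores, invisible to other tasks
        self.reads: Set[str] = set()  # addresses whose values came from outside the task

    def load(self, addr: str) -> int:
        if addr in self.writes:      # forward from the task's own earlier store
            return self.writes[addr]
        self.reads.add(addr)
        return self._base[addr]

    def store(self, addr: str, value: int) -> None:
        self.writes[addr] = value


Task = Callable[[SpecView], None]


def run_speculatively(tasks: List[Task], memory: Memory) -> int:
    """Execute tasks speculatively, then commit them in program order.

    Returns how many tasks were squashed and re-executed.
    """
    snapshot = dict(memory)
    # Speculative phase: every task assumes it is independent of the others
    # and runs against the same starting state.
    views = []
    for task in tasks:
        view = SpecView(snapshot)
        task(view)
        views.append(view)

    squashed = 0
    written: Set[str] = set()  # addresses written by already-committed tasks
    for task, view in zip(tasks, views):
        if view.reads & written:
            # Dependence violation: the task consumed a stale value.
            # Discard its buffered state and re-execute on committed memory.
            view = SpecView(memory)
            task(view)
            squashed += 1
        memory.update(view.writes)   # commit in sequential program order
        written.update(view.writes)
    return squashed


# Three contiguous "tasks" carved out of a sequential program (illustrative only).
def task_inc_a(m: SpecView) -> None:
    m.store("a", m.load("a") + 1)


def task_double_b(m: SpecView) -> None:
    m.store("b", m.load("b") * 2)


def task_sum_into_c(m: SpecView) -> None:
    m.store("c", m.load("a") + m.load("b"))  # depends on both earlier tasks


if __name__ == "__main__":
    mem = {"a": 1, "b": 2, "c": 0}
    n = run_speculatively([task_inc_a, task_double_b, task_sum_into_c], mem)
    print(mem, "squashed:", n)
```

In this sketch the first two tasks are truly independent and commit without re-execution, while the third is squashed once because it read `a` and `b` before the earlier tasks' stores were committed. The final memory state matches what sequential execution would produce, which is the property the speculation and rollback support is meant to preserve.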

Keywords

Task Selection, Transactional Memory, Branch Prediction, Annual International Symposium, Chip Multiprocessor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag US 2009

Authors and Affiliations

  1. Computer Sciences Department, University of Wisconsin, Madison, USA
  2. Purdue University, West Lafayette, USA