Shared write buffer to boost applications on SpMT architecture

The Journal of Supercomputing

Abstract

As the number of processing cores integrated on chip multiprocessors grows, researchers are working to increase the available parallelism of software so that it can efficiently harness the growing computing power. One notable direction is speculative multi-threading (SpMT), a.k.a. thread-level speculation, which aims to extract thread-level parallelism by splitting a sequential execution thread into several finer-grained threads and executing them in parallel. An SpMT thread remains speculative until it “knows” that all of its input data are correct. A speculative thread needs to write to the L1 cache, but its output may be discarded if the speculation eventually fails. Meanwhile, another speculative thread may already have read that speculative output. Some mechanism is therefore needed to support speculative reads and writes. Moreover, because SpMT threads are extracted from a single thread, they usually share a lot of data, so there may be intense coherence traffic among the L1 caches. Supporting data coherence and speculation together would be very complicated. This paper proposes a shared write buffer (SWB) among the SpMT cores. Speculative reads and writes are confined to the SWB, so speculation does not interfere with coherence and the L1 cache design can be drastically simplified. Experiments show that the SWB captures a large portion of inter-core data sharing, reduces cache coherence traffic, and drastically improves the data access performance of SpMT threads.
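
The idea of confining speculative reads and writes to a shared buffer can be illustrated with a small software model. The sketch below is not the hardware design from the paper; the class name `SharedWriteBuffer`, its methods, and the convention that a smaller thread id means an older (less speculative) thread are all assumptions made for illustration. It only shows the behaviour the abstract describes: speculative output stays out of committed state, a later thread may read it, and a failed speculation discards the output together with any thread that consumed it.

```python
class SharedWriteBuffer:
    """Toy model (hypothetical): speculative writes live here until commit."""

    def __init__(self, memory):
        self.memory = memory   # committed, non-speculative state
        self.entries = {}      # (thread_id, addr) -> speculative value
        self.readers = {}      # producer thread_id -> consumer thread_ids

    def write(self, tid, addr, value):
        # Speculative output never touches committed memory directly.
        self.entries[(tid, addr)] = value

    def read(self, tid, addr):
        # Assumption: a thread may see versions written by itself or by
        # older threads (smaller id), and prefers the newest of those.
        producers = [t for (t, a) in self.entries if a == addr and t <= tid]
        if not producers:
            return self.memory.get(addr, 0)
        src = max(producers)
        if src != tid:
            # Remember that tid consumed speculative data produced by src.
            self.readers.setdefault(src, set()).add(tid)
        return self.entries[(src, addr)]

    def commit(self, tid):
        # Speculation succeeded: drain this thread's entries to memory.
        for key in sorted(k for k in self.entries if k[0] == tid):
            self.memory[key[1]] = self.entries.pop(key)
        self.readers.pop(tid, None)

    def squash(self, tid):
        # Speculation failed: discard this thread's output and, transitively,
        # every thread that read it. Returns the ids of squashed threads.
        squashed, pending = set(), [tid]
        while pending:
            t = pending.pop()
            if t in squashed:
                continue
            squashed.add(t)
            for key in [k for k in self.entries if k[0] == t]:
                del self.entries[key]
            pending.extend(self.readers.pop(t, ()))
            for consumers in self.readers.values():
                consumers.discard(t)
        return sorted(squashed)
```

In this model, a thread that reads another thread's buffered value is recorded as a consumer, so calling `squash` on the producer also squashes the consumer, while `commit` is the only path by which a value reaches `memory`.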

Acknowledgments

This research is supported by the National Natural Science Foundation of China under Grant No. 61070001, the National Natural Science Foundation of Zhejiang Province under Grant No. LQ12F02017, the Special Funds for Key Program of China under Grant Nos. 2011ZX0302-004-002 and 2012ZX01031001-003, the Key Science Foundation of Zhejiang Province under Grant No. 2010C11048, the Open Fund of the Mobile Network Application Technology Key Laboratory of Zhejiang Province, and the Innovation Group of New Generation of Mobile Internet Software and Services of Zhejiang Province.

Author information

Corresponding author

Correspondence to Ming Chen.

About this article

Cite this article

Chen, M., Ye, J., Chen, T. et al. Shared write buffer to boost applications on SpMT architecture. J Supercomput 73, 3508–3525 (2017). https://doi.org/10.1007/s11227-016-1710-2

  • DOI: https://doi.org/10.1007/s11227-016-1710-2
