
Potential thread-level-parallelism exploration with superblock reordering

Published in: Computing

Abstract

The growing number of processing cores in a single CPU demands more parallelism from sequential programs. Yet over the past decades little work has succeeded in automatically extracting enough parallelism, which casts a shadow over many-core architectures and automatic-parallelization research. Moreover, little work has tried to understand the nature, or the amount, of the parallelism potentially available in programs. In this paper we analyze at runtime the dynamic data dependencies among the superblocks of sequential programs. We design a meta re-arrange buffer to measure and exploit the available parallelism: superblocks are dynamically analyzed, reordered, and dispatched to run in parallel on an idealized many-core processor, while data dependencies and program correctness are preserved. In our experiments, superblock reordering yielded potential speedups ranging from 1.08 to 89.60. The results show that the potential parallelism of ordinary programs is still far from fully exploited by existing technologies, which makes automatic parallelization a promising research direction for many-core architectures.
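The idea of measuring dependence-limited parallelism can be illustrated with a small sketch. This is a hypothetical simplification, not the authors' actual tool: superblocks are modeled as (read-set, write-set) pairs in program order, a later block depends on an earlier one if their accesses conflict (read-after-write, write-after-read, or write-after-write), and the potential speedup is the trace length divided by the dependence-constrained critical path.

```python
# Hypothetical sketch (not the paper's meta re-arrange buffer itself):
# estimate the dependence-limited speedup of a superblock trace.

def potential_speedup(trace):
    """trace: list of (reads, writes) set pairs in program order.
    Returns (sequential_steps, parallel_steps, speedup)."""
    finish = []  # earliest parallel step at which each block can complete
    for i, (reads, writes) in enumerate(trace):
        earliest = 1
        for j in range(i):
            prev_reads, prev_writes = trace[j]
            # Any RAW, WAR, or WAW conflict forces block i after block j.
            if prev_writes & (reads | writes) or writes & prev_reads:
                earliest = max(earliest, finish[j] + 1)
        finish.append(earliest)
    seq = len(trace)
    par = max(finish, default=0)
    return seq, par, seq / par if par else 1.0

# Example: B1 and B2 both read what B0 writes; B3 is independent.
trace = [
    (set(), {"a"}),   # B0: writes a
    ({"a"}, {"b"}),   # B1: reads a, writes b
    ({"a"}, {"c"}),   # B2: reads a, writes c
    ({"x"}, {"y"}),   # B3: touches unrelated locations
]
print(potential_speedup(trace))  # → (4, 2, 2.0)
```

Here B1, B2, and B3 can all run in the step after B0, so four sequential steps collapse into two parallel ones. The paper's instrumentation-based analysis works at the level of real memory accesses rather than these toy location sets.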



Acknowledgments

This research is supported by the National Natural Science Foundation of China under Grant No. 61070001, the Natural Science Foundation of Zhejiang Province under Grant No. LQ12F02017, the Special Funds for Key Programs of China under Grants No. 2011ZX0302-004-002 and 2012ZX01031001-003, the Key Science Foundation of Zhejiang Province under Grant No. 2010C11048, the Open Fund of the Mobile Network Application Technology Key Laboratory of Zhejiang Province, and the Innovation Group of New Generation of Mobile Internet Software and Services of Zhejiang Province.

Author information


Corresponding author

Correspondence to John Ye.


About this article

Cite this article

Ye, J., Yan, H., Hou, H. et al. Potential thread-level-parallelism exploration with superblock reordering. Computing 96, 545–564 (2014). https://doi.org/10.1007/s00607-014-0387-8

