${\mathrm{DS}}_{\mathrm{spirit}}$: a data dependence and stride reference patterns profiling infrastructure

Yu, Hairong; Li, Guohui; Shu, LihChyun

doi:10.1007/s11227-015-1612-8

Published: 16 January 2016

Volume 72, pages 770–788 (2016)
The Journal of Supercomputing

Hairong Yu¹, Guohui Li¹ & LihChyun Shu²

Abstract

With multi-core processors now ubiquitous in modern computer systems, developing software tools that make the best use of the available computing resources has never been more urgent. Spurious data dependences and cache misses lurking in general-purpose applications seriously restrict the extraction of potential parallelism on today's multi-core machines. Existing tools are limited in their ability to detect data dependences thoroughly while also providing prefetched objects, and some of them cannot profile large-scale applications. To address these problems, we propose a novel profiler, called ${\mathrm{DS}}_{\mathrm{spirit}}$, that performs both data dependence profiling and stride reference profiling. Data dependence profiling employs a hash-based scheme to detect actual data dependences while using timestamps to filter out useless ones. Stride reference profiling employs value profiling to capture the stride pattern of each dynamic load and selects the profitable loads as prefetched objects for the compiler. To demonstrate the effectiveness of ${\mathrm{DS}}_{\mathrm{spirit}}$, we evaluated it using several SPEC CPU2006, MPI2007 and OMP2012 benchmarks on an Intel i7-4700 machine. Experimental results show that ${\mathrm{DS}}_{\mathrm{spirit}}$ produces accurate profiling results, including the expected data dependences and prefetched objects, which in turn expose more opportunities for extracting parallelism.
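The abstract names the two profiling techniques without detailing them, so the following is only a minimal Python sketch of how they might fit together, not the authors' implementation. It assumes that shadow memory is a plain hash table (`dict`) mapping each address to its last writer, that a timestamp is the pair (loop invocation, iteration), and that "useless" dependences are those whose source write happened in an earlier invocation of the loop. Stride profiling is modelled as value profiling of address deltas per load, and a load is selected as a prefetch candidate when one non-zero stride dominates. The class `ToyProfiler`, the string program counters, and the thresholds `min_samples` and `min_stride_ratio` are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyProfiler:
    """Hypothetical sketch: hash-based dependence profiling plus stride profiling."""

    min_samples: int = 4             # hypothetical: strides needed before judging a load
    min_stride_ratio: float = 0.75   # hypothetical: required share of the dominant stride
    invocation: int = 0              # timestamp part 1: which execution of the loop
    iteration: int = 0               # timestamp part 2: which iteration within it
    # Shadow memory as a hash table: address -> (store_pc, invocation, iteration)
    last_write: dict = field(default_factory=dict)
    # Observed read-after-write dependences: (store_pc, load_pc, kind) -> count
    deps: Counter = field(default_factory=Counter)
    # Per-load value profile of address strides: load_pc -> Counter(stride -> count)
    strides: defaultdict = field(default_factory=lambda: defaultdict(Counter))
    last_addr: dict = field(default_factory=dict)

    def enter_loop(self) -> None:
        self.invocation += 1
        self.iteration = 0

    def next_iteration(self) -> None:
        self.iteration += 1

    def store(self, pc: str, addr: int) -> None:
        # Stamp the address with its writer and the current timestamp.
        self.last_write[addr] = (pc, self.invocation, self.iteration)

    def load(self, pc: str, addr: int) -> None:
        # 1) Data dependence: look up the last writer of this address.
        src = self.last_write.get(addr)
        if src is not None:
            store_pc, inv, it = src
            # Assumed filter: a write from an earlier loop invocation cannot
            # constrain parallelization of this invocation, so it is dropped.
            if inv == self.invocation:
                kind = "loop-independent" if it == self.iteration else "loop-carried"
                self.deps[(store_pc, pc, kind)] += 1
        # 2) Stride profiling: record the delta from this load's previous address.
        prev = self.last_addr.get(pc)
        if prev is not None:
            self.strides[pc][addr - prev] += 1
        self.last_addr[pc] = addr

    def prefetch_candidates(self) -> dict:
        # Keep loads whose most frequent non-zero stride dominates their profile.
        chosen = {}
        for pc, profile in self.strides.items():
            total = sum(profile.values())
            stride, count = profile.most_common(1)[0]
            if total >= self.min_samples and stride != 0 and count / total >= self.min_stride_ratio:
                chosen[pc] = stride
        return chosen


# Simulated trace of: for i in 1..7: a[i] = a[i-1] + b[i]  (8-byte elements)
p = ToyProfiler()
A, B = 0x1000, 0x2000
p.store("init", A)                   # a[0] is written before the loop starts
p.enter_loop()
for i in range(1, 8):
    p.next_iteration()
    p.load("ld_a", A + 8 * (i - 1))  # reads a[i-1]
    p.load("ld_b", B + 8 * i)        # reads b[i]
    p.store("st_a", A + 8 * i)       # writes a[i]
print(dict(p.deps))                  # {('st_a', 'ld_a', 'loop-carried'): 6}
print(p.prefetch_candidates())       # {'ld_a': 8, 'ld_b': 8}
```

In this trace the read of `a[0]` written by `init` is discarded by the timestamp rule, leaving only the loop-carried dependence from `st_a` to `ld_a`, while both loads show a constant 8-byte stride and are reported as prefetch candidates. A real profiler would attach such hooks to instrumented memory instructions rather than to a hand-written trace.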

Author information

Authors and Affiliations

Huazhong University of Science and Technology, Wuhan, China
Hairong Yu & Guohui Li
National Cheng Kung University, Tainan, Taiwan
LihChyun Shu

Corresponding author

Correspondence to LihChyun Shu.

About this article

Cite this article

Yu, H., Li, G. & Shu, L. ${\mathrm{DS}}_{\mathrm{spirit}}$: a data dependence and stride reference patterns profiling infrastructure. J Supercomput 72, 770–788 (2016). https://doi.org/10.1007/s11227-015-1612-8

Published: 16 January 2016
Issue Date: February 2016
DOI: https://doi.org/10.1007/s11227-015-1612-8