Efficient data race detection for async-finish parallelism

Raman, Raghavan; Zhao, Jisheng; Sarkar, Vivek; Vechev, Martin; Yahav, Eran

doi:10.1007/s10703-012-0143-7

Published: 13 March 2012

Volume 41, pages 321–347, (2012)

Formal Methods in System Design

Raghavan Raman¹,
Jisheng Zhao¹,
Vivek Sarkar¹,
Martin Vechev² &
Eran Yahav³

Abstract

A major productivity hurdle for parallel programming is the presence of data races. Data races can lead to all kinds of harmful program behaviors, including determinism violations and corrupted memory. However, runtime overheads of current dynamic data race detectors are still prohibitively large (often incurring slowdowns of 10× or more) for use in mainstream software development.

In this paper, we present an efficient dynamic race detection algorithm that handles both the async-finish task-parallel programming model used in languages such as X10 and Habanero Java (HJ) and the spawn-sync constructs used in Cilk.

We have implemented our algorithm in a tool called TaskChecker and evaluated it on a suite of 12 benchmarks. To reduce overhead of the dynamic analysis, we have also implemented various static optimizations in the tool. Our experimental results indicate that our approach performs well in practice, incurring an average slowdown of 3.05× compared to a serial execution in the optimized case.

Notes

The construct for mutual exclusion is called atomic in X10 and isolated in HJ.
The ESP-bags algorithm is precise and sound when the program contains async and finish constructs only. When the program contains isolated constructs, it is precise but not sound (i.e., there may be false negatives).
As advocated in [16], we use the isolated keyword instead of atomic to make explicit the fact that the construct supports weak isolation rather than strong atomicity.
We refer to a sync that is executed under some condition in a function body as a conditional sync.
We refer to an execution of a statement as either a dynamic statement instance or a statement instance.
The definition of the static version of MHP can be found in [2].
A node is considered both an ancestor and a descendant of itself.
The depth of a node in a tree is the length of the path from the root to the node.
This is assuming there are no asyncs outside any finish in the program. If there are any such asyncs, then the only sequential code regions in the program are the regions outside the outermost finish and before the first such async.
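
The two tree conventions in the notes above (a node counts as its own ancestor and descendant, and depth is the path length from the root) can be illustrated with a small sketch. This is a generic parent-pointer tree written for illustration only; the names Node, depth, is_ancestor, and is_descendant are hypothetical and are not taken from the paper or from TaskChecker.

```python
class Node:
    """A tree node with a parent pointer (hypothetical illustration type)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def depth(node):
    """Length of the path from the root to `node`; the root has depth 0."""
    d = 0
    while node.parent is not None:
        node = node.parent
        d += 1
    return d


def is_ancestor(a, n):
    """True if `a` is an ancestor of `n`. Reflexive: a node is its own ancestor."""
    cur = n
    while cur is not None:
        if cur is a:
            return True
        cur = cur.parent
    return False


def is_descendant(d, a):
    """True if `d` is a descendant of `a`. Also reflexive, by the same convention."""
    return is_ancestor(a, d)


# A small example tree: root -> child -> grandchild, plus root -> sibling.
root = Node("root")
child = Node("child", root)
grandchild = Node("grandchild", child)
sibling = Node("sibling", root)
```

With these definitions, depth(grandchild) is 2, is_ancestor(root, grandchild) holds, and both is_ancestor(child, child) and is_descendant(child, child) hold because the relation is reflexive.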

References

[1] Agarwal S, Barik R, Bonachea D, Sarkar V, Shyamasundar RK, Yelick K (2007) Deadlock-free scheduling of X10 computations with bounded resources. In: SPAA ’07: Proceedings of the 19th symposium on parallel algorithms and architectures. ACM, New York, pp 229–240
[2] Agarwal S, Barik R, Sarkar V, Shyamasundar RK (2007) May-happen-in-parallel analysis of X10 programs. In: PPoPP ’07: Proceedings of the 12th symposium on principles and practice of parallel programming. ACM, New York, pp 183–193
[3] Barik R, Budimlic Z, Cave V, Chatterjee S, Guo Y, Peixotto D, Raman R, Shirako J, Tasirlar S, Yan Y, Zhao Y, Sarkar V (2009) The habanero multicore software research project. In: OOPSLA ’09: Proceeding of the 24th ACM SIGPLAN conference companion on object oriented programming systems languages and applications, New York, NY, USA. ACM, New York, pp 735–736
[4] Blumofe RD, Joerg CF, Kuszmaul BC, Leiserson CE, Randall KH, Zhou Y (1995) Cilk: an efficient multithreaded runtime system. In: Proceedings of the fifth ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP, Oct 1995, pp 207–216
[5] Blumofe RD, Leiserson CE (1999) Scheduling multithreaded computations by work stealing. J ACM 46(5):720–748
[6] Bocchino R, Adve V, Adve S, Snir M (2009) Parallel programming must be deterministic by default. In: First USENIX workshop on hot topics in parallelism (HOTPAR 2009)
[7] Bodden E, Lam P, Hendren L (2010) Clara: a framework for statically evaluating finite-state runtime monitors. In: 1st international conference on runtime verification (RV), Nov 2010. LNCS, vol 6418. Springer, Berlin, pp 74–88
[8] Charles P, Grothoff C, Saraswat VA, Donawa C, Kielstra A, Ebcioglu K, von Praun C, Sarkar V (2005) X10: an object-oriented approach to non-uniform cluster computing. In: Proceedings of the twentieth annual ACM SIGPLAN conference on object-oriented programming, systems, languages, and applications, OOPSLA, Oct, pp 519–538
[9] Cheng G-I, Feng M, Leiserson CE, Randall KH, Stark AF (1998) Detecting data races in Cilk programs that use locks. In: Proceedings of the tenth annual ACM symposium on parallel algorithms and architectures (SPAA ’98), Puerto Vallarta, Mexico, June 28–July 2 1998, pp 298–309
[10] Dijkstra EW Cooperating sequential processes. 65–138
[11] Feng M, Leiserson CE (1997) Efficient detection of determinacy races in Cilk programs. In: SPAA ’97: proceedings of the ninth annual ACM symposium on parallel algorithms and architectures. ACM, New York, pp 1–11
[12] Flanagan C, Freund SN (2009) FastTrack: efficient and precise dynamic race detection. In: PLDI ’09: proceedings of the 2009 ACM SIGPLAN conference on programming language design and implementation. ACM, New York, pp 121–133
[13] Frigo M, Leiserson CE, Randall KH (1998) The implementation of the Cilk-5 multithreaded language. In: PLDI’98, NY, USA, 1998. ACM, New York, pp 212–223
[14] Guo Y, Barik R, Raman R, Sarkar V (2009) Work-first and help-first scheduling policies for async-finish task parallelism. In: IPDPS ’09: proceedings of the international symposium on parallel & distributed processing. IEEE Computer Society, Washington, pp 1–12
[15] Habanero Java http://habanero.rice.edu/hj
[16] Larus JR, Rajwar R (2006) Transactional memory. Morgan and Claypool, San Francisco
[17] Lea D (2000) A Java fork/join framework. In: JAVA ’00: proceedings of the ACM 2000 conference on Java Grande. ACM, New York, pp 36–43
[18] Lee EA (2006) The problem with threads. Computer 39(5):33–42
[19] Lee JK, Palsberg J (2010) Featherweight X10: a core calculus for async-finish parallelism. In: PPoPP ’10: proceedings of the 15th ACM SIGPLAN symposium on principles and practice of parallel computing. ACM, New York, pp 25–36
[20] Leijen D, Schulte W, Burckhardt S (2009) The design of a task parallel library. In: OOPSLA ’09: proceeding of the 24th ACM SIGPLAN conference on object oriented programming systems languages and applications. ACM, New York, pp 227–242
[21] Mellor-Crummey J (1993) Compile-time support for efficient data race detection in shared-memory parallel programs. In: PADD ’93: proceedings of the 1993 ACM/ONR workshop on parallel and distributed debugging, New York, NY, USA, 1993. ACM, New York, pp 129–139
[22] Purandare R, Dwyer MB, Elbaum S (2010) Monitor optimization via stutter-equivalent loop transformation. In: Proceedings of the ACM international conference on object oriented programming systems languages and applications, New York, NY, USA, 2010, OOPSLA ’10. ACM, New York, pp 270–285
[23] Sadowski C, Freund SN, Flanagan C (2009) SingleTrack: A dynamic determinism checker for multithreaded programs. In: Programming languages and systems. Lecture notes in computer science, vol 5502. Springer, Berlin, pp 394–409
[24] Tarjan RE (1975) Efficiency of a good but not linear set union algorithm. J ACM 22:215–225
[25] Tarjan RE (1983) Data structures and network algorithms. Society for Industrial and Applied Mathematics, Philadelphia
[26] Vallée-Rai R et al. (1999) Soot—a Java optimization framework. In: Proceedings of CASCON 1999, pp 125–135
[27] Zhao J, Sarkar V (2011) Intermediate language extensions for parallelism. In: VMIL’11, pp 333–334

Acknowledgements

We would like to thank Jacob Burnim and Koushik Sen from UC Berkeley, Jaeheon Yi and Cormac Flanagan from UC Santa Cruz, and John Mellor-Crummey from Rice University for their feedback on an earlier version of this paper. We thank Charles Leiserson for pointing out the conditional sync example. We are grateful to Jill Delsigne for her assistance with copy-editing the final version of this paper. We also thank the US-Israel Binational Foundation (BSF) for their support.

Author information

Authors and Affiliations

Rice University, 6100 Main St, Houston, TX, 77005, USA
Raghavan Raman, Jisheng Zhao & Vivek Sarkar
UNG H 14, ETH Zürich, Universitätstrasse 19, Zürich, 8092, Switzerland
Martin Vechev
Technion–Israel Institute of Technology, Taub Building 734, Haifa, 32000, Israel
Eran Yahav

Corresponding author

Correspondence to Raghavan Raman.

Additional information

E. Yahav is a Deloro Fellow.

About this article

Cite this article

Raman, R., Zhao, J., Sarkar, V. et al. Efficient data race detection for async-finish parallelism. Form Methods Syst Des 41, 321–347 (2012). https://doi.org/10.1007/s10703-012-0143-7

Published: 13 March 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s10703-012-0143-7
