Tuning lock-based multicore program based on sliding windows to tolerate data race

The Journal of Supercomputing

Abstract

Because in-house debugging and testing can rarely uncover all potential data races in multicore programs, tolerating the remaining races during production runs is necessary to ensure correct execution. However, existing tolerating methods are limited to certain kinds of data races. This paper proposes a new data-race tolerating approach that detects and adjusts data races whether the accesses involved are protected by a critical section or lack protection, thereby improving the correctness of multicore programs. It uses sliding windows to hold the memory instructions in a critical section, or the most recent unprotected memory instructions, and detects the potential data races that are more likely to cause errors. Then, by delaying the critical reversion points, it adjusts data races to reduce the probability of software failure. To implement the approach, a current multicore processor need not change its original cache coherence protocol and requires only a small amount of additional hardware. Simulation results show that the approach incurs low hardware overhead, low bandwidth overhead, and negligible slowdown.


Acknowledgements

This research was supported by the National Natural Youth Science Foundation of China (61502123), the Heilongjiang Provincial Youth Science Foundation (QC2015084), and the National Key R&D Plan of China (2017YFB1302701). We thank the anonymous reviewers and our group members for their comments.

Author information

Corresponding author

Correspondence to Zhigang Chen.

About this article

Cite this article

Zhu, S., Chen, Z. & Sun, G. Tuning lock-based multicore program based on sliding windows to tolerate data race. J Supercomput 75, 7872–7894 (2019). https://doi.org/10.1007/s11227-019-02921-7

  • DOI: https://doi.org/10.1007/s11227-019-02921-7
