LightPlay: Efficient Replay with GPUs

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 8967)


Previous deterministic replay systems reduce runtime overhead either by relying on hardware support or by relaxing the determinism requirements for replay. We propose LightPlay, which fulfills stricter determinism requirements with low overhead and without requiring hardware or OS support. LightPlay guarantees that the memory state after each instruction instance in a replay run is the same as in the original run. It reduces logging overhead using a lightweight thread-local technique that avoids synchronization between threads during the recording run. Before the replay run, GPUs are used to efficiently identify the memory ordering constraints that produce the same memory states. LightPlay incurs low space overhead for logging, as it stores only the part of the log where data races occur. During the logging run, LightPlay is 20x–100x faster than logging the total order and requires only 1% space overhead.
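The idea of thread-local logging followed by an offline pass that keeps only the conflicting part of the log can be illustrated with a minimal sketch. This is not LightPlay's implementation: the function names (`record`, `conflicting_addresses`, `prune`), the `(address, is_write)` log format, and the conflict rule are assumptions made for illustration. The rule also ignores synchronization, so it over-approximates real data races, and the analysis runs on the CPU rather than on a GPU.

```python
from collections import defaultdict

def record(log, addr, is_write):
    """Append one access to a thread-private log; no lock is taken."""
    log.append((addr, is_write))

def conflicting_addresses(logs):
    """Return addresses touched by 2+ threads with at least one write.

    Simplified stand-in for race detection (hypothetical rule):
    happens-before edges from synchronization are not considered.
    """
    readers = defaultdict(set)
    writers = defaultdict(set)
    for tid, log in logs.items():
        for addr, is_write in log:
            (writers if is_write else readers)[addr].add(tid)
    conflicts = set()
    for addr, writing_threads in writers.items():
        if len(writing_threads | readers[addr]) > 1:
            conflicts.add(addr)
    return conflicts

def prune(logs):
    """Keep only the log entries that touch a conflicting address."""
    hot = conflicting_addresses(logs)
    return {tid: [e for e in log if e[0] in hot] for tid, log in logs.items()}

# Two simulated threads, each writing only to its own log.
logs = {0: [], 1: []}
record(logs[0], 0x10, True)   # thread 0 writes 0x10
record(logs[0], 0x20, False)  # thread 0 reads 0x20
record(logs[1], 0x10, False)  # thread 1 reads 0x10 (conflicts with the write)
record(logs[1], 0x20, False)  # thread 1 reads 0x20 (read-read, no conflict)
record(logs[1], 0x30, True)   # thread 1 writes 0x30 (private, no conflict)
pruned = prune(logs)
```

In this toy run only address `0x10` survives pruning, which mirrors the abstract's point that the stored log can be limited to the accesses involved in races.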


  • Memory Access
  • Time Slice
  • Memory Instruction
  • Data Race
  • Runtime Overhead

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

This work is supported by NSF grants CNS-1157377 and CCF-0905509 to UCR.

  • DOI: 10.1007/978-3-319-17473-0_22
  • Chapter length: 16 pages


Author information

Corresponding author

Correspondence to Min Feng.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Feng, M., Khorasani, F., Gupta, R., Bhuyan, L.N. (2015). LightPlay: Efficient Replay with GPUs. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol 8967. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17472-3

  • Online ISBN: 978-3-319-17473-0

  • eBook Packages: Computer Science, Computer Science (R0)