LightPlay: Efficient Replay with GPUs

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8967)


Previous deterministic replay systems reduce runtime overhead either by relying on hardware support or by relaxing the determinism requirements for replay. We propose LightPlay, which fulfills stricter determinism requirements with low overhead and without requiring hardware or OS support. LightPlay guarantees that the memory state after each instruction instance in a replay run is the same as in the original run. It reduces logging overhead using a lightweight thread-local technique that avoids synchronization between threads during the recording run. Before the replay run, GPUs are used to efficiently identify the memory ordering constraints that reproduce the same memory states. LightPlay incurs low space overhead for logging because it stores only the part of the log where data races occur. During the logging run, LightPlay is 20x–100x faster than logging the total order and requires only 1% space overhead.
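The abstract does not specify the log format or the GPU algorithm, so the following is only a minimal CPU-side sketch of the general idea: each thread appends to its own log without locking, and an offline pass then finds the cross-thread accesses to the same address where at least one is a write, since those are the pairs whose relative order would need to be constrained for replay. The names `Access`, `ThreadLocalLog`, and `conflicting_pairs` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    thread: int     # id of the thread that performed the access
    seq: int        # position of the access within that thread's own log
    addr: int       # memory address touched
    is_write: bool  # True for a store, False for a load


class ThreadLocalLog:
    """Append-only log owned by one thread (illustrative assumption).

    Only the owning thread ever appends, so recording needs no
    synchronization with other threads.
    """

    def __init__(self, thread_id):
        self.thread_id = thread_id
        self.entries = []

    def record(self, addr, is_write):
        self.entries.append(
            Access(self.thread_id, len(self.entries), addr, is_write)
        )


def conflicting_pairs(logs):
    """Offline pass: return cross-thread access pairs that conflict.

    Two accesses conflict when they touch the same address, come from
    different threads, and at least one of them is a write.
    """
    by_addr = defaultdict(list)
    for log in logs:
        for access in log.entries:
            by_addr[access.addr].append(access)

    pairs = []
    for addr in sorted(by_addr):
        accesses = by_addr[addr]
        for i, a in enumerate(accesses):
            for b in accesses[i + 1:]:
                if a.thread != b.thread and (a.is_write or b.is_write):
                    pairs.append((a, b))
    return pairs
```

In this sketch the per-address comparisons are independent of one another, which is the kind of data-parallel work that could plausibly be offloaded to a GPU; how LightPlay actually partitions and orders the work is not described in the abstract.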


Keywords: Memory Access, Time Slice, Memory Instruction, Data Race, Runtime Overhead



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. NEC Laboratories America, Princeton, USA
  2. University of California, Riverside, USA
