LightPlay: Efficient Replay with GPUs
Abstract
Previous deterministic replay systems reduce runtime overhead either by relying on hardware support or by relaxing the determinism requirements for replay. We propose LightPlay, which fulfills stricter determinism requirements with low overhead and without requiring hardware or OS support. LightPlay guarantees that the memory state after each instruction instance in a replay run is the same as in the original run. It reduces logging overhead with a lightweight thread-local technique that avoids synchronization between threads during the recording run. Before the replay run, GPUs are used to efficiently identify the memory ordering constraints that reproduce the same memory states. LightPlay incurs low space overhead for logging because it stores only the parts of the log where data races occur. During the logging run, LightPlay is 20x–100x faster than logging the total order and requires only 1% space overhead.
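The idea of thread-local logging followed by an offline pass can be illustrated with a minimal sketch. This is not LightPlay's implementation: the record format, the function names `record` and `conflicting_addresses`, and the toy workload are all hypothetical, and the offline pass runs on the CPU here as a stand-in for the GPU analysis the abstract mentions. It also flags only conflicting addresses (accessed by more than one thread, with at least one write), which is a coarser notion than a true data race.

```python
import threading
from collections import defaultdict

def record(thread_id, accesses, logs):
    """Recording run: each thread appends only to its own log.

    No lock or shared counter is touched, so threads never synchronize
    with each other while logging (hypothetical record format).
    """
    local = logs[thread_id]
    for seq, (op, addr) in enumerate(accesses):
        local.append((seq, op, addr))

def conflicting_addresses(logs):
    """Offline pass: addresses touched by two or more threads where at
    least one access is a write. Only these would need ordering
    constraints; everything else can be dropped from the stored log."""
    readers = defaultdict(set)
    writers = defaultdict(set)
    for tid, log in logs.items():
        for _seq, op, addr in log:
            (writers if op == "W" else readers)[addr].add(tid)
    conflicts = set()
    for addr, w in writers.items():
        # Two distinct writers, or a writer plus a reader on another thread.
        if len(w) > 1 or (readers[addr] - w):
            conflicts.add(addr)
    return conflicts

# Toy workload (assumed for illustration): per-thread lists of (op, address).
workloads = {
    0: [("W", 0x10), ("R", 0x20)],
    1: [("R", 0x10), ("W", 0x30)],
    2: [("R", 0x20), ("W", 0x30)],
}
logs = {tid: [] for tid in workloads}  # one private log per thread

threads = [
    threading.Thread(target=record, args=(tid, acc, logs))
    for tid, acc in workloads.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()

conflicts = conflicting_addresses(logs)
print(sorted(hex(a) for a in conflicts))
```

In this toy run, address 0x20 is only read, so it is not flagged, while 0x10 (written by one thread, read by another) and 0x30 (written by two threads) are. The design point being illustrated is that the expensive cross-thread comparison is deferred to an offline pass, so the recording run itself stays free of inter-thread synchronization.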
Keywords
Memory Access, Time Slice, Memory Instruction, Data Race, Runtime Overhead