International Journal of Parallel Programming, Volume 41, Issue 2, pp 212–235

A Parallel Dynamic Binary Translator for Efficient Multi-Core Simulation

  • Oscar Almer
  • Igor Böhm
  • Tobias Edler von Koch
  • Björn Franke
  • Stephen Kyle
  • Volker Seeker
  • Christopher Thompson
  • Nigel Topham


In recent years multi-core processors have seen broad adoption in application domains ranging from embedded systems through general-purpose computing to large-scale data centres. Simulation technology for multi-core systems, however, lags behind and does not provide the simulation speed required to effectively support design space exploration and parallel software development. While state-of-the-art instruction set simulators (ISS) for single-core machines reach or exceed the performance levels of speed-optimised silicon implementations of embedded processors, the same does not hold for multi-core simulators, where large performance penalties are to be paid. In this paper we develop a fast and scalable simulation methodology for multi-core platforms based on parallel and just-in-time (JIT) dynamic binary translation (DBT). Our approach can model large-scale multi-core configurations, does not rely on prior profiling, instrumentation, or compilation, and works for all binaries targeting a state-of-the-art embedded multi-core platform implementing the ARCompact instruction set architecture (ISA). We have evaluated our parallel simulation methodology against the industry-standard SPLASH-2 and EEMBC MultiBench benchmarks and demonstrate simulation speeds up to 25,307 MIPS on a 32-core x86 host machine for as many as 2,048 target processors whilst exhibiting minimal and near constant overhead, including memory considerations.
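To put the headline figures in perspective, the short calculation below breaks the quoted peak rate down per host core and per simulated core. It is only a back-of-the-envelope sketch: the three constants are taken from the abstract, while the helper `per_unit` and the variable names are introduced here for illustration. The per-target figure additionally assumes that the peak rate coincides with the 2,048-core configuration, which the abstract does not state.

```python
# Figures quoted in the abstract.
PEAK_MIPS = 25_307         # peak aggregate simulation speed, in MIPS
HOST_CORES = 32            # cores of the x86 host machine
MAX_TARGET_CORES = 2_048   # largest simulated target configuration


def per_unit(total_mips: float, units: int) -> float:
    """Average share of an aggregate MIPS rate across `units` equal units.

    Hypothetical helper for this sketch; it is not part of the simulator.
    """
    if units <= 0:
        raise ValueError("units must be positive")
    return total_mips / units


# Average aggregate throughput delivered by each host core at the peak rate.
mips_per_host_core = per_unit(PEAK_MIPS, HOST_CORES)

# Number of simulated cores that each host core has to carry.
targets_per_host_core = MAX_TARGET_CORES / HOST_CORES

# ASSUMPTION: the peak rate is reached with all 2,048 target cores active.
mips_per_target_core = per_unit(PEAK_MIPS, MAX_TARGET_CORES)

print(f"MIPS per host core:          {mips_per_host_core:.1f}")
print(f"target cores per host core:  {targets_per_host_core:.0f}")
print(f"MIPS per target core (est.): {mips_per_target_core:.2f}")
```

Under these assumptions each host core sustains roughly 791 MIPS on average while multiplexing 64 simulated cores, so an individual simulated core would advance at about 12 MIPS in the largest configuration.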


Keywords: Instruction set simulators, Just-in-time compilation, Multicore processors, Parallel dynamic binary translation, Scalable multicore simulation





Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • Oscar Almer (1)
  • Igor Böhm (1)
  • Tobias Edler von Koch (1)
  • Björn Franke (1)
  • Stephen Kyle (1)
  • Volker Seeker (1), corresponding author
  • Christopher Thompson (1)
  • Nigel Topham (1)

  1. Institute for Computing Systems Architecture, University of Edinburgh, Edinburgh, UK
