Node Performance and Energy Analysis with the Sniper Multi-core Simulator

  • Trevor E. Carlson
  • Wim Heirman
  • Kenzo Van Craeynest
  • Lieven Eeckhout
Conference paper


Two major trends in high-performance computing, namely larger numbers of cores and growing on-chip cache capacity, are creating significant challenges for evaluating the design space of future processor architectures. Fast and scalable simulation is therefore needed to explore large multi-core systems sufficiently within a limited simulation time budget. By combining accurate high-abstraction analytical models with fast parallel simulation, architects can trade off accuracy against simulation speed, allowing longer application runs and covering a larger portion of the hardware design space. Sniper provides this balance: it simulates long-running applications much faster than detailed cycle-accurate simulation, while still providing the detail necessary to observe core-uncore interactions across the entire system. With advanced per-function visualization and coupled power and energy simulation, the Sniper multi-core simulator can provide a fast and accurate way both to understand and to optimize software for current and future hardware systems.
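Sniper's core model is built on interval simulation. Instead of simulating every pipeline stage cycle by cycle, it assumes that instructions dispatch at a steady rate and charges a penalty only when a miss event (an instruction-cache miss, a branch misprediction, or a long-latency data load) interrupts that flow. The Python sketch below illustrates the idea under strongly simplifying assumptions that are ours, not the paper's: a fixed branch resolution time plus a front-end refill depth, and long-latency loads that fully overlap when they fall within one reorder-buffer window of the load that opened the window. `MissEvent`, `CoreParams`, and `interval_cycles` are hypothetical names, not part of Sniper's API, and the real simulator tracks dependencies and overlap in considerably more detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MissEvent:
    kind: str        # "icache", "branch", or "dload" (long-latency data load)
    position: int    # dynamic instruction index at which the event occurs
    latency: float   # miss latency in cycles; for "branch", the resolution time


@dataclass
class CoreParams:
    dispatch_width: int = 4   # instructions dispatched per cycle when no miss occurs
    frontend_depth: int = 5   # cycles to refill the front-end after a misprediction
    rob_size: int = 128       # reorder-buffer size in instructions (assumed overlap window)


def interval_cycles(num_instructions: int, events: List[MissEvent], p: CoreParams) -> float:
    """Estimate execution cycles with a simplified interval model (illustrative only)."""
    # Steady-state dispatch: the intervals between miss events run at full width.
    cycles = num_instructions / p.dispatch_width
    window_start: Optional[int] = None  # position of the load that opened the current overlap window

    for ev in sorted(events, key=lambda e: e.position):
        if ev.kind == "icache":
            # The front-end stalls for the full miss latency.
            cycles += ev.latency
        elif ev.kind == "branch":
            # Wait for the branch to resolve, then refill the front-end pipeline.
            cycles += ev.latency + p.frontend_depth
        elif ev.kind == "dload":
            # Loads within one ROB window of the window-opening load overlap with it
            # (memory-level parallelism), so they add no extra penalty here.
            if window_start is not None and ev.position - window_start < p.rob_size:
                continue
            cycles += ev.latency
            window_start = ev.position
        else:
            raise ValueError(f"unknown miss event kind: {ev.kind!r}")
    return cycles
```

For example, 1,000 instructions on a 4-wide core with one branch misprediction (10-cycle resolution), two 200-cycle loads 50 instructions apart (which overlap), a third 200-cycle load 300 instructions later, and one 20-cycle instruction-cache miss give 250 base cycles plus 15 + 200 + 200 + 20 penalty cycles, or 685 cycles (a CPI of 0.685). The point of the abstraction is that the cost of the estimate scales with the number of miss events rather than with the number of simulated cycles.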





We thank Mathijs Rogiers for his invaluable work on the visualization features of Sniper and the anonymous reviewers for their valuable feedback. This work is supported by Intel and the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT). Additional support is provided by the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007–2013) / ERC Grant agreement no. 259295. Experiments were run on computing infrastructure at the ExaScience Lab, Leuven, Belgium; the Intel HPC Lab, Swindon, UK; and the VSC Flemish Supercomputer Center.


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Trevor E. Carlson (1), corresponding author
  • Wim Heirman (2)
  • Kenzo Van Craeynest (1)
  • Lieven Eeckhout (1)
  1. Ghent University, Gent, Belgium
  2. Intel, ExaScience Lab, Leuven, Belgium