International Journal of Parallel Programming, Volume 43, Issue 1, pp 130–157

BADCO: Behavioral Application-Dependent Superscalar Core Models

  • Ricardo A. Velásquez
  • Pierre Michaud
  • André Seznec

Abstract

Microarchitecture research and development rely heavily on simulators. The ideal simulator would be simple, easy to develop, precise, accurate, and very fast. No such simulator exists, so microarchitects use different kinds of simulators at different stages of processor development, depending on whether accuracy or simulation speed matters most. Approximate microarchitecture models, which trade accuracy for simulation speed, are very useful for research and design space exploration, provided the loss of accuracy remains acceptable. Behavioral superscalar core modeling is one way to make this trade when the focus of the study is not the core itself. In this approach, a superscalar core is viewed as a black box that emits requests to the uncore at certain times. A behavioral core model can be connected to a detailed uncore model. Behavioral core models are built from detailed simulations; once the time to build the model is amortized, substantial simulation speedups can be obtained. We describe and study a new method for defining behavioral models for modern superscalar cores. The proposed Behavioral Application-Dependent Superscalar Core model, BADCO, predicts the execution time of a thread running on a superscalar core with an error of less than 10% in most cases. We show that BADCO is qualitatively accurate: it predicts how performance changes when the uncore changes. The simulation speedups we obtained are typically between one and two orders of magnitude.
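The black-box view described in the abstract can be illustrated with a toy replay loop. The sketch below is not the BADCO model itself, whose construction the abstract does not detail. It assumes a hypothetical trace format (`Request`, with an `address`, an issue `delay`, and an optional `depends_on` index) and a hypothetical `replay` function that drives a pluggable uncore latency model, simply to show how a core reduced to timed requests can still react to changes in the uncore.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Request:
    """One uncore request in a hypothetical trace format (an assumption, not BADCO's)."""
    address: int
    delay: int                 # core cycles between becoming ready and issuing
    depends_on: Optional[int]  # index of an earlier request whose data is needed, or None


def replay(trace: Sequence[Request], uncore_latency: Callable[[int], int]) -> int:
    """Replay the trace against an uncore model and return the predicted cycle count."""
    completion = []  # completion time of each request, in cycles
    prev_issue = 0   # issue time of the previous request (requests issue in order)
    for req in trace:
        # A dependent request waits for its producer; an independent one
        # only waits for the previous request to have issued.
        ready = completion[req.depends_on] if req.depends_on is not None else prev_issue
        issue = max(ready, prev_issue) + req.delay
        completion.append(issue + uncore_latency(req.address))
        prev_issue = issue
    return max(completion, default=0)


# Illustrative trace: two independent requests, then one that depends on the first.
trace = [
    Request(address=0x100, delay=5, depends_on=None),
    Request(address=0x140, delay=2, depends_on=None),
    Request(address=0x180, delay=3, depends_on=0),
]

fast_uncore = replay(trace, lambda addr: 20)   # fixed 20-cycle uncore latency
slow_uncore = replay(trace, lambda addr: 100)  # fixed 100-cycle uncore latency
print(fast_uncore, slow_uncore)
```

In this toy setup the same trace yields a longer predicted execution time when the uncore latency grows, because the dependent request cannot issue until its producer completes. That is the kind of qualitative sensitivity to the uncore that the abstract refers to, here in a deliberately simplified form.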

Keywords

Multicore simulation, Approximate simulation, Behavioral core models, Performance evaluation

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Ricardo A. Velásquez (1)
  • Pierre Michaud (2)
  • André Seznec (2)
  1. Instituto Tecnológico Metropolitano, Medellín, Colombia
  2. INRIA/IRISA, Rennes, France
