Characterizing and Understanding the Bandwidth Behavior of Workloads on Multi-core Processors

  • Guoping Long
  • Dongrui Fan
  • Junchao Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5704)

Abstract

An important issue for current multi-core processors is off-chip bandwidth sharing. Sharing helps improve resource utilization, but more importantly, it may cause performance degradation due to contention. However, there has been little research on characterizing workloads from a bandwidth perspective. Moreover, the impact of bandwidth constraints on performance is still poorly understood. In this paper, we propose the phase execution model and evaluate the arithmetic-to-memory ratio (AMR) of each phase to characterize the bandwidth requirements of arbitrary programs. We apply the model to a set of SPEC benchmark programs and obtain two results. First, we propose a new taxonomy of workloads based on their bandwidth requirements. Second, we find that prefetching techniques improve the system throughput of multi-core processors only when there is enough spare memory bandwidth.
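
As a rough illustration of the per-phase AMR idea, the sketch below computes an arithmetic-to-memory ratio for each phase of a program. The abstract does not give the exact definition, so this assumes AMR is the number of arithmetic operations divided by the number of memory accesses in a phase. The `Phase` record, its field names, and the sample counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """Hypothetical record for one execution phase (not from the paper)."""
    name: str
    arithmetic_ops: int   # assumed: arithmetic operations executed in the phase
    memory_accesses: int  # assumed: memory accesses issued in the phase


def amr(phase: Phase) -> float:
    """Arithmetic-to-memory ratio under the assumed definition ops / accesses."""
    if phase.memory_accesses == 0:
        # A phase with no memory traffic places no demand on bandwidth.
        return float("inf")
    return phase.arithmetic_ops / phase.memory_accesses


def phase_amrs(phases: list[Phase]) -> dict[str, float]:
    """Map each phase name to its AMR."""
    return {p.name: amr(p) for p in phases}


# Made-up counts, for illustration only.
phases = [
    Phase("init", arithmetic_ops=1000, memory_accesses=500),
    Phase("compute", arithmetic_ops=8000, memory_accesses=100),
    Phase("reduce", arithmetic_ops=300, memory_accesses=0),
]
ratios = phase_amrs(phases)
```

Under this assumed definition, a phase with a low AMR issues more memory traffic per unit of computation and is therefore more likely to be limited by shared off-chip bandwidth than a phase with a high AMR.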

Keywords

multi-core architecture, phase model, memory bandwidth

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Guoping Long (1)
  • Dongrui Fan (1)
  • Junchao Zhang (1)
  1. Key Laboratory of Computer Systems and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China