Characterizing and Understanding the Bandwidth Behavior of Workloads on Multi-core Processors
An important issue for current multi-core processors is off-chip bandwidth sharing. Sharing helps improve resource utilization but, more importantly, it may cause performance degradation due to contention. However, little research has characterized workloads from a bandwidth perspective. Moreover, the understanding of how bandwidth constraints affect performance is still limited. In this paper, we propose the phase execution model and evaluate the arithmetic-to-memory ratio (AMR) of each phase to characterize the bandwidth requirements of arbitrary programs. We apply the model to a set of SPEC benchmark programs and obtain two results. First, we propose a new taxonomy of workloads based on their bandwidth requirements. Second, we find that prefetching techniques improve the system throughput of multi-core processors only when there is enough spare memory bandwidth.
Keywords: multi-core architecture, phase model, memory bandwidth