International Journal of Parallel Programming, Volume 36, Issue 3, pp 347–360

Performance Advantage of Reconfigurable Cache Design on Multicore Processor Systems

  • Jie Tao
  • Marcel Kunze
  • Fabian Nowak
  • Rainer Buchty
  • Wolfgang Karl

Abstract

As microprocessor design moves towards multicore, cache performance becomes more important, because off-chip accesses grow increasingly expensive when the processor cores compete with one another. This raises a question: how should the cache architecture be designed to prevent data accesses from becoming a performance bottleneck? This work studies a reconfigurable cache architecture that can be dynamically configured to meet the individual demands of running applications. Using a self-developed cache simulator, we first examined how different cache organizations and configurations influence the parallel execution of OpenMP applications. The experimental results show that applications benefit from a flexible, reconfigurable cache. This motivated us to go a step further and develop a hardware prototype of this novel architecture.
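To make the idea of comparing cache configurations concrete, the following is a minimal sketch of a trace-driven, set-associative cache model with LRU replacement. It is not the simulator used in this work: the class `SetAssociativeCache`, the helper `run_trace`, the chosen cache parameters, and the synthetic address trace are all assumptions made purely for illustration.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Hypothetical LRU set-associative cache model (illustration only)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        # Number of sets follows from total size, line size, and associativity.
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Simulate one access; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def run_trace(trace, size_bytes, line_bytes, ways):
    """Replay an address trace on a fresh cache and return its miss rate."""
    cache = SetAssociativeCache(size_bytes, line_bytes, ways)
    for address in trace:
        cache.access(address)
    return cache.miss_rate()


# Synthetic trace (assumption): two addresses that map to the same set
# in a 1 KiB direct-mapped cache with 32-byte lines.
conflict_trace = [0, 1024] * 100

direct_mapped = run_trace(conflict_trace, size_bytes=1024, line_bytes=32, ways=1)
two_way = run_trace(conflict_trace, size_bytes=1024, line_bytes=32, ways=2)
print(direct_mapped, two_way)
```

On this artificial trace the direct-mapped configuration misses on every access, while the 2-way configuration of the same capacity misses only on the two cold accesses. This is the kind of application-dependent difference that a configurable cache could exploit, although real OpenMP workloads behave far less regularly.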

Keywords

  • Cache performance
  • Multicore processor
  • Simulation
  • Reconfigurable architecture

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Jie Tao (1, 2)
  • Marcel Kunze (2)
  • Fabian Nowak (3)
  • Rainer Buchty (3)
  • Wolfgang Karl (3)

  1. Department of Computer Science and Technology, Jilin University, Changchun, People’s Republic of China
  2. Steinbuch Centre for Computing, Forschungszentrum Karlsruhe, Karlsruhe Institute of Technology, Karlsruhe, Germany
  3. Institut für Technische Informatik, Universität Karlsruhe, Karlsruhe Institute of Technology, Karlsruhe, Germany