Optimal Cache Replacement Policy for Matrix Multiplication

  • Nenad Anchev
  • Marjan Gusev
  • Sasko Ristov
  • Blagoj Atanasovski
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 207)


Matrix multiplication is a compute-intensive, memory-demanding, and cache-intensive algorithm. It performs O(N^3) operations, requires storing O(N^2) elements, and accesses each element O(N) times, where N is the matrix size. Implementations of cache-intensive algorithms can achieve speedups from cache memory behavior if the algorithms frequently reuse data. When the requirements exceed the cache size, a block of already stored elements is replaced. A cache miss occurs when data from a replaced block is needed again. Several cache replacement policies have been proposed to speed up the execution of different programs.
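The operation and access counts quoted above can be checked on a small case. The following is a minimal sketch, assuming the naive triple-loop product C = A * B of two N x N matrices; the function name `access_counts` is hypothetical and is not taken from the paper.

```python
from collections import Counter

def access_counts(n):
    """Count multiply-add operations and reads per input element
    for the naive triple-loop product C = A * B of n x n matrices."""
    ops = 0
    reads = Counter()
    for i in range(n):
        for j in range(n):
            for k in range(n):
                reads[("A", i, k)] += 1   # A[i][k] is read once per j
                reads[("B", k, j)] += 1   # B[k][j] is read once per i
                ops += 1                  # one multiply-add
    return ops, reads

ops, reads = access_counts(4)
print(ops)                   # 64, i.e. n**3 operations
print(len(reads))            # 32, i.e. 2 * n**2 distinct input elements
print(set(reads.values()))   # {4}: every input element is read n times
```

Because every element is reused N times, how long it survives in the cache between reuses decides how many of those reads become misses.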

In this paper we analyze and compare the two most widely implemented cache replacement policies, First-In-First-Out (FIFO) and Least-Recently-Used (LRU). The experimental results show the optimal solutions for the sequential and parallel dense matrix multiplication algorithms. Since the number of operations does not depend on the cache replacement policy, we define and determine the average memory cycles per instruction that the algorithm performs, because this metric has the greatest effect on performance.
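The difference between the two policies can be illustrated with a small simulation. This is a sketch under stated assumptions, not the experimental setup of the paper: the cache is modeled as fully associative (real caches are usually set associative), the matrices are stored row-major and contiguously, and the miss count is used only as a rough proxy for memory cost. The names `matmul_trace` and `count_misses` are hypothetical.

```python
from collections import OrderedDict

def matmul_trace(n, line_size):
    """Yield the cache-line id of each element access made by the naive
    i-j-k product of row-major n x n matrices A, B and C (assumed layout)."""
    base_a, base_b, base_c = 0, n * n, 2 * n * n
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield (base_a + i * n + k) // line_size   # read A[i][k]
                yield (base_b + k * n + j) // line_size   # read B[k][j]
            yield (base_c + i * n + j) // line_size       # write C[i][j]

def count_misses(trace, capacity, policy):
    """Simulate a fully associative cache holding `capacity` lines."""
    cache = OrderedDict()   # first entry is the next eviction victim
    misses = 0
    for line in trace:
        if line in cache:
            if policy == "LRU":
                cache.move_to_end(line)   # a hit refreshes recency under LRU
            continue                      # FIFO leaves the order unchanged
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)     # evict oldest (FIFO) or least recent (LRU)
        cache[line] = True
    return misses

for policy in ("FIFO", "LRU"):
    print(policy, count_misses(matmul_trace(32, 8), 64, policy))
```

The only difference between the two policies in this sketch is whether a hit moves the line to the back of the eviction queue. On the short trace 1, 2, 1, 3, 1 with two cache lines, FIFO evicts line 1 despite its recent reuse and takes 4 misses, while LRU keeps it and takes 3.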


Keywords: FIFO, HPC, LRU, Performance, Speedup





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Nenad Anchev (1)
  • Marjan Gusev (1)
  • Sasko Ristov (1)
  • Blagoj Atanasovski (1)
  1. Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, Macedonia
