Efficient Cache Simulation for Affine Computations

  • Wenlei Bao
  • Prashant Singh Rawat
  • Martin Kong
  • Sriram Krishnamoorthy
  • Louis-Noel Pouchet
  • P. Sadayappan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11403)

Abstract

Trace-based cache simulation is a common technique in design space exploration. In this paper, we develop an efficient strategy to simulate cache behavior for affine computations. Our framework exploits the regularity of polyhedral programs to implement a cache set partition transformation that parallelizes both trace generation and simulation. We demonstrate that our framework accurately models the cache behavior of polyhedral programs while achieving significant improvements in simulation time. Extensive evaluations show that our proposed framework systematically outperforms time-partition-based parallel cache simulation.
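
The idea behind partitioning by cache set can be illustrated with a small sketch. It is not the framework described in the paper, which applies the partitioning to polyhedral programs so that trace generation is parallelized as well. It only shows, for a hypothetical set-associative cache with LRU replacement, that each set can be simulated independently of the others and that the per-set miss counts add up to the sequential result. The cache parameters and all function names below are assumptions chosen for illustration.

```python
from collections import OrderedDict, defaultdict

# Hypothetical cache geometry, chosen only for illustration.
LINE_SIZE = 64  # bytes per cache line
NUM_SETS = 4    # number of cache sets
WAYS = 2        # associativity (lines per set)


def simulate_set(lines):
    """Count misses for the accesses that map to one set (LRU replacement)."""
    lru = OrderedDict()  # keys are line numbers, ordered oldest to newest
    misses = 0
    for line in lines:
        if line in lru:
            lru.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= WAYS:
                lru.popitem(last=False)  # evict the least recently used line
            lru[line] = True
    return misses


def simulate_sequential(trace):
    """Reference simulation: process the whole trace in program order."""
    sets = [OrderedDict() for _ in range(NUM_SETS)]
    misses = 0
    for addr in trace:
        line = addr // LINE_SIZE
        lru = sets[line % NUM_SETS]
        if line in lru:
            lru.move_to_end(line)
        else:
            misses += 1
            if len(lru) >= WAYS:
                lru.popitem(last=False)
            lru[line] = True
    return misses


def simulate_partitioned(trace):
    """Split the trace by set index, then simulate each set on its own.

    The per-set calls share no state, so they could run in parallel;
    they are run in a plain loop here to keep the sketch short.
    """
    per_set = defaultdict(list)
    for addr in trace:
        line = addr // LINE_SIZE
        per_set[line % NUM_SETS].append(line)  # order within a set is kept
    return sum(simulate_set(lines) for lines in per_set.values())


if __name__ == "__main__":
    trace = [i * LINE_SIZE for i in range(16)] * 2
    print(simulate_sequential(trace), simulate_partitioned(trace))
```

Both functions report the same miss count because a replacement decision in one set never depends on accesses to another set, which is the property that makes set-wise parallel simulation exact rather than approximate.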

Notes

Acknowledgments

We thank the anonymous referees for the feedback and many suggestions that helped in improving the presentation. This work was supported in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Awards 66905 and DE-SC0014135, program manager Lucy Nowell, by the U.S. National Science Foundation through awards 1513120 and 1731612, and by computational resources from the Ohio Supercomputer Center. Pacific Northwest National Laboratory is operated by Battelle for DOE under Contract DE-AC05-76RL01830.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Wenlei Bao (1, email author)
  • Prashant Singh Rawat (1)
  • Martin Kong (2)
  • Sriram Krishnamoorthy (3)
  • Louis-Noel Pouchet (4)
  • P. Sadayappan (1)
  1. The Ohio State University, Columbus, USA
  2. Brookhaven National Laboratory, Upton, USA
  3. Pacific Northwest National Laboratory, Richland, USA
  4. Colorado State University, Fort Collins, USA
