The Impact of Cache and Dynamic Memory Management in Static Dataflow Applications

Journal of Signal Processing Systems

Abstract

Dataflow is a parallel and generic model of computation that is agnostic of the underlying multi/many-core architecture executing it. State-of-the-art frameworks allow fast development of dataflow applications, providing memory, communication, and computation optimizations through design-time exploration. However, these frameworks usually do not consider cache memory behavior when generating code. A generally accepted idea is that bigger and multi-level caches improve application performance. This work evaluates that hypothesis in a broad experimental campaign covering different multi-core configurations, varying the number of cores and the cache parameters (size, sharing, controllers). The results show that bigger is not always better, and that the foreseen future of more cores and bigger caches does not guarantee that dataflow applications perform better without software changes. Additionally, this work investigates the adoption of two memory management strategies for dataflow applications: Copy-on-Write (CoW) and Non-Temporal Memory transfers (NTM). Experimental results on state-of-the-art applications show that NTM and CoW can contribute to reducing execution time by 5.3% and 15.8%, respectively. CoW in particular reduces energy consumption by up to 21.8%, with an average reduction of 16.8% across 22 different cache configurations.
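The abstract does not describe how CoW is realized, so the following is only a minimal sketch of the general copy-on-write idea, under the assumption that several consumers receive the same producer buffer and only some of them modify it. The class `CowBuffer` and its methods are hypothetical names introduced for illustration; they are not part of PREESM or of the authors' implementation, which may work at a different level (for example, memory pages rather than objects).

```python
class CowBuffer:
    """Hypothetical copy-on-write view over a producer's output buffer."""

    def __init__(self, shared: bytearray):
        self._data = shared   # initially aliases the producer's buffer
        self._owned = False   # becomes True once a private copy exists

    def read(self, index: int) -> int:
        # Reads never trigger a copy.
        return self._data[index]

    def write(self, index: int, value: int) -> None:
        if not self._owned:
            # First write: take a private copy so other consumers
            # keep seeing the original data.
            self._data = bytearray(self._data)
            self._owned = True
        self._data[index] = value

    def shares_storage_with(self, other: bytearray) -> bool:
        # True while this view still aliases the given buffer.
        return self._data is other


producer_output = bytearray(b"\x01\x02\x03\x04")
reader = CowBuffer(producer_output)  # read-only consumer: never copies
writer = CowBuffer(producer_output)  # modifying consumer: copies on first write

writer.write(0, 99)
```

In this sketch the read-only consumer never pays for a copy, while the modifying consumer pays for one only at its first write; the copy is deferred until it is actually needed.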

Availability of Data and Material

Not applicable.

Code Availability

Not applicable.

Notes

  1. SIGSEGV is a synchronously-generated signal and is guaranteed to be delivered to the causing POSIX thread [22].

References

  1. Furtunato, A. F. A., Georgiou, K., Eder, K., & Xavier-De-Souza, S. (2020). When parallel speedups hit the memory wall. IEEE Access, 8, 79225–79238. https://doi.org/10.1109/ACCESS.2020.2990418

  2. Pelcat, M., Desnos, K., Heulot, J., Guy, C., Nezan, J., & Aridhi, S. (2014). Preesm: A dataflow-based rapid prototyping framework for simplifying multicore dsp programming. In: European Embedded Design in Education and Research Conference (EDERC), pp. 36–40. https://doi.org/10.1109/EDERC.2014.6924354

  3. Carlson, T. E., Heirman, W., & Eeckhout, L. (2011). Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation. In: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1–12. https://doi.org/10.1145/2063384.2063454

  4. Slingerland, N., & Smith, A. (2001). Cache Performance for Multimedia Applications. In: International Conference on Supercomputing (ICS), ICS ’01, pp. 204–217. ACM, New York. https://doi.org/10.1145/377792.377833

  5. Alves, M. A. Z., Freitas, H. C., & Navaux, P. O. A. (2009). Investigation of shared l2 cache on many-core processors. In: International Conference on Architecture of Computing Systems, pp. 1–10

  6. Garcia, V., Gomez-Luna, J., Grass, T., Rico, A., Ayguade, E., & Pena, A. (2016). Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications. In: IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10. IEEE, New York. https://doi.org/10.1109/IISWC.2016.7581277

  7. Domagala, L., van Amstel, D., & Rastello, F. (2016). Generalized Cache Tiling for Dataflow Programs. In: SIGPLAN/SIGBED, LCTES, pp. 52–61. ACM, New York. https://doi.org/10.1145/2907950.2907960

  8. Maghazeh, A., Chattopadhyay, S., Eles, P., & Peng, Z. (2019). Cache-Aware Kernel Tiling: An Approach for System-Level Performance Optimization of GPU-Based Applications. In: Design, Automation, and Test in Europe (DATE), pp. 570–575. IEEE, Florence. https://doi.org/10.23919/DATE.2019.8714861

  9. Stoutchinin, A., & Benini, L. (2019). Streamdrive: A dynamic dataflow framework for clustered embedded architectures. Journal of Signal Processing Systems, 91(3–4), 275–301. https://doi.org/10.1007/s11265-018-1351-1

  10. Fraguela, B. B., & Andrade, D. (2021). A software cache autotuning strategy for dataflow computing with UPC++ DepSpawn. Computational and Mathematical Methods 1(1), 1–14. https://doi.org/10.1002/cmm4.1148

  11. Bovet, D. P., & Cesati, M. (2006). Understanding the Linux kernel, 3rd edn., chap. 10, p. 295. O’Reilly

  12. Intel Corporation. (2020). Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes. Intel Corporation

  13. Le, Q. T., Stern, J., & Brenner, S. (2020). Fast memcpy with SPDK and Intel® I/OAT DMA Engine. Retrieved March 15, 2021. https://software.intel.com/content/www/us/en/develop/articles/fast-memcpy-using-spdk-and-ioat-dma-engine.html

  14. Desnos, K., Pelcat, M., Nezan, J. F., & Aridhi, S. (2016). On memory reuse between inputs and outputs of dataflow actors. ACM Transactions on Embedded Computing Systems 15(2). https://doi.org/10.1145/2871744

  15. Kurd, N., Mosalikanti, P., Neidengard, M., Douglas, J., & Kumar, R. (2009). Next generation intel core micro-architecture (nehalem) clocking. IEEE Journal of Solid-State Circuits, 44(4), 1121–1129. https://doi.org/10.1109/JSSC.2009.2014023

  16. Kim, T., Sun, Z., Chen, H., Wang, H., & Tan, S. X. (2017). Energy and lifetime optimizations for dark silicon manycore microprocessor considering both hard and soft errors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25(9), 2561–2574. https://doi.org/10.1109/TVLSI.2017.2707401

  17. Rathore, V., Chaturvedi, V., Singh, A., Srikanthan, T., & Shafique, M. (2020). Longevity framework: Leveraging online integrated aging-aware hierarchical mapping and vf-selection for lifetime reliability optimization in manycore processors. IEEE Transactions on Computers pp. 1–1. https://doi.org/10.1109/TC.2020.3006571

  18. PREESM. (2021). PREESM Applications Repository (https://github.com/preesm/preesm-apps).

  19. Hamzah, R., & Ibrahim, H. (2015). Literature Survey on Stereo Vision Disparity Map Algorithms. Journal of Sensors, 16(1), 1–23. https://doi.org/10.1155/2016/8742920

  20. Lowe, D. G. (1999). Object recognition from local scale-invariant features. In: IEEE International Conference on Computer Vision (ICCV), vol. 2, pp. 1150–1157. https://doi.org/10.1109/ICCV.1999.790410

  21. Li, S., Ahn, J. H., Strong, R. D., Brockman, J. B., Tullsen, D. M., & Jouppi, N. P. (2009). Mcpat: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In: International Symposium on Microarchitecture (MICRO), pp. 469–480. IEEE, New York, NY, USA.

  22. IEEE. (2017). IEEE Standard for Information Technology–Portable Operating System Interface (POSIX(R)) Base Specifications, Issue 7. IEEE Std 1003.1-2017 1(1), 1–3951. https://doi.org/10.1109/IEEESTD.2018.8277153

Funding

This work is supported by the Agence Nationale de la Recherche under Grant No. ANR-17-CE24-0018. We would like to give special thanks to the PREESM and Sniper communities for actively participating in the development of the tools that provide a solid foundation for this work.

Author information

Corresponding author

Correspondence to Marcelo Ruaro.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghasemi, A., Ruaro, M., Cataldo, R. et al. The Impact of Cache and Dynamic Memory Management in Static Dataflow Applications. J Sign Process Syst 94, 721–738 (2022). https://doi.org/10.1007/s11265-021-01730-7

  • DOI: https://doi.org/10.1007/s11265-021-01730-7
