
A Compute Cache System for Signal Processing Applications

Journal of Signal Processing Systems

Abstract

Modern processing systems are constrained by the low efficiency of their memory subsystems. Although memories have evolved into faster and more efficient devices over the years, they remain unable to keep up with the computational power offered by processors, i.e., to feed the processors with data at the rate at which it is consumed. Consequently, with the advent of Big Data, the need to fetch large amounts of data from memory has become the most prominent performance bottleneck. Naturally, several approaches seeking to mitigate this problem have arisen over the years, such as application-specific accelerators and Near Data Processing (NDP) solutions. However, none has offered a satisfactory general-purpose solution without imposing rather limiting constraints. For instance, NDP solutions often require the programmer to have low-level knowledge of how data is physically stored in memory. In this paper, we propose an alternative mechanism that operates at the cache level, leveraging both proximity to the data and the parallelism enabled by accessing an entire cache line per cycle. We detail the internal architecture of the Cache Compute System (CCS) and demonstrate its integration with a conventional high-performance ARM Cortex-A53 Central Processing Unit (CPU). Furthermore, we assess the performance benefits of the CCS using an extensive set of microbenchmarks, as well as six kernels widely used in Convolutional Neural Networks (CNNs) and clustering algorithms. The results show that the CCS provides speedups ranging from 3.9× to 40.6× across the six tested kernels.
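To make the line-level parallelism argument concrete, the sketch below contrasts a conventional word-at-a-time vector addition with a cache-line-granular formulation of the same memory-bound kernel. The 64-byte line size and the ccs_add_lines() helper are illustrative assumptions rather than the paper's actual interface; the helper only models the idea that an in-cache functional unit can consume two full cache lines and produce a third in a single operation, instead of streaming data word by word through the core.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 64                       /* assumed cache-line size       */
#define ELEMS_PER_LINE (LINE_BYTES / 4)     /* 16 x 32-bit elements per line */

/* Baseline: the core pulls each operand word through the cache
 * hierarchy and adds one element per iteration. */
static void vec_add_scalar(const int32_t *a, const int32_t *b,
                           int32_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Hypothetical stand-in for an in-cache line-wide adder: it consumes
 * two full cache lines and produces a third.  In a CCS-style design
 * this whole loop would map onto a single line-wide operation. */
static void ccs_add_lines(const int32_t *a, const int32_t *b, int32_t *c)
{
    for (int i = 0; i < ELEMS_PER_LINE; i++)
        c[i] = a[i] + b[i];
}

/* Cache-line-granular formulation: the bulk of the vector is processed
 * one full line at a time; a short scalar loop handles the tail. */
static void vec_add_linewise(const int32_t *a, const int32_t *b,
                             int32_t *c, size_t n)
{
    size_t i = 0;
    for (; i + ELEMS_PER_LINE <= n; i += ELEMS_PER_LINE)
        ccs_add_lines(&a[i], &b[i], &c[i]);
    for (; i < n; i++)          /* tail elements, word at a time */
        c[i] = a[i] + b[i];
}

int main(void)
{
    enum { N = 40 };            /* deliberately not a multiple of 16 */
    int32_t a[N], b[N], ref[N], out[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    vec_add_scalar(a, b, ref, N);
    vec_add_linewise(a, b, out, N);

    for (int i = 0; i < N; i++)
        if (ref[i] != out[i]) { puts("mismatch"); return 1; }
    puts("line-wise result matches scalar baseline");
    return 0;
}
```

In software the per-line loop still executes sequentially; in a CCS-style design it would collapse into a single line-wide operation per iteration, which is the effect the reported 3.9× to 40.6× speedups quantify on memory-bound kernels.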




Author information

Corresponding author

Correspondence to João Vieira.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Work supported by national funds through Fundação para a Ciência e a Tecnologia (FCT), under projects UIDB/50021/2020 and PTDC/EEI-HAC/30485/2017–HAnDLE (INESC-ID), UIDB/EEA/50008/2020 (Instituto de Telecomunicações), and research grant SFRH/BD/144047/2019.


About this article


Cite this article

Vieira, J., Roma, N., Falcao, G. et al. A Compute Cache System for Signal Processing Applications. J Sign Process Syst 93, 1173–1186 (2021). https://doi.org/10.1007/s11265-020-01626-y
