Using Planar Embedded DRAM in Memory Intensive Signal Processing Circuits: Case Studies on LDPC Decoding and Motion Estimation

Venkataraman, Kalyana Sundaram; Li, Yiran; Wu, Qi; Xie, Ningde; Sun, Hongbin; Zheng, Nanning; Zhang, Tong

doi:10.1007/s11265-012-0724-0

Published: 18 January 2013

Volume 73, pages 11–24 (2013)
Journal of Signal Processing Systems

Kalyana Sundaram Venkataraman¹,
Yiran Li¹,
Qi Wu¹,
Ningde Xie²,
Hongbin Sun³,
Nanning Zheng³ &
Tong Zhang¹

Abstract

This paper studies the feasibility and potential of using planar embedded DRAM (eDRAM), which is fully compatible with the CMOS logic process, to improve the circuit implementation efficiency of memory-hungry signal processing algorithms. Despite its apparent cell-area advantage over SRAM, planar eDRAM is not widely used in practice, mainly because of its very short retention time (e.g., a few μs or even a few hundred ns). In this work, we contend that short retention time may not be a fundamental obstacle for implementing signal processing algorithms: they typically handle streaming data with regular and predictable access patterns, and they offer a large algorithm/architecture design space. This study elaborates on the rationale for and application of planar eDRAM in memory-hungry signal processing circuits, and discusses algorithm and architecture design strategies that can better embrace planar eDRAM. As test vehicles, we use low-density parity-check (LDPC) code decoding and motion estimation in video encoding. Beyond straightforward SRAM replacement, we propose an interleaved read/write page-mode DRAM operation that leverages the data access pattern of LDPC decoding to reduce planar eDRAM energy consumption. For motion estimation, we investigate using planar eDRAM to enable a higher degree of image data reuse, and we propose a folded scan structure to further improve its effectiveness. We carried out detailed SPICE simulations of planar eDRAM at the 45 nm node to obtain its characteristics, and based on these we quantitatively evaluate the effectiveness of planar eDRAM in the two case studies.
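The abstract's central argument is that a datum in a streaming pipeline only has to survive until it is consumed, so a short retention time matters only when the gap between a write and its read exceeds it. The Python sketch below makes that check explicit and, separately, counts row activations under a simple open-page model to show why grouping a row's reads and write-backs while it is open can save activations. Everything here is an illustrative assumption, not the authors' design: the `Access` trace format, the 500 MHz clock, the 200 ns retention time (picked from the "few hundred ns" range above), the conservative rule that only writes restore a cell, and all the toy traces. In particular, `interleaved_trace` is a generic open-page schedule and is not the paper's interleaved page-mode operation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Access:
    cycle: int  # clock cycle at which the access is issued
    kind: str   # "R" for read, "W" for write
    addr: int   # word address


def retention_cycles(retention_ns: float, clock_mhz: float) -> int:
    """Whole clock cycles that fit in one retention period (1 us = 1000 ns)."""
    return int(retention_ns * clock_mhz / 1000)


def violations(trace: Sequence[Access], t_ret: int) -> List[Tuple[int, int]]:
    """(addr, read_cycle) pairs whose data is older than t_ret cycles when read.

    Conservative assumption: only a write restores a cell, so a datum's age
    is measured from its most recent write. Reads of never-written words are
    not retention problems and are ignored.
    """
    last_write: Dict[int, int] = {}
    bad: List[Tuple[int, int]] = []
    for a in sorted(trace, key=lambda x: x.cycle):
        if a.kind == "W":
            last_write[a.addr] = a.cycle
        elif a.addr in last_write and a.cycle - last_write[a.addr] > t_ret:
            bad.append((a.addr, a.cycle))
    return bad


def row_activations(trace: Sequence[Access], words_per_row: int) -> int:
    """Row activations under a simple open-page policy.

    The open row stays open until an access to a different row arrives;
    reads and writes to the open row need no new activation.
    """
    open_row: Optional[int] = None
    count = 0
    for a in sorted(trace, key=lambda x: x.cycle):
        row = a.addr // words_per_row
        if row != open_row:
            count += 1
            open_row = row
    return count


def streaming_trace(n_words: int, latency: int) -> List[Access]:
    """Hypothetical stream: word i is written at cycle i and read at i + latency."""
    trace = [Access(i, "W", i) for i in range(n_words)]
    trace += [Access(i + latency, "R", i) for i in range(n_words)]
    return trace


def phased_trace(rows: int, wpr: int) -> List[Access]:
    """Hypothetical baseline: read every word first, then write all results back."""
    n = rows * wpr
    return [Access(i, "R", i) for i in range(n)] + [
        Access(n + i, "W", i) for i in range(n)
    ]


def interleaved_trace(rows: int, wpr: int) -> List[Access]:
    """Hypothetical schedule: read and write back each row while it is open."""
    trace: List[Access] = []
    t = 0
    for r in range(rows):
        for kind in ("R", "W"):
            for j in range(wpr):
                trace.append(Access(t, kind, r * wpr + j))
                t += 1
    return trace


if __name__ == "__main__":
    t_ret = retention_cycles(200, 500)  # assumed 200 ns retention at 500 MHz
    print(t_ret)  # 100
    print(len(violations(streaming_trace(256, 64), t_ret)))   # 0
    print(len(violations(streaming_trace(256, 150), t_ret)))  # 256
    print(row_activations(phased_trace(4, 8), 8))       # 8
    print(row_activations(interleaved_trace(4, 8), 8))  # 4
```

Under these assumptions the retention window is 100 cycles, so a stream consumed 64 cycles after it is written never needs refresh, while a 150-cycle latency violates retention for every word. In the toy 4-row trace, grouping each row's reads and write-backs halves the activations from 8 to 4; real savings depend on the actual decoder schedule and eDRAM organization.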

Notes

In comparison, eDRAM with explicitly fabricated capacitors, which add fabrication cost, can achieve a much longer retention time; e.g., the eDRAM used in IBM server processors has a retention time of 40 μs [2].

References

Balasubramonian, R., Muralimanohar, N., Jouppi, N. (2009). CACTI: A tool to model large caches. http://www.hpl.hp.com/techreports/2009/HPL-2009-85.html.
Barth, J., Reohr, W., Parries, P., Fredeman, G., Golz, J., Schuster, S., Matick, R., Hunter, H.I., C.T., Harig, J., Kim, H., Khan, B., Griesemer, J., Havreluk, R., Yanagisawa, K., Kirihata, T., Iyer, S. (2008). A 500 MHz random cycle, 1.5 ns latency, SOI embedded DRAM macro featuring a three-transistor micro sense amplifier. IEEE Journal of Solid-State Circuits, 43(1), 86–95.
Chen, C.Y., Chien, S.Y., Huang, Y.W., Chen, T.C., Wang, T.C., Chen, L.G. (2006). Analysis and architecture design of variable block-size motion estimation for H.264/AVC. IEEE Transactions on Circuits and Systems I: Regular Papers, 53(6), 578–593.
Chen, C.Y., Huang, C.T., Chen, Y.H., Chen, L.-G. (2006). Level C+ data reuse scheme for motion estimation with corresponding coding orders. IEEE Transactions on Circuits and Systems for Video Technology, 16(4), 553–558.
Cho, H.J., Nemati, F., Roy, R., Gupta, R., Yang, K., Ershov, M., Banna, S., Tarabbia, M., Sailing, C., Hayes, D., Mittal, A., Robins, S. (2005). A novel capacitor-less DRAM cell using thin capacitively-coupled thyristor (TCCT). In Proc. of IEEE international electron devices meeting (IEDM) (pp. 311–314).
Gallager, R.G. (1962). Low-density parity-check codes. IRE Transactions on Information Theory, IT-8, 21–28.
Kim, J., & Park, T. (2007). A novel VLSI architecture for full-search variable block-size motion estimation. In Proc. of IEEE TENCON, Taipei.
Leung, W., Hsu, F., Jones, M.E. (2000). The ideal SoC memory: 1T-SRAM. In Proc. of IEEE international ASIC/SOC conference (pp. 32–36).
Li, P., & Tang, H. (2010). A low power VLSI implementation for variable block size motion estimation in H.264/AVC. In Proc. of IEEE international symposium on circuits and systems (ISCAS), Paris, France.
Li, Z., Chen, L., Zeng, L., Lin, S., Fong, W. (2006). Efficient encoding of quasi-cyclic low-density parity-check codes. IEEE Transactions on Communications, 54(1), 71–81.
MacKay, D.J.C., & Neal, R.M. (1996). Near Shannon limit performance of low density parity check codes. Electronics Letters, 32, 1645–1646.
Matick, R., & Schuster, S. (2005). Logic-based eDRAM: Origins and rationale for use. IBM Journal of Research and Development, 49, 145–165.
Miles, L., Gambles, J., Maki, G., Ryan, W., Whitaker, S. (2006). An 860-Mb/s (8158,7136) low-density parity-check encoder. IEEE Journal of Solid-State Circuits, 41(8), 1686–1691.
MoSys Inc. http://www.mosys.com/. Accessed 10 Oct 2010.
Natarajan, S., Chung, S., Paris, L., Keshavarzi, A. (2009). Searching for the dream embedded memory. IEEE Solid-State Circuits Magazine, 1, 34–44.
Okhonin, S., Nagoga, M., Carman, E., Beffa, R., Faraoni, E. (2007). New generation of Z-RAM. In Proc. of IEEE international electron devices meeting (IEDM) (pp. 925–928).
Somasekhar, D., Lu, S.L., Bloechel, B., Lai, K., Borkar, S., De, V. (2002). Planar 1T-cell DRAM with MOS storage capacitors in a 130 nm logic technology for high density microprocessor caches. In Proc. of European solid-state circuits conference (ESSCIRC), Firenze, Italy.
Somasekhar, D., Ye, Y., Aseron, P., Lu, S.L., Khellah, M., Howard, J., Ruhl, G., Karnik, T., Borkar, S., De, V., Keshavarzi, A. (2009). 2GHz 2MB 2T gain cell memory macro with 128 GBytes/s bandwidth in a 65 nm logic process technology. IEEE Journal of Solid-State Circuits, 44(1), 174–185.
Song, Y., Liu, Z., Ikenaga, T., Goto, S. (2006). VLSI architecture for variable block size motion estimation in H.264/AVC with low cost memory organization. In Proc. of VLSI design automation and test conf, Hsinchu, Taiwan.
Su, Y., & Sun, M.T. (2006). Fast multiple reference frame motion estimation for H.264/AVC. IEEE Transactions on Circuits and Systems for Video Technology, 16(3), 447–452.
Sveriges Television (SVT). Video-sequence. http://www.svt.se. Accessed 10 Oct 2010.
Tuan, J.C., Chang, T.S., Jen, C.W. (2002). On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture. IEEE Transactions on Circuits and Systems for Video Technology, 12(1), 61–72.
Wang, G., Ho, K.C.H., Faltermeier, J., Kong, W., Kim, H., Cai, J. (2006). A 0.127 μm² high performance 65 nm SOI based embedded DRAM for on-processor applications. In Proc. of international electron devices meeting (IEDM) (pp. 1–4).
Wang, Z., & Cui, Z. (2007). A memory efficient partially parallel decoder architecture for quasi-cyclic LDPC codes. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 15(4), 483–488.
Wiberg, N. (1996). Codes and decoding on general graphs. PhD Dissertation, Linköping University, Sweden.
Wiegand, T., Sullivan, G.J., Bjontegaard, G., Luthra, A. (2003). Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 560–576.
Xiang, B., Shen, R., Pan, A., Bao, D., Zeng, X. (2010). An area-efficient and low-power multirate decoder for quasi-cyclic low-density parity-check codes. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(10), 1447–1460.
Yap, S., & McCanny, J. (2004). A VLSI architecture for variable block size video motion estimation. IEEE Transactions on Circuits and Systems II: Express Briefs, 51(7), 384–389.
Zhang, K., Huang, X., Wang, Z. (2009). High-throughput layered decoder implementation for quasi-cyclic LDPC codes. IEEE Journal on Selected Areas in Communications, 27(6), 985–994.
Zhong, H., Zhang, T., Haratsch, E.F. (2007). Quasi-cyclic LDPC codes for the magnetic recording channel: code design and VLSI implementation. IEEE Transactions on Magnetics, 43(3), 1118–1123.
Zhang, Z., Anantharam, V., Wainwright, M., Nikolic, B. (2010). An efficient 10GBASE-T Ethernet LDPC decoder design with low error floors. IEEE Journal of Solid-State Circuits, 45(4), 843–855.
Ndili, O., & Ogunfunmi, T. (2011). Algorithm and architecture co-design of hardware-oriented, modified diamond search for fast motion estimation in H.264/AVC. IEEE Transactions on Circuits and Systems for Video Technology, 21(9), 1214–1227.
Chatterjee, S.K., & Chakrabarti, I. (2010). Low power VLSI architectures for one bit transformation based fast motion estimation. IEEE Transactions on Consumer Electronics, 56(4), 2652–2660.
Murugappa, P., Al-Khayat, R., Baghdadi, A., Jezequel, M. (2011). A flexible high throughput multi-ASIP architecture for LDPC and turbo decoding. In Proc. of design, automation and test in Europe conference and exhibition (DATE) (pp. 1–6).

Author information

Authors and Affiliations

Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Kalyana Sundaram Venkataraman, Yiran Li, Qi Wu & Tong Zhang
Intel, Portland, OR, USA
Ningde Xie
Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People’s Republic of China
Hongbin Sun & Nanning Zheng

Corresponding author

Correspondence to Kalyana Sundaram Venkataraman.

About this article

Cite this article

Venkataraman, K.S., Li, Y., Wu, Q. et al. Using Planar Embedded DRAM in Memory Intensive Signal Processing Circuits: Case Studies on LDPC Decoding and Motion Estimation. J Sign Process Syst 73, 11–24 (2013). https://doi.org/10.1007/s11265-012-0724-0

Received: 09 April 2011
Revised: 11 October 2012
Accepted: 03 December 2012
Published: 18 January 2013
Issue Date: October 2013
DOI: https://doi.org/10.1007/s11265-012-0724-0
