Abstract
This paper studies the feasibility and potential of using planar embedded DRAM (eDRAM), which is fully compatible with the CMOS logic process, to improve the circuit implementation efficiency of memory-hungry signal processing algorithms. Despite its clear cell-area advantage over SRAM, planar eDRAM is not widely used in practice, mainly because of its very short retention time (e.g., a few \(\upmu \)s or even a few hundred ns). In this work, we contend that short retention time is not necessarily a fundamental obstacle to implementing signal processing algorithms, because these algorithms typically handle streaming data, exhibit regular and predictable data access patterns, and offer a large algorithm/architecture design space. This study elaborates on the rationale for and application of planar eDRAM in memory-hungry signal processing circuit implementations, and discusses possible algorithm and architecture design strategies that better accommodate planar eDRAM. For demonstration, we use low-density parity-check (LDPC) code decoding and motion estimation in video encoding as test vehicles. Beyond a straightforward SRAM replacement, we propose an interleaved read/write page-mode DRAM operation that reduces planar eDRAM energy consumption by leveraging the data access pattern of LDPC code decoding. We also investigate the potential of planar eDRAM to enable a higher degree of image data reuse in motion estimation, and propose a folded scan structure to further improve its effectiveness. We carried out detailed planar eDRAM SPICE simulations at the 45 nm node to obtain its characteristics, based on which we quantitatively evaluate the effectiveness of planar eDRAM in these two case studies.
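The retention-time argument above can be made concrete with a small back-of-the-envelope check. The sketch below is illustrative only and is not taken from the paper: the function names, the clock frequency, and the buffer depth are hypothetical, and the 500 ns retention figure is simply one value in the "few hundred ns" range mentioned above. It assumes a streaming buffer in which every word is read for the last time a fixed number of cycles after it is written.

```python
def max_data_lifetime_ns(cycles_between_write_and_last_read: int,
                         clock_mhz: float) -> float:
    """Longest time a word must survive in the array, in nanoseconds.

    Both arguments are hypothetical design parameters, not values
    reported in the paper.
    """
    clock_period_ns = 1e3 / clock_mhz  # 1 MHz corresponds to a 1000 ns period
    return cycles_between_write_and_last_read * clock_period_ns


def needs_refresh(lifetime_ns: float, retention_ns: float) -> bool:
    """A word needs refreshing if it must outlive the cell retention time."""
    return lifetime_ns >= retention_ns


# Assumed example: 400 MHz clock, each word consumed within 100 cycles,
# and an assumed planar eDRAM retention time of 500 ns.
lifetime = max_data_lifetime_ns(100, 400.0)
print(lifetime, needs_refresh(lifetime, 500.0))
```

Under these assumed numbers the data lifetime is shorter than the retention time, so the buffer could in principle run without refresh. With a deeper buffer or a slower clock, the same check would flag the need for refresh or for a different data schedule.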
Notes
In comparison, eDRAM with explicitly fabricated capacitors, which incur extra fabrication cost, can achieve a much longer retention time; e.g., the eDRAM used in IBM server processors has a retention time of 40 \(\upmu \)s [2].
References
Balasubramonian, R., Muralimanohar, N., Jouppi, N. (2009). Cacti: A tool to model large caches. http://www.hpl.hp.com/techreports/2009/HPL-2009-85.html.
Barth, J., Reohr, W., Parries, P., Fredeman, G., Golz, J., Schuster, S., Matick, R., Hunter, H.I., C.T., Harig, J., Kim, H., Khan, B., Griesemer, J., Havreluk, R., Yanagisawa, K., Kirihata, T., Iyer, S. (2008). A 500 MHz random cycle, 1.5 ns latency, SOI embedded DRAM macro featuring a three-transistor micro sense amplifier. IEEE Journal of Solid-State Circuits, 43(1), 86–95.
Chen, C.Y., Chien, S.Y., Huang, Y.W., Chen, T.C., Wang, T.C., Chen, L.G. (2006). Analysis and architecture design of variable block-size motion estimation for H.264/AVC. IEEE Transactions on Circuits and Systems I: Regular Papers, 53(6), 578–593.
Chen, C.Y., Huang, C.T., Chen, Y.H., Chen, L.-G. (2006). Level C+ data reuse scheme for motion estimation with corresponding coding orders. IEEE Transactions on Circuits and Systems for Video Technology, 16(4), 553–558.
Cho, H.J., Nemati, F., Roy, R., Gupta, R., Yang, K., Ershov, M., Banna, S., Tarabbia, M., Sailing, C., Hayes, D., Mittal, A., Robins, S. (2005). A novel capacitor-less DRAM cell using thin capacitively-coupled thyristor (TCCT). In Proc. of IEEE international electron devices meeting (IEDM) (pp. 311–314).
Gallager, R.G. (1962). Low-density parity-check codes. IRE Transactions on Information Theory, IT-8, 21–28.
Kim, J., & Park, T. (2007). A novel VLSI architecture for full-search variable block-size motion estimation. In Proc. of IEEE TENCON, Taipei.
Leung, W., Hsu, F., Jones, M.E. (2000). The ideal SoC memory: 1T-SRAM. In Proc. of IEEE international ASIC/SOC conference (pp. 32–36).
Li, P., & Tang, H. (2010). A low power VLSI Implementation for variable block size motion estimation in H.264/AVC. In Proc. of ISCAS: circuits and systems conf, Paris, France.
Li, Z., Chen, L., Zeng, L., Lin, S., Fong, W. (2006). Efficient encoding of quasi-cyclic low-density parity-check codes. IEEE Transactions on Communications, 54(1), 71–81.
MacKay, D.J.C., & Neal, R.M. (1996). Near Shannon limit performance of low density parity check codes. Electronics Letters, 32, 1645–1646.
Matick, R., & Schuster, S. (2005). Logic-based eDRAM: Origins and rationale for use. IBM Journal of Research and Development, 49, 145–165.
Miles, L., Gambles, J., Maki, G., Ryan, W., Whitaker, S. (2006). An 860-Mb/s (8158,7136) low-density parity-check encoder. IEEE Journal of Solid-State Circuits, 41(8), 1686–1691.
MoSys Inc. http://www.mosys.com/. Accessed 10 Oct 2010.
Natarajan, S., Chung, S., Paris, L., Keshavarzi, A. (2009). Searching for the dream embedded memory. IEEE Solid-State Circuits Magazine, 1, 34–44.
Okhonin, S., Nagoga, M., Carman, E., Beffa, R., Faraoni, E. (2007). New generation of Z-RAM. In Proc. of IEEE international electron devices meeting (IEDM) (pp. 925–928).
Somasekhar, D., Lu, S.L., Bloechel, B., Lai, K., Borkar, S., De, V. (2002). Planar 1T-cell DRAM with MOS storage capacitors in a 130 nm logic technology for high density microprocessor caches. In Proc. of IEEE solid state circuits conf., ESSCIRC, Firenze, Italy.
Somasekhar, D., Yibin, Y., Aseron, P., Lu, S.L., Khellah, M., Howard, J., Ruhl, G., Karnik, T., Borkar, S., De, V., Keshavarzi, A. (2009). 2 GHz 2 MB 2T gain cell memory macro with 128 GBytes/s bandwidth in a 65 nm logic process technology. IEEE Journal of Solid-State Circuits, 44(1), 174–185.
Song, Y., Liu, Z., Ikenaga, T., Goto, S. (2006). VLSI architecture for variable block size motion estimation in H.264/AVC with low cost memory organization. In Proc. of VLSI design automation and test conf, Hsinchu, Taiwan.
Su, Y., & Sun, M.T. (2006). Fast multiple reference frame motion estimation for H.264/AVC. IEEE Transactions on Circuits and Systems for Video Technology, 16(3), 447–452.
Sveriges Television (SVT). Video-sequence. http://www.svt.se. Accessed 10 Oct 2010.
Tuan, J.C., Chang, T.S., Jen, C.W. (2002). On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture. IEEE Transactions on Circuits and Systems for Video Technology, 12(1), 61–72.
Wang, G., Ho, K.C.H., Faltermeier, J., Kong, W., Kim, H., Cai, J. (2006). A 0.127\(\mu \)m\(^2\) high performance 65 nm SOI based embedded DRAM for on-processor applications. In Proc. of international electron devices meeting (IEDM) (pp. 1–4).
Wang, Z., & Cui, Z. (2007). A memory efficient partially parallel decoder architecture for quasi-cyclic LDPC codes. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 15(4), 483–488.
Wiberg, N. (1996). Codes and decoding on general graphs. PhD Dissertation, Linköping University, Sweden.
Wiegand, T., Sullivan, G.J., Bjontegaard, G., Luthra, A. (2003). Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 560–576.
Xiang, B., Shen, R., Pan, A., Bao, D., Zeng, X. (2010). An area-efficient and low-power multirate decoder for quasi-cyclic low-density parity-check codes. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(10), 1447–1460.
Yap, S., & McCanny, J. (2004). A VLSI architecture for variable block size video motion estimation. IEEE Transactions on Circuits and Systems II: Express Briefs, 51(7), 384–389.
Zhang, K., Huang, X., Wang, Z. (2009). High-throughput layered decoder implementation for quasi-cyclic LDPC codes. IEEE Journal on Selected Areas in Communications, 27(6), 985–994.
Zhong, H., Zhang, T., Haratsch, E.F. (2007). Quasi-cyclic LDPC codes for the magnetic recording channel: code design and VLSI implementation. IEEE Transactions on Magnetics, 43(3), 1118–1123.
Zhang, Z., Anantharam, V., Wainwright, M., Nikolic, B. (2010). An efficient 10GBASE-T Ethernet LDPC decoder design with low error floors. IEEE Journal of Solid-State Circuits, 45(4), 843–855.
Ndili, O., & Ogunfunmi, T. (2011). Algorithm and architecture co-design of hardware-oriented, modified diamond search for fast motion estimation in H.264/AVC. IEEE Transactions on Circuits and Systems for Video Technology, 21(9), 1214–1227.
Chatterjee, S.K., & Chakrabarti, I. (2010). Low power VLSI architectures for one bit transformation based fast motion estimation. IEEE Transactions on Consumer Electronics, 56(4), 2652–2660.
Murugappa, P., Al-Khayat, R., Baghdadi, A., Jezequel, M. (2011). A flexible high throughput multi-ASIP architecture for LDPC and turbo decoding. In Proc. of design, automation and test in Europe conference and exhibition (DATE) (pp. 1–6).
About this article
Cite this article
Venkataraman, K.S., Li, Y., Wu, Q. et al. Using Planar Embedded DRAM in Memory Intensive Signal Processing Circuits: Case Studies on LDPC Decoding and Motion Estimation. J Sign Process Syst 73, 11–24 (2013). https://doi.org/10.1007/s11265-012-0724-0
DOI: https://doi.org/10.1007/s11265-012-0724-0