Abstract
This book chapter describes, explores, and analyzes designs and a framework for energy-efficient and reliable edge computing, spanning from device to architecture, to handle data-intensive tasks and applications. First, we present a comprehensive study of magnetic random-access memory (MRAM) as a promising nonvolatile memory component, owing to features that include nonvolatility, near-zero standby power, high integration density, and radiation hardness. To enable efficient and reliable computing units, we discuss in-memory processing accelerators for data- and compute-intensive tasks that are optimized through algorithm and hardware codesign. Two further topics that are receiving considerable attention, normally off computing and hardware security, are also examined. To this end, two design methodologies are introduced that mitigate the MRAM write energy cost while efficiently exploiting the benefits MRAM provides. The first, referred to as NV-clustering, realizes middleware-transparent intermittent computing. Building from the device level upward, we extend this emerging MRAM device to develop logic-in-memory methods that leverage its intrinsic nonvolatility to realize intermittent robust computation. The second, the power analysis-resilient circuit (PARC) procedure, extends NV-clustering as a power-masked synthesis technique that remains resilient in the presence of power analysis attacks.
Keywords
- Magnetic random-access memory
- In-memory processing
- Normally off computing
- Hardware security
Notes
1. Almost all previous checkpointing techniques suffer from data movement overhead, the need for new programming paradigms, and difficulty in maintaining internal and external consistency.
2. SOT-MTJ and SHE-MTJ are used interchangeably in this chapter.
3. A variation of this design is named PIM-Aligner [57].
References
Wang, Y., Yu, H., Ni, L., et al.: An energy-efficient nonvolatile in-memory computing architecture for extreme learning machine by domain-wall nanowire devices. IEEE Trans. Nanotechnol. 14(6), 998–1012 (2015)
Fact sheet: Big data across the federal government (2012) [Online]. Available:
Fong, X., Kim, Y., Yogendra, K., et al.: Spin-transfer torque devices for logic and memory: Prospects and perspectives. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 35(1), 1–22 (2016)
Li, S., Niu, D., Malladi, K.T., et al.: DRISA: A DRAM-based reconfigurable in-situ accelerator. In: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 288–301. ACM, New York (2017)
Li, B., Gu, P., Shan, Y., et al.: RRAM-based analog approximate computing. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 34(12), 1905–1917 (2015)
Angizi, S.: Processing-in-memory for data-intensive applications, from device to algorithm. Ph.D. dissertation. Arizona State University, Tempe (2021)
Cheng, M., Xia, L., Zhu, Z., et al.: TIME: A training-in-memory architecture for memristor-based deep neural networks. In: 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1–6. IEEE, New York (2017)
Chi, P., Li, S., Xu, C., et al.: PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In: ACM SIGARCH Computer Architecture News, vol. 44(3), pp. 27–39. IEEE Press, New York (2016)
Seshadri, V., Lee, D., Mullins, T., et al.: Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology. In: 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 273–287. IEEE, New York (2017)
Li, S., Xu, C., Zou, Q., et al.: Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories. In: 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1–6. IEEE, New York (2016)
He, Z., Angizi, S., Parveen, F., Fan, D.: Leveraging dual-mode magnetic crossbar for ultra-low energy in-memory data encryption. In: Proceedings of the on Great Lakes Symposium on VLSI 2017, pp. 83–88 (2017)
Angizi, S., Roohi, A., Taheri, M., Fan, D.: Processing-in-memory acceleration of MAC-based applications using residue number system: A comparative study. In: Proceedings of the 2021 on Great Lakes Symposium on VLSI, pp. 265–270 (2021)
Angizi, S., He, Z., Parveen, F., Fan, D.: IMCE: Energy-efficient bit-wise in-memory convolution engine for deep neural network. In: 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 111–116. IEEE, New York (2018)
Angizi, S., He, Z., Rakin, A.S., Fan, D.: CMP-PIM: An energy-efficient comparator-based processing-in-memory neural network accelerator. In: Proceedings of the 55th Annual Design Automation Conference, p. 105. ACM, New York (2018)
Yin, S., Jiang, Z., Seo, J.-S., Seok, M.: XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks. IEEE J. Solid-State Circuits 55(6), 1733–1743 (2020)
Roohi, A., Angizi, S., Fan, D., DeMara, R.F.: Processing-in-memory acceleration of convolutional neural networks for energy-efficiency, and power-intermittency resilience. In: 20th International Symposium on Quality Electronic Design (ISQED), pp. 8–13. IEEE, New York (2019)
Roohi, A., Sheikhfaal, S., Angizi, S., et al.: ApGAN: Approximate GAN for robust low energy learning from imprecise components. IEEE Trans. Comput. 69(3), 349–360 (2019)
Eckert, C., Wang, X., Wang, J., et al.: Neural cache: Bit-serial in-cache acceleration of deep neural networks, pp. 383–396 (2018)
Lee, B.C., Ipek, E., Mutlu, O., Burger, D.: Architecting phase change memory as a scalable DRAM alternative. In: ACM SIGARCH Computer Architecture News, vol. 37(3), pp. 2–13. ACM, New York (2009)
Everspin announces sampling of the world’s first 1-gigabit MRAM product (2016). [Online]. https://www.everspin.com
Baumann, A., Jung, M., Huber, K., et al.: An MCU platform with embedded FRAM achieving 350 nA current consumption in real-time clock mode with full state retention and 6.5 μs system wakeup time. In: 2013 Symposium on VLSI Circuits (VLSIC), pp. C202–C203. IEEE, New York (2013)
Chien, T.-K., Chiou, L.-Y., Lee, C.-C., et al.: An energy-efficient nonvolatile microprocessor considering software-hardware interaction for energy harvesting applications. In: 2016 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 1–4. IEEE, New York (2016)
Senni, S., Torres, L., Sassatelli, G., Gamatie, A.: Non-volatile processor based on MRAM for ultra-low-power IoT devices. ACM J. Emerg. Technol. Comput. Syst. (JETC) 13(2), 17 (2017)
Senni, S., Torres, L., Benoit, P., et al.: Normally-off computing and checkpoint/rollback for fast, low-power, and reliable devices. IEEE Magn. Lett. 8, 1–5 (2017)
Prenat, G., Jabeur, K., Vanhauwaert, P., et al.: Ultra-fast and high-reliability SOT-MRAM: From cache replacement to normally-off computing. IEEE Trans. Multi-Scale Comput. Syst. 2(1), 49–60 (2016)
Bishnoi, R., Oboril, F., Tahoori, M.B.: Non-volatile non-shadow flip-flop using spin orbit torque for efficient normally-off computing. In: 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 769–774. IEEE, New York (2016)
Khanna, S., Bartling, S.C., Clinton, M., et al.: An FRAM-based nonvolatile logic MCU SoC exhibiting 100% digital state retention at VDD = 0 V achieving zero leakage with < 400-ns wakeup time for ULP applications. IEEE J. Solid-State Circuits 49(1), 95–106 (2014)
Sakimura, N., Tsuji, Y., Nebashi, R., et al.: 10.5 A 90 nm 20 MHz fully nonvolatile microcontroller for standby-power-critical applications. In: 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 184–185. IEEE, New York (2014)
Ransford, B., Sorber, J., Fu, K.: Mementos: System support for long-running computation on RFID-scale devices. In: ACM SIGARCH Computer Architecture News, vol. 39(1), pp. 159–170. ACM, New York (2011)
Lucia, B., Ransford, B.: A simpler, safer programming and execution model for intermittent systems. ACM SIGPLAN Not. 50(6), 575–585 (2015)
Shi, K., Howard, D.: Challenges in sleep transistor design and implementation in low-power designs. In: Proceedings of the 43rd annual Design Automation Conference, pp. 113–116. ACM, New York (2006)
Zhao, H., Glass, B., Amiri, P.K., et al.: Sub-200 ps spin transfer torque switching in in-plane magnetic tunnel junctions with interface perpendicular anisotropy. J. Phys. D. Appl. Phys. 45(2), 025001 (2011)
Rowlands, G., Rahman, T., Katine, J., et al.: Deep subnanosecond spin torque switching in magnetic tunnel junctions with combined in-plane and perpendicular polarizers. Appl. Phys. Lett. 98(10), 102509 (2011)
Roohi, A., Zand, R., DeMara, R.F.: A tunable majority gate-based full adder using current-induced domain wall nanomagnets. IEEE Trans. Magn. 52(8), 1–7 (2016)
Rakin, A.S., Angizi, S., He, Z., Fan, D.: PIM-TGAN: A processing-in-memory accelerator for ternary generative adversarial networks. In: 2018 IEEE 36th International Conference on Computer Design (ICCD), pp. 266–273. IEEE, New York (2018)
Roohi, A., Zand, R., Fan, D., DeMara, R.F.: Voltage-based concatenatable full adder using spin hall effect switching. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 36(12), 2134–2138 (2017)
Gallagher, W.J., Parkin, S.S.: Development of the magnetic tunnel junction MRAM at IBM: From first junctions to a 16-Mb MRAM demonstrator chip. IBM J. Res. Dev. 50(1), 5–23 (2006)
Chung, S.-W., Kishi, T., Park, J., et al.: 4Gbit density STT-MRAM using perpendicular MTJ realized with compact cell structure. In: 2016 IEEE International Electron Devices Meeting (IEDM), pp. 27–1. IEEE, New York (2016)
Garello, K., Yasin, F., Hody, H., et al.: Manufacturable 300 mm platform solution for field-free switching SOT-MRAM. In: 2019 Symposium on VLSI Circuits, pp. T194–T195. IEEE, New York (2019)
Natsui, M., Tamakoshi, A., Honjo, H., et al.: Dual-port field-free SOT-MRAM achieving 90-MHz read and 60-MHz write operations under 55-nm CMOS technology and 1.2-V supply voltage. In: 2020 IEEE Symposium on VLSI Circuits, pp. 1–2. IEEE, New York (2020)
Sakhare, S., Perumkunnil, M., Bao, T.H., et al.: Enablement of STT-MRAM as last level cache for the high performance computing domain at the 5 nm node. In: 2018 IEEE International Electron Devices Meeting (IEDM), pp. 18–3. IEEE, New York (2018)
Kan, J., Park, C., Ching, C., et al.: Systematic validation of 2x nm diameter perpendicular MTJ arrays and MgO barrier for sub-10 nm embedded STT-MRAM with practically unlimited endurance. In: 2016 IEEE International Electron Devices Meeting (IEDM), pp. 27–4. IEEE, New York (2016)
Slaughter, J., Rizzo, N., Janesky, J., et al.: High density ST-MRAM technology. In: 2012 IEEE International Electron Devices Meeting (IEDM), pp. 29–3. IEEE, New York (2012)
Slaughter, J., Nagel, K., Whig, R., et al.: Technology for reliable spin-torque MRAM products. In: 2016 IEEE International Electron Devices Meeting (IEDM), pp. 21–5. IEEE, New York (2016)
Donahue, M.J.: OOMMF user’s guide, version 1.0. NISTIR 6376 (1999)
Fong, X., Gupta, S.K., Mojumder, N.N., et al.: KNACK: A hybrid spin-charge mixed-mode simulator for evaluating different genres of spin-transfer torque MRAM bit-cells. In: 2011 International Conference on Simulation of Semiconductor Processes and Devices, pp. 51–54 (2011)
Zand, R., Roohi, A., Fan, D., DeMara, R.F.: Energy-efficient nonvolatile reconfigurable logic using spin hall effect-based lookup tables. IEEE Trans. Nanotechnol. 16(1), 32–43 (2016)
Panagopoulos, G., Augustine, C., Roy, K.: A framework for simulating hybrid MTJ/CMOS circuits: Atoms to system approach. In: 2012 Design, Automation and Test in Europe Conference and Exhibition (DATE), pp. 1443–1446. IEEE, New York (2012)
Huai, Y.: Spin-transfer torque MRAM (STT-MRAM): Challenges and prospects. AAPPS Bull. 18(6), 33–40 (2008)
He, Z., Zhang, Y., Angizi, S., Gong, B., Fan, D.: Exploring a SOT-MRAM based in-memory computing for data processing. IEEE Trans. Multi-Scale Comput. Syst. 4(4), 676–685 (2018)
Liu, L., Moriyama, T., Ralph, D., Buhrman, R.: Spin-torque ferromagnetic resonance induced by the spin hall effect. Phys. Rev. Lett. 106(3), 036601 (2011)
Liu, L., Pai, C.-F., Li, Y., et al.: Spin-torque switching with the giant spin hall effect of tantalum. Science 336(6081), 555–558 (2012)
Pai, C.-F., Liu, L., Li, Y., et al.: Spin transfer torque devices utilizing the giant spin hall effect of tungsten. Appl. Phys. Lett. 101(12), 122404 (2012)
Angizi, S., He, Z., Awad, A., Fan, D.: MRIMA: An MRAM-based in-memory accelerator. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 39(5), 1123–1136 (2019)
Angizi, S., Sun, J., Zhang, W., Fan, D.: GraphS: A graph processing accelerator leveraging SOT-MRAM. In: 2019 Design, Automation and Test in Europe Conference and Exhibition (DATE), pp. 378–383. IEEE, New York (2019)
Angizi, S., Fan, D.: IMC: Energy-efficient in-memory convolver for accelerating binarized deep neural network. In: Proceedings of the Neuromorphic Computing Symposium, pp. 1–8 (2017)
Angizi, S., Sun, J., Zhang, W., Fan, D.: PIM-Aligner: A processing-in-MRAM platform for biological sequence alignment. In: 2020 Design, Automation and Test in Europe Conference and Exhibition (DATE), pp. 1265–1270. IEEE, New York (2020)
Angizi, S., Sun, J., Zhang, W., Fan, D.: AlignS: A processing-in-memory accelerator for DNA short read alignment leveraging SOT-MRAM. In: 2019 56th ACM/IEEE Design Automation Conference (DAC), pp. 1–6. IEEE, New York (2019)
Zhou, S., Wu, Y., Ni, Z., et al.: DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016)
Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: European Conference on Computer Vision, pp. 525–542. Springer, Berlin (2016)
Shafiee, A., Nag, A., Muralimanohar, N., et al.: ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ACM SIGARCH Computer Architecture News 44(3), 14–26 (2016)
Song, L., Qian, X., Li, H., Chen, Y.: PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In: 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 541–552. IEEE, New York (2017)
Angizi, S., He, Z., Reis, D., et al.: Accelerating deep neural networks in processing-in-memory platforms: Analog or digital approach? In: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 197–202. IEEE, New York (2019)
Jain, S., Sengupta, A., Roy, K., Raghunathan, A.: RX-CAFFE: Framework for evaluating and training deep neural networks on resistive crossbars. arXiv preprint arXiv:1809.00072 (2018)
Cavigelli, L., Magno, M., Benini, L.: Accelerating real-time embedded scene labeling with convolutional networks. In: Proceedings of the 52nd Annual Design Automation Conference, pp. 1–6 (2015)
Angizi, S., He, Z., Fan, D.: DIMA: A depthwise CNN in-memory accelerator. In: 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–8. IEEE, New York (2018)
Roohi, A., Taheri, M., Angizi, S., Fan, D.: RNSiM: Efficient deep neural network accelerator using residue number systems. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–9. IEEE, New York (2021)
Reis, D., Gao, D., Angizi, S., et al.: Modeling and benchmarking computing-in-memory for design space exploration. In: Proceedings of the 2020 on Great Lakes Symposium on VLSI, pp. 39–44 (2020)
Dong, X., Xu, C., Xie, Y., Jouppi, N.P.: NVSIM: A circuit-level performance, energy, and area model for emerging nonvolatile memory. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 31(7), 994–1007 (2012)
DRAM Power Model. https://www.rambus.com/energy/
Jain, S., Ranjan, A., Roy, K., Raghunathan, A.: Computing in memory with spin-transfer torque magnetic RAM. IEEE Trans. Very Large Scale Integr. VLSI Syst. 26(3), 470–483 (2018)
Tang, T., Xia, L., Li, B., et al.: Binary convolutional neural network on RRAM. In: 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 782–787. IEEE, New York (2017)
NCSU EDA FreePDK45 (2011). [Online]. http://www.eda.ncsu.edu/wiki/FreePDK45:Contents
Synopsys, Inc.: Synopsys Design Compiler, product version 14.9.2014 (2014)
Chen, K., Li, S., Muralimanohar, N., et al.: CACTI-3DD: Architecture-level modeling for 3d die-stacked dram main memory. In: Design, Automation and Test in Europe Conference and Exhibition (DATE), 2012, pp. 33–38. IEEE, New York (2012)
Behin-Aein, B., Datta, D., Salahuddin, S., Datta, S.: Proposal for an all-spin logic device with built-in memory. Nat. Nanotechnol. 5(4), 266 (2010)
Nikonov, D.E., Bourianoff, G.I., Ghani, T.: Proposal of a spin torque majority gate logic. IEEE Electron Device Lett. 32(8), 1128–1130 (2011)
Roohi, A., Menbari, B., Shahbazi, E., Kamrani, M.: A genetic algorithm based logic optimization for majority gate-based QCA circuits in nanoelectronics. Quantum Matter 2(3), 219–224 (2013)
Roohi, A., Zand, R., DeMara, R.F.: Synthesis of normally-off boolean circuits: An evolutionary optimization approach utilizing spintronic devices. In: 2018 19th International Symposium on Quality Electronic Design (ISQED), pp. 49–54. IEEE, New York (2018)
Roohi, A., DeMara, R.F.: NV-Clustering: Normally-off computing using non-volatile datapaths. IEEE Trans. Comput. 67(7), 949–959 (2018)
Roohi, A., DeMara, R.F.: IRC cross-layer design exploration of intermittent robust computation units for IoTs. In: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 354–359. IEEE, New York (2019)
Kimura, H., Fuchikami, T., Maramoto, K., et al.: A 2.4 pJ ferroelectric-based non-volatile flip-flop with 10-year data retention capability. In: 2014 IEEE Asian Solid-State Circuits Conference (A-SSCC), pp. 21–24. IEEE, New York (2014)
Roohi, A., DeMara, R.F.: PARC: A novel design methodology for power analysis resilient circuits using spintronics. IEEE Trans. Nanotechnol. 18, 885–889 (2019)
Roohi, A., DeMara, R.F., Wang, L., Köse, S.: Secure intermittent-robust computation for energy harvesting device security and outage resilience. In: 2017 IEEE International Conference on Advanced and Trusted Computing (ATC), pp. 1–5. IEEE, New York (2017)
Roohi, A., Zand, R., DeMara, R.F.: Logic-encrypted synthesis for energy-harvesting-powered spintronic-embedded datapath design. In: Proceedings of the 2018 on Great Lakes Symposium on VLSI, pp. 9–14 (2018)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Roohi, A., Angizi, S., Fan, D. (2023). Enabling Edge Computing Using Emerging Memory Technologies: From Device to Architecture. In: Iranmanesh, A. (eds) Frontiers of Quality Electronic Design (QED). Springer, Cham. https://doi.org/10.1007/978-3-031-16344-9_11
DOI: https://doi.org/10.1007/978-3-031-16344-9_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16343-2
Online ISBN: 978-3-031-16344-9
eBook Packages: Computer Science, Computer Science (R0)