CMOS Processors and Memories, pp. 97-138

# CMOL/CMOS Implementations of Bayesian Inference Engine: Digital and Mixed-Signal Architectures and Performance/Price – A Hardware Design Space Exploration

## Abstract

In this chapter, we focus on aspects of the hardware implementation of the Bayesian inference framework within George and Hawkins’ computational model of the visual cortex. This framework is based on Judea Pearl’s belief propagation. We present a “hardware design space exploration” methodology for implementing and analyzing digital and mixed-signal hardware for the Bayesian (polytree) inference framework. The methodology involves analyzing the computational/operational cost and the related micro-architecture, exploring candidate hardware components, proposing various custom architectures using both traditional CMOS and hybrid nanotechnology CMOL, and investigating the baseline performance/price of these hardware architectures. The results suggest that hybrid nanotechnology is a promising candidate for implementing Bayesian inference. Such implementations exploit the very high storage and computation density of these new nano-scale technologies much more efficiently; for example, the throughput per 858 mm² (TPM) obtained for CMOL-based architectures is 32–40 times better than the TPM for a CMOS-based multiprocessor/multi-FPGA system, and almost 2000 times better than the TPM for a single-PC implementation. The assessment of such hypothetical hardware architectures provides a baseline for large-scale implementations of Bayesian inference and, more generally, will help guide research trends in intelligent computing (including neuro/cognitive Bayesian systems) and the use of radically new device and circuit technologies in these systems.

## Keywords

Bayesian Inference Pearl - belief propagation Cortex CMOS CMOL Nanotechnology Nanogrid Digital Mixed-signal Hardware Nanoarchitectures Methodology Performance Price

## Notes

### Acknowledgment

Useful discussions with many colleagues, including Prof. K.K. Likharev, Dr. Changjian Gao, and Prof. G.G. Lendaris are gratefully acknowledged.
