Abstract
Spiking Neural Networks (SNNs), which originate from models of biological neural behavior, are recognized as a candidate for the next generation of neural networks. Conventionally, an SNN can be obtained from a pre-trained Artificial Neural Network (ANN) by replacing the non-linear activation with spiking neurons while keeping the parameters unchanged. In this work, we argue that simply copying the weights from an ANN to an SNN inevitably results in activation mismatch, especially for ANNs trained with batch normalization (BN) layers. To tackle the activation mismatch issue, we first provide a theoretical analysis by decomposing the local layer-wise conversion error, and then quantitatively measure how this error propagates through the layers using a second-order analysis. Motivated by these theoretical results, we propose a set of layer-wise parameter calibration algorithms that adjust the parameters to minimize the activation mismatch. To further remove the dependency on training data, we propose a privacy-preserving conversion regime that distills synthetic data from the source ANN and uses it to calibrate the SNN. Extensive experiments are performed on modern architectures and large-scale tasks, including ImageNet classification and MS COCO detection. We demonstrate that our method handles SNN conversion and effectively preserves high accuracy even at 32 time steps; for example, our calibration algorithms improve accuracy by up to 63% over baselines when converting MobileNet.
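The rate-based conversion principle can be illustrated with a minimal sketch (our own example, not the paper's implementation; `avg_if_rate` is a hypothetical helper): the average firing rate of an integrate-and-fire (IF) neuron driven by a constant input approximates a ReLU, up to clipping and flooring error.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def avg_if_rate(x, v_th=1.0, T=32):
    """Average output of an IF neuron driven by a constant input x
    for T steps, with reset-by-subtraction (illustrative sketch)."""
    v = np.zeros_like(x)
    spikes = np.zeros_like(x)
    for _ in range(T):
        v = v + x                      # integrate the input current
        fired = v >= v_th              # spike when threshold reached
        spikes = spikes + fired
        v = np.where(fired, v - v_th, v)  # reset by subtraction
    return spikes / T * v_th           # averaged spike output

x = np.array([-0.5, 0.25, 0.5, 0.9])
print(relu(x))          # exact ReLU
print(avg_if_rate(x))   # quantized approximation with step v_th/T
```

With T = 32 steps, inputs that are multiples of v_th/T are reproduced exactly, while other values incur a flooring error, one source of the local conversion error that the calibration algorithms target.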
References
Akopyan, F., Sawada, J., Cassidy, A., Alvarez-Icaza, R., Arthur, J., Merolla, P., Imam, N., Nakamura, Y., Datta, P., Nam, G. J., Taba, B., Beakes, M. P., Brezzo, B., Kuang, J. B., Manohar, R., Risk, W. P., Jackson, B. L., & Modha, D. S. (2015). Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 34(10), 1537–1557.
Barbi, M., Chillemi, S., Di Garbo, A., & Reale, L. (2003). Stochastic resonance in a sinusoidally forced LIF model with noisy threshold. Biosystems, 71(1–2), 23–28.
Bengio, Y., Léonard, N., & Courville, A. (2013). Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432.
Bi, G., & Poo, M. (1998). Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type. Journal of Neuroscience, 18(24), 10464–10472.
Botev, A., Ritter, H., & Barber, D. (2017). Practical gauss-newton optimisation for deep learning. In International conference on machine learning (pp. 557–565). PMLR.
Bu, T., Fang, W., Ding, J., Dai, P., Yu, Z., & Huang, T. (2021). Optimal ann-snn conversion for high-accuracy and ultra-low-latency spiking neural networks. In International conference on learning representations.
Bu, T., Ding, J., Yu, Z., & Huang, T. (2022). Optimized potential initialization for low-latency spiking neural networks. In Proceedings of the AAAI conference on artificial intelligence (pp. 11–20).
Chen, H., Wang, Y., Xu, C., Yang, Z., Liu, C., Shi, B., Xu, C., Xu, C., & Tian, Q. (2019). Data-free learning of student networks. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 3514–3522).
Chowdhury, S. S., Rathi, N., & Roy, K. (2021). One timestep is all you need: Training spiking neural networks with ultra low latency. arXiv preprint arXiv:2110.05929.
Christensen, D. V., Dittmann, R., Linares-Barranco, B., Sebastian, A., Le Gallo, M., Redaelli, A., Slesazeck, S., Mikolajick, T., Spiga, S., Menzel, S., Valov, I., Milano, G., Ricciardi, C., Liang, S.-J., Miao, F., Lanza, M., Quill, T. J., Keene, S. T., Salleo, A., & Pryds, N. (2022). 2022 roadmap on neuromorphic computing and engineering. Neuromorphic Computing and Engineering, 2(2), 022501.
Cox, D. D., & Dean, T. (2014). Neural networks and neuroscience-inspired computer vision. Current Biology, 24(18), R921–R929.
Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., & Le, Q. V. (2019). Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 113–123).
Davies, M., Srinivasa, N., Lin, T. H., Chinya, G., Cao, Y., Choday, S. H., Dimou, G., Joshi, P., Imam, N., Jain, S., Liao, Y., Lin, C.-K., Lines, A., Liu, R., Mathaikutty, D., McCoy, S., Paul, A., Tse, J., Venkataramanan, G., & Wang, H. (2018). Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1), 82–99.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255). IEEE.
Deng, L., Wu, Y., Hu, X., Liang, L., Ding, Y., Li, G., Zhao, G., Li, P., & Xie, Y. (2020). Rethinking the performance comparison between SNNS and ANNS. Neural Networks, 121, 294–307.
Deng, S., & Gu, S. (2021). Optimal conversion of conventional artificial neural networks to spiking neural networks. In International conference on learning representations. https://openreview.net/forum?id=FZ1oTwcXchK.
Deng, S., Li, Y., Zhang, S., & Gu, S. (2022). Temporal efficient training of spiking neural network via gradient re-weighting. In International conference on learning representations. https://openreview.net/forum?id=_XNtisL32jv.
DeVries, T., & Taylor, G. W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552.
Diehl, P. U., Neil, D., Binas, J., Cook, M., Liu, S. C., & Pfeiffer, M. (2015). Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International joint conference on neural networks (IJCNN) (pp. 1–8). IEEE.
Diehl, P. U., Zarrella, G., Cassidy, A., Pedroni, B. U., & Neftci, E. (2016). Conversion of artificial recurrent neural networks to spiking neural networks for low-power neuromorphic hardware. In 2016 IEEE international conference on rebooting computing (ICRC) (pp. 1–8). IEEE.
Ding, J., Yu, Z., Tian, Y., & Huang, T. (2021). Optimal ann-snn conversion for fast and accurate inference in deep spiking neural networks. In Z. H. Zhou (Ed.), Proceedings of the thirtieth international joint conference on artificial intelligence, IJCAI-21 (pp. 2328–2336). International Joint Conferences on Artificial Intelligence Organization. https://doi.org/10.24963/ijcai.2021/321. Main track.
Dong, X., Chen, S., & Pan, S. (2017a). Learning to prune deep neural networks via layer-wise optimal brain surgeon. Advances in Neural Information Processing Systems,30.
Dong, X., Chen, S., & Pan, S. (2017b). Learning to prune deep neural networks via layer-wise optimal brain surgeon. Advances in Neural Information Processing Systems.
Dong, Z., Yao, Z., Gholami, A., Mahoney, M. W., & Keutzer, K. (2019). Hawq: Hessian aware quantization of neural networks with mixed-precision. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 293–302).
Fang, W., Yu, Z., Chen, Y., Masquelier, T., Huang, T., & Tian, Y. (2021). Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 2661–2671).
Furber, S. B., Galluppi, F., Temple, S., et al. (2014). The spinnaker project. Proceedings of the IEEE, 102(5), 652–665.
Gu, P., Xiao, R., Pan, G., & Tang, H. (2019). STCA: Spatio-temporal credit assignment with delayed feedback in deep spiking neural networks. In IJCAI (Vol. 15, pp. 1366–1372).
Han, B., & Roy, K. (2020). Deep spiking neural network: Energy efficiency through time based coding. In European conference on computer vision.
Han, B., Srinivasan, G., & Roy, K. (2020). Rmp-snn: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13558–13567).
Hassibi, B., & Stork, D. G. (1993). Second order derivatives for network pruning: Optimal brain surgeon. Advances in Neural Information Processing Systems, 5, 164–171.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE conference on computer vision and pattern recognition.
He, T., Zhang, Z., Zhang, H., Zhang, Z., Xie, J., & Li, M. (2019). Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 558–567).
Hebb, D. O. (2005). The organization of behavior: A neuropsychological theory. Psychology Press.
Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4), 500–544.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
Iakymchuk, T., Rosado-Muñoz, A., Guerrero-Martínez, J. F., Bataller-Mompeán, M., & Francés-Villora, J. V. (2015). Simplified spiking neural network architecture and stdp learning algorithm applied to image classification. EURASIP Journal on Image and Video Processing, 1, 1–11.
Ikegawa, S. I., Saiin, R., Sawada, Y., & Natori, N. (2022). Rethinking the role of normalization and residual blocks for spiking neural networks. Sensors, 22(8), 2876.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning (pp. 448–456). PMLR.
Iyer, L. R., & Chua, Y. (2020). Classifying neuromorphic datasets with tempotron and spike timing dependent plasticity. In 2020 international joint conference on neural networks (IJCNN) (pp. 1–8). IEEE.
Izhikevich, E. M. (2003). Simple model of spiking neurons. IEEE Transactions on Neural Networks, 14(6), 1569–1572.
Kheradpisheh, S. R., Ganjtabesh, M., Thorpe, S. J., & Masquelier, T. (2018). STDP-based spiking deep convolutional neural networks for object recognition. Neural Networks, 99, 56–67.
Kim, S., Park, S., Na, B., & Yoon, S. (2020). Spiking-yolo: Spiking neural network for energy-efficient object detection. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 07, pp. 11270–11277).
Kim, Y., & Panda, P. (2021). Revisiting batch normalization for training low-latency deep spiking neural networks from scratch. Frontiers in Neuroscience, 15, 773954.
Kim, Y., Li, Y., Park, H., Venkatesha, Y., & Panda, P. (2022). Neural architecture search for spiking neural networks. In European conference on computer vision (pp. 36–56). Cham: Springer Nature Switzerland.
Kingma, D.P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Krizhevsky, A., Nair, V., & Hinton, G. (2010). CIFAR-10 (Canadian Institute for Advanced Research). https://www.cs.toronto.edu/~kriz/cifar.html.
Lee, C., Panda, P., Srinivasan, G., & Roy, K. (2018). Training deep spiking convolutional neural networks with stdp-based unsupervised pre-training followed by supervised fine-tuning. Frontiers in Neuroscience, 12, 373945.
Lee, C., Sarwar, S. S., Panda, P., Srinivasan, G., & Roy, K. (2020). Enabling spike-based backpropagation for training deep neural network architectures. Frontiers in Neuroscience, 14, 497482.
Lee, J. H., Delbruck, T., & Pfeiffer, M. (2016). Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience, 10, 508.
Li, S. L., & Li, J. P. (2019). Research on learning algorithm of spiking neural network. In 2019 16th international computer conference on wavelet active media technology and information processing (pp. 45–48). IEEE.
Li, T., Sahu, A. K., Talwalkar, A., & Smith, V. (2020). Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3), 50–60.
Li, Y., & Zeng, Y. (2022). Efficient and accurate conversion of spiking neural network with burst spikes. arXiv preprint arXiv:2204.13271.
Li, Y., Dong, X., & Wang, W. (2020b). Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks. In International conference on learning representations. https://openreview.net/forum?id=BkgXT24tDS.
Li, Y., Deng, S., Dong, X., Gong, R., & Gu, S. (2021). A free lunch from ANN: Towards efficient, accurate spiking neural networks calibration. In International conference on machine learning (pp. 6316–6325). PMLR.
Li, Y., Gong, R., Tan, X., Yang, Y., Hu, P., Zhang, Q., Yu, F., Wang, W., & Gu, S. (2021b). Brecq: Pushing the limit of post-training quantization by block reconstruction. In International conference on learning representations. https://openreview.net/forum?id=POWv6hDd9XH.
Li, Y., Guo, Y., Zhang, S., Deng, S., Hai, Y., & Gu, S. (2021). Differentiable spike: Rethinking gradient-descent for training spiking neural networks. Advances in Neural Information Processing Systems, 34, 23426–23439.
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13 (pp. 740–755). Springer International Publishing.
Lin, T. Y., Dollar, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2117–2125).
Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollar, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988).
Liu, Y. H., & Wang, X. J. (2001). Spike-frequency adaptation of a generalized leaky integrate-and-fire model neuron. Journal of Computational Neuroscience, 10(1), 25–45.
Liu, Z., Wu, Z., Gan, C., Zhu, L., & Han, S. (2020). Datamix: Efficient privacy-preserving edge-cloud inference. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16 (pp. 578–595). Springer International Publishing.
Lobov, S. A., Mikhaylov, A. N., Shamshin, M., Makarov, V. A., & Kazantsev, V. B. (2020). Spatial properties of STDP in a self-learning spiking neural network enable controlling a mobile robot. Frontiers in Neuroscience, 14, 491341.
Loshchilov, I., & Hutter, F. (2016). Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983.
Meng, Q., Xiao, M., Yan, S., Wang, Y., Lin, Z., & Luo, Z. Q. (2022). Training high-performance low-latency spiking neural networks by differentiation on spike representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12444–12453).
Miquel, J. R., Tolu, S., Scholler, F. E., & Galeazzi, R. (2021). Retinanet object detector based on analog-to-spiking neural network conversion. In 2021 8th International Conference on Soft Computing & Machine Intelligence (ISCMI) (pp. 201–205).
Mordvintsev, A., Olah, C., & Tyka, M. (2015). Inceptionism: Going deeper into neural networks. https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html.
Neftci, E. O., Mostafa, H., & Zenke, F. (2019). Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine, 36(6), 51–63.
Pei, J., Deng, L., Song, S., Zhao, M., Zhang, Y., Wu, S., Wang, Y., Wu, Y., Yang, Z., Ma, C., Li, G., Han, W., Li, H., Wu, H., Zhao, R., Xie, Y., & Shi, L. P. (2019). Towards artificial general intelligence with hybrid Tianjic chip architecture. Nature, 572(7767), 106–111.
Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., & Dollar, P. (2020). Designing network design spaces. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10428–10436).
Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016). Xnor-net: Imagenet classification using binary convolutional neural networks. In European conference on computer vision (pp. 525–542). Cham: Springer International Publishing.
Rathi, N., & Roy, K. (2021). Diet-snn: A low-latency spiking neural network with direct input encoding and leakage and threshold optimization. IEEE Transactions on Neural Networks and Learning Systems, 34(6), 3174–3182.
Rathi, N., Srinivasan, G., Panda, P., & Roy, K. (2019). Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In International conference on learning representations.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28.
Roy, D., Chakraborty, I., & Roy, K. (2019). Scaling deep spiking neural networks with binary stochastic activations. In 2019 IEEE International Conference on Cognitive Computing (ICCC) (pp. 50–58). IEEE.
Roy, K., Jaiswal, A., & Panda, P. (2019). Towards spike-based machine intelligence with neuromorphic computing. Nature, 575(7784), 607–617.
Rueckauer, B., Lungu, I. A., Hu, Y., & Pfeiffer, M. (2016). Theory and tools for the conversion of analog to spiking convolutional neural networks. arXiv preprint arXiv:1612.04052.
Rueckauer, B., Lungu, I. A., Hu, Y., Pfeiffer, M., & Liu, S. C. (2017). Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in Neuroscience, 11, 294078.
Santurkar, S., Tsipras, D., Ilyas, A., & Madry, A. (2018). How does batch normalization help optimization?. Advances in Neural Information Processing Systems, 31.
Sengupta, A., Ye, Y., Wang, R., Liu, C., & Roy, K. (2019). Going deeper in spiking neural networks: VGG and residual architectures. Frontiers in Neuroscience, 13, 425055.
Shrestha, S. B., & Orchard, G. (2018). Slayer: Spike layer error reassignment in time. Advances in Neural Information Processing Systems, 31, 1412–1421.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of go with deep neural networks and tree search. Nature, 529(7587), 484–489.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Suetake, K., Ikegawa, S. I., Saiin, R., & Sawada, Y. (2023). S3NN: Time step reduction of spiking surrogate gradients for training energy efficient single-step spiking neural networks. Neural Networks, 159, 208–219.
Sze, V., Chen, Y. H., Yang, T. J., & Emer, J. S. (2017). Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12), 2295–2329.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818–2826).
Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., & Le, Q. V. (2019). Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2820–2828).
Tavanaei, A., Ghodrati, M., Kheradpisheh, S. R., Masquelier, T., & Maida, A. (2019). Deep learning in spiking neural networks. Neural Networks, 111, 47–63.
Theis, L., Korshunova, I., Tejani, A., & Huszar, F. (2018). Faster gaze prediction with dense networks and fisher pruning. arXiv preprint arXiv:1801.05787.
Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J. P., Jaderberg, M., & Silver, D. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782), 350–354.
Wang, Y., Zhang, M., Chen, Y., & Qu, H. (2022). Signed neuron with memory: Towards simple, accurate and high-efficient ANN-SNN conversion. In International joint conference on artificial intelligence (pp. 2501–2508).
Wu, J., Chua, Y., Zhang, M., Li, G., Li, H., & Tan, K. C. (2021). A tandem learning rule for effective training and rapid inference of deep spiking neural networks. IEEE Transactions on Neural Networks and Learning Systems, 34(1), 446–460.
Wu, J., Xu, C., Han, X., Zhou, D., Zhang, M., Li, H., & Tan, K. C. (2021). Progressive tandem learning for pattern recognition with deep spiking neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11), 7824–7840.
Wu, Y., Deng, L., Li, G., & Shi, L. (2018). Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in Neuroscience, 12, 331.
Wu, Y., Zhao, R., Zhu, J., Chen, F., Xu, M., Li, G., Song, S., Deng, L., Wang, G., Zheng, H., Pei, J., Zhang, Y., Zhao, M., & Shi, L. (2022). Brain-inspired global-local learning incorporated with neuromorphic computing. Nature Communications, 13(1), 1–14.
Xiao, M., Meng, Q., Zhang, Z., Wang, Y., & Lin, Z. (2021). Training feedback spiking neural networks by implicit differentiation on the equilibrium state. Advances in Neural Information Processing Systems, 34, 14516–14528.
Xiao, M., Meng, Q., Zhang, Z., He, D., & Lin, Z. (2022). Online training through time for spiking neural networks. Advances in Neural Information Processing Systems, 35, 20717-20730.
Yin, H., Molchanov, P., Alvarez, J. M., Li, Z., Mallya, A., Hoiem, D., Jha, N. K. & Kautz, J. (2020). Dreaming to distill: Data-free knowledge transfer via deepinversion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8715–8724).
Zheng, H., Wu, Y., Deng, L., Hu, Y., & Li, G. (2021). Going deeper with directly-trained larger spiking neural networks. In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, No. 12, pp. 11062–11070).
Acknowledgements
This project is supported by NSFC Key Program 62236009, Shenzhen Fundamental Research Program (General Program) JCYJ20210324140807019, NSFC General Program 61876032, and Key Laboratory of Data Intelligence and Cognitive Computing, Longhua District, Shenzhen. Yuhang Li is supported by the Baidu PhD Fellowship Program and completed this work during his prior research assistantship at UESTC. The authors would like to thank Youngeun Kim for helpful feedback on the manuscript.
Additional information
Communicated by Yasuyuki Matsushita.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Major Proof
In order to prove Theorem 4.1, we need to introduce two lemmas.
Lemma A.1
The error term \(\textbf{e}_r^{(n)}\) can be computed from the error in the preceding layer as

\(\textbf{e}_r^{(n)} = \textbf{B}^{(n)}\textbf{W}^{(n-1)}\textbf{e}^{(n-1)},\)  (A1)

where \(\textbf{B}^{(n)}\) is a diagonal matrix whose entries are the derivatives of the ReLU activation.
Proof
Without loss of generality, we show that the equation holds for any two consecutive layers \((\ell +1)\) and \((\ell )\). Recall that the error term \(\textbf{e}_c^{(\ell +1)}\) is the error between different input activations. Let \(f(\textbf{a})=\textrm{ReLU}(\textbf{Wa})\) and \(\textbf{e}_r^{(\ell +1)}=f(\textbf{x}^{(\ell )}) - f(\bar{\textbf{s}}^{(\ell )})\). We can rewrite \(\textbf{e}_r^{(\ell +1)}\) using a Taylor expansion, given by

\(\textbf{e}_r^{(\ell +1)} = \frac{\partial f}{\partial \textbf{a}}\Big|_{\textbf{x}^{(\ell )}}\, \textbf{e}^{(\ell )} + \mathcal {O}\!\left( \Vert \textbf{e}^{(\ell )}\Vert ^2\right) ,\)

where \(\textbf{e}^{(\ell )}\) is the difference between \(\textbf{x}^{(\ell )}\) and \(\bar{\textbf{s}}^{(\ell )}\). The first-order derivative can be written as

\(\frac{\partial f}{\partial \textbf{a}}\Big|_{\textbf{x}^{(\ell )}} = \textbf{B}^{(\ell +1)}\textbf{W}^{(\ell )},\)
where \(\textbf{B}^{(\ell +1)}\in \{0, 1\}^{c_{\ell +1}\times c_{\ell +1}}\) is a diagonal matrix whose diagonal entries are the derivatives of the ReLU function, and \(c_{\ell +1}\) denotes the number of neurons in layer \(\ell +1\). To calculate the second-order derivative, we need to differentiate the above equation with respect to \(\textbf{x}^{(\ell )}\). First, the matrix \(\textbf{B}^{(\ell +1)}\) is not a function of \(\textbf{x}^{(\ell )}\), since it consists only of constants. Second, the weight parameters \(\textbf{W}^{(\ell )}\) are also not a function of \(\textbf{x}^{(\ell )}\). Therefore, we can safely ignore the second-order and all higher-order terms in Eq. (2). Thus, Eq. (A1) holds. \(\square \)
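The first-order relation used above can be checked numerically. The following sketch (our own illustrative values, chosen so the perturbation flips no ReLU sign) verifies that \(f(\textbf{x}+\textbf{e})-f(\textbf{x})=\textbf{B}\textbf{W}\textbf{e}\) holds exactly, since ReLU is piecewise linear:

```python
import numpy as np

W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
x = np.array([0.5, 0.2])            # reference input x^{(l)}
e = np.array([0.01, -0.02])         # small perturbation e^{(l)}

f = lambda a: np.maximum(W @ a, 0.0)

# B^{(l+1)}: diagonal ReLU-derivative matrix at the pre-activation W x
B = np.diag((W @ x > 0).astype(float))

lhs = f(x + e) - f(x)               # exact error e_r^{(l+1)}
rhs = B @ W @ e                     # first-order term B^{(l+1)} W^{(l)} e^{(l)}
assert np.allclose(lhs, rhs)        # equal: no sign flips, higher orders vanish
```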
Lemma A.2
((Botev et al., 2017), Eq. (8)) The Hessian matrix of the loss with respect to the activation in layer \((\ell )\) can be recursively computed by

\(\textbf{H}^{(\ell )} = \textbf{W}^{(\ell ),\top }\textbf{B}^{(\ell +1)}\textbf{H}^{(\ell +1)}\textbf{B}^{(\ell +1)}\textbf{W}^{(\ell )}.\)
Proof
Let \(\textbf{H}_{a, b}^{(\ell )}\) denote the \((a,b)\)-th entry of the Hessian matrix of the loss \(L\) with respect to the \(\ell \)-th layer activation. By the chain rule, we can calculate it as

\(\textbf{H}_{a,b}^{(\ell )} = \frac{\partial ^2 L}{\partial \textbf{x}_a^{(\ell )}\partial \textbf{x}_b^{(\ell )}} = \frac{\partial }{\partial \textbf{x}_b^{(\ell )}} \sum _i \frac{\partial L}{\partial \textbf{x}_i^{(\ell +1)}}\frac{\partial \textbf{x}_i^{(\ell +1)}}{\partial \textbf{x}_a^{(\ell )}}.\)

Note that the term \(\frac{\partial \textbf{x}_{i}^{(\ell +1)}}{\partial \textbf{z}_{i}^{(\ell )}}\) is the derivative of the ReLU function, which is a constant (either 0 or 1) and therefore has no gradient with respect to \(\textbf{x}_{b}^{(\ell )}\). Differentiating the above equation once more, we have

\(\textbf{H}_{a,b}^{(\ell )} = \sum _{i,j} \textbf{B}_{ii}^{(\ell +1)}\textbf{W}_{ia}^{(\ell )}\,\textbf{H}_{i,j}^{(\ell +1)}\,\textbf{B}_{jj}^{(\ell +1)}\textbf{W}_{jb}^{(\ell )}.\)
To this end, rewriting the sums in matrix form yields the recursion, and thus the lemma holds. \(\square \)
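Ignoring the vanishing second-derivative term of ReLU, the recursion is just the chain rule sandwiching the next layer's Hessian between the layer Jacobian \(\textbf{J}=\textbf{B}^{(\ell +1)}\textbf{W}^{(\ell )}\). A small numerical sketch (our own illustrative shapes and values, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))        # W^{(l)}
x = rng.normal(size=3)             # activation x^{(l)}
z = W @ x                          # pre-activation
B = np.diag((z > 0).astype(float)) # ReLU-derivative matrix B^{(l+1)}

# An arbitrary PSD Hessian w.r.t. the next layer's activation x^{(l+1)}.
M = rng.normal(size=(4, 4))
H_next = M @ M.T

# Recursion from Lemma A.2 (second-derivative term of ReLU vanishes):
H_curr = W.T @ B @ H_next @ B @ W

# Equivalent chain-rule form with Jacobian J = dx^{(l+1)}/dx^{(l)} = B W:
J = B @ W
assert np.allclose(H_curr, J.T @ H_next @ J)
```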
Now, we can prove our theorem with the above two lemmas.
Proof
According to Lemma A.1, the error in the last layer can be rewritten as

\(\textbf{e}^{(n)} = \textbf{e}_c^{(n)} + \textbf{e}_r^{(n)} = \textbf{e}_c^{(n)} + \textbf{B}^{(n)}\textbf{W}^{(n-1)}\textbf{e}^{(n-1)}.\)
Applying this equation to the second-order objective, we have

\(\textbf{e}^{(n),\top }\textbf{H}^{(n)}\textbf{e}^{(n)} = \textbf{e}^{(n-1),\top }\textbf{W}^{(n-1),\top }\textbf{B}^{(n)}\textbf{H}^{(n)}\textbf{B}^{(n)}\textbf{W}^{(n-1)}\textbf{e}^{(n-1)} + 2\,\textbf{e}_c^{(n),\top }\textbf{H}^{(n)}\textbf{B}^{(n)}\textbf{W}^{(n-1)}\textbf{e}^{(n-1)} + \textbf{e}_c^{(n),\top }\textbf{H}^{(n)}\textbf{e}_c^{(n)}.\)
According to Lemma A.2, the Hessian of the activation can be derived recursively as

\(\textbf{H}^{(n-1)} = \textbf{W}^{(n-1),\top }\textbf{B}^{(n)}\textbf{H}^{(n)}\textbf{B}^{(n)}\textbf{W}^{(n-1)}.\)
Thus, the first term in Eq. (13) can be rewritten as \(\textbf{e}^{(n-1), \top }\textbf{H}^{(n-1)}\textbf{e}^{(n-1)}\). The second term can be upper bounded using the Cauchy–Schwarz inequality \(x^\top A y \le \sqrt{(x^\top A x)(y^\top A y)}\) and the inequality of arithmetic and geometric means \(\sqrt{(x^\top A x)(y^\top A y)} \le \frac{1}{2}(x^\top A x+y^\top A y)\), treating \(A\) as \(\textbf{H}^{(n)}\). Therefore, the second term is upper bounded by

\(2\,\textbf{e}_c^{(n),\top }\textbf{H}^{(n)}\textbf{B}^{(n)}\textbf{W}^{(n-1)}\textbf{e}^{(n-1)} \le \textbf{e}_c^{(n),\top }\textbf{H}^{(n)}\textbf{e}_c^{(n)} + \textbf{e}^{(n-1),\top }\textbf{H}^{(n-1)}\textbf{e}^{(n-1)}.\)
To this end, we can rewrite Eq. (13) as

\(\textbf{e}^{(n),\top }\textbf{H}^{(n)}\textbf{e}^{(n)} \le 2\,\textbf{e}^{(n-1),\top }\textbf{H}^{(n-1)}\textbf{e}^{(n-1)} + 2\,\textbf{e}_c^{(n),\top }\textbf{H}^{(n)}\textbf{e}_c^{(n)},\)

which completes the proof.
\(\square \)
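The two inequalities used for the cross term can be sanity-checked numerically for a random positive semi-definite matrix \(A\) (an illustrative sketch of our own, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))
A = M @ M.T                        # PSD matrix, playing the role of H^{(n)}
x = rng.normal(size=5)
y = rng.normal(size=5)

cross = x @ A @ y                  # the cross term x^T A y
cs = np.sqrt((x @ A @ x) * (y @ A @ y))   # Cauchy-Schwarz bound
amgm = 0.5 * (x @ A @ x + y @ A @ y)      # AM-GM relaxation of it

assert cross <= cs + 1e-12         # Cauchy-Schwarz for PSD A
assert cs <= amgm + 1e-12          # arithmetic-geometric mean inequality
```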
Appendix B Generalization to Under/Over-Fire Error
Here, we show that our analysis of error propagation and the calibration algorithm generalize to under-fire and over-fire errors. Recall that when \(\textbf{v}^{(\ell )}(T)\) is not in \(\left[ 0, V^{(\ell )}_{th}\right) \), under-fire or over-fire occurs and there is no closed-form expression for \(\bar{\textbf{s}}^{(\ell +1)}\). We use the function \(\textrm{AvgIF}\) to denote the averaged spike output of the IF neuron:

\(\bar{\textbf{s}}^{(\ell +1)} = \textrm{AvgIF}(\textbf{W}^{(\ell )}\bar{\textbf{s}}^{(\ell )}).\)

\(\textrm{AvgIF}(\textbf{W}^{(\ell )}\bar{\textbf{s}}^{(\ell )})\) returns \(\frac{m}{T}V_{th}^{(\ell )}\), where \(m\) is the number of output spikes; with under/over-fire error, \(m\) cannot be computed analytically. Similarly, we define \(\textbf{e}_c^{(\ell )}=\textrm{ReLU}(\textbf{W}^{(\ell )}\bar{\textbf{s}}^{(\ell )})-\textrm{AvgIF}(\textbf{W}^{(\ell )}\bar{\textbf{s}}^{(\ell )})\) as the local conversion error, which now contains the clipping, flooring, over-fire, and under-fire errors. Eq. (13) can then be rewritten in the same form with this generalized local error.
Here, the accumulated error \(\textbf{e}_r^{(n)}\) is the same as in Eq. (13) and can be eliminated using the second-order analysis. As for \(\textbf{e}_c\), it can be handled by our parameter calibration algorithm, since the method is data-driven: during calibration, the actual \(\textbf{e}_c\) in each layer is computed from \(\bar{\textbf{s}}^{(\ell +1)}-\textbf{x}^{(\ell +1)}\). Therefore, all errors can be reduced by calibration.
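A minimal simulation (our own sketch, with a hypothetical `avg_if` helper) shows why \(m\) has no closed form under time-varying spiking input, and how the local error \(\textbf{e}_c\) can be measured empirically during calibration:

```python
import numpy as np

def avg_if(inputs, v_th=1.0):
    """Average spike output of an IF neuron over T steps for a
    time-varying input current `inputs` of shape (T, n); in a real
    SNN layer this current is W s^{(l)}(t), so the spike count m
    depends on spike timing and under-/over-fire can occur."""
    T, n = inputs.shape
    v = np.zeros(n)
    m = np.zeros(n)
    for t in range(T):
        v = v + inputs[t]
        fired = v >= v_th
        m = m + fired
        v = np.where(fired, v - v_th, v)
    return m / T * v_th

rng = np.random.default_rng(3)
T = 32
currents = rng.normal(0.3, 0.5, size=(T, 4))   # time-varying input
out = avg_if(currents)

# Local conversion error e_c: gap between ReLU of the averaged input
# and the averaged IF output, measured empirically during calibration.
e_c = np.maximum(currents.mean(0), 0.0) - out
print(e_c)
```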
Appendix C Experiments Details
1.1 C.1 ImageNet Pre-training
The ImageNet dataset (Deng et al., 2009) contains 1.2 M training images and 50 k validation images. For training pre-processing, we randomly crop and resize the training images to 224 \(\times \) 224. We additionally apply ColorJitter with brightness = 0.2, contrast = 0.2, saturation = 0.2, and hue = 0.1. Test images are center-cropped to the same size. For all architectures we tested, the Max Pooling layers are replaced with Average Pooling layers. The ResNet-34 contains a deep-stem layer (i.e., three 3 \(\times \) 3 conv. layers replacing the original 7 \(\times \) 7 first conv. layer) as described in (He et al., 2019). We use Stochastic Gradient Descent with a momentum of 0.9 as the optimizer. The learning rate is set to 0.1 and decayed with a cosine schedule (Loshchilov & Hutter, 2016). Weight decay is set to \(10^{-4}\), and the networks are optimized for 120 epochs. We also apply label smoothing (Szegedy et al., 2016) with factor = 0.1 and an EMA update with a 0.999 decay rate. The MobileNet pre-trained model is downloaded from pytorchcv.
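The cosine decay schedule follows Loshchilov and Hutter (2016), without restarts. A minimal sketch (our own hypothetical `cosine_lr` helper, treating epochs as steps):

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1):
    """Cosine learning-rate decay without restarts:
    lr(t) = 0.5 * base_lr * (1 + cos(pi * t / T))."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))

total = 120  # epochs, as in the ImageNet runs
print(cosine_lr(0, total))    # base rate 0.1 at the start
print(cosine_lr(60, total))   # half the base rate at the midpoint
print(cosine_lr(120, total))  # decays to ~0 at the end
```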
1.2 C.2 CIFAR Pre-training
The CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2010) each contain 50 k training images and 10 k validation images. We set padding to 4 and randomly crop the training images to 32 \(\times \) 32. Other data augmentations include (1) random horizontal flip, (2) Cutout (DeVries & Taylor, 2017), and (3) AutoAugment (Cubuk et al., 2019). For ResNet-20, we follow prior works (Han et al., 2020; Han & Roy, 2020), which modify the official network structure proposed in (He et al., 2016), to make a fair comparison. The modified ResNet-20 contains 3 stages with 4\(\times \) more channels. For VGG-16 without BN layers, we add Dropout with a 0.25 drop rate to regularize the network. For models with BN layers, we use Stochastic Gradient Descent with a momentum of 0.9 as the optimizer. The learning rate is set to 0.1 and decayed with a cosine schedule (Loshchilov & Hutter, 2016). Weight decay is set to \(5\times 10^{-4}\), and the networks are optimized for 300 epochs. For networks without BN layers, we set weight decay to \(10^{-4}\) and the learning rate to 0.005.
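Cutout (DeVries & Taylor, 2017) zeroes out a random square patch of the input image. A minimal sketch (our own hypothetical `cutout` helper, with illustrative parameters):

```python
import numpy as np

def cutout(img, size=8, rng=None):
    """Zero out a random size x size patch of an HxWxC image,
    clipping the patch at the image border."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    cy, cx = rng.integers(h), rng.integers(w)   # random patch center
    y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, h)
    x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, w)
    out = img.copy()
    out[y0:y1, x0:x1] = 0.0
    return out

img = np.ones((32, 32, 3))
aug = cutout(img, size=8, rng=np.random.default_rng(0))
print((aug == 0).any())   # some pixels were zeroed
```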
1.3 C.3 COCO Pre-training
The COCO dataset (Lin et al., 2014) contains 121,408 images with 883 k object annotations across 80 image classes. The median image resolution is 640 \(\times \) 480. The backbone ResNet-50 is pre-trained on the ImageNet dataset. We train RetinaNet and Faster R-CNN for 12 epochs, using SGD with a learning rate of 0.000625, decayed by a factor of 10 at epochs 8 and 11. The momentum is 0.9, and we use linear learning-rate warmup in the first epoch. The batch size per GPU is set to 2, and we use 8 GPUs in total for pre-training. The weight decay is set to \(10^{-4}\). To stabilize training, we freeze the updates in the BN layers (as they perform poorly with small-batch training), as well as the updates in the first stage and the stem layer of the backbone.
1.4 C.4 COCO Conversion Details
For the calibration dataset, we use 128 training images taken from the MS COCO dataset. The image resolution is set to 800 (max size 1333). Note that in object detection the image resolution varies from image to image, so we cannot set an initial membrane potential with a fixed spatial dimension. Similar to bias calibration, we instead compute a reduced mean value in each channel and set the initial membrane potential to \(T\times \mu (\textbf{e}^{(i)})\). Learning hyper-parameters are kept the same as in the ImageNet experiments.
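The channel-wise potential initialization can be sketched as follows (our own illustrative code; `init_membrane_potential` is a hypothetical name, and the (N, C, H, W) layout is an assumption):

```python
import numpy as np

def init_membrane_potential(err, T):
    """Channel-wise initial membrane potential for detection inputs.

    `err` is the activation error e^{(i)} with shape (N, C, H, W);
    because detection images vary in spatial size, we reduce over the
    batch and spatial axes, keep a per-channel mean, and scale by the
    number of time steps T. The result broadcasts at inference time."""
    mu = err.mean(axis=(0, 2, 3))   # per-channel mean of e^{(i)}
    return T * mu                   # shape (C,)

err = np.random.default_rng(4).normal(size=(2, 3, 5, 7))
v_init = init_membrane_potential(err, T=32)
print(v_init.shape)  # one value per channel
```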
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Deng, S., Dong, X. et al. Error-Aware Conversion from ANN to SNN via Post-training Parameter Calibration. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-02046-2