
Quantized convolutional neural networks through the lens of partial differential equations

  • Research
  • Published in Research in the Mathematical Sciences

Abstract

Quantization of convolutional neural networks (CNNs) is a common approach to easing the computational burden of deploying CNNs, especially on low-resource edge devices. However, fixed-point arithmetic is not natural to the type of computations involved in neural networks. In this work, we explore ways to improve quantized CNNs using a PDE-based perspective and analysis. First, we harness the total variation (TV) approach to apply edge-aware smoothing to the feature maps throughout the network. This aims to reduce outliers in the distribution of values and to promote piecewise constant maps, which are more suitable for quantization. Second, we consider symmetric and stable variants of common CNNs for image classification, and of graph convolutional networks for graph node classification. We demonstrate through several experiments that forward stability preserves the action of a network under different quantization rates. As a result, stable quantized networks behave similarly to their non-quantized counterparts even though they rely on fewer parameters. We also find that stability sometimes even improves accuracy. These properties are of particular interest for sensitive, resource-constrained, low-power, or real-time applications such as autonomous driving.
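To make the first of these ideas concrete, the sketch below shows one way an edge-aware TV smoothing step could be applied to feature maps before they are quantized. It is a minimal illustration under our own assumptions, not the authors' implementation (which is available in the repository linked under Data availability): it runs a few explicit steps of Charbonnier-smoothed TV diffusion, and the name `tv_smooth` and the parameters `n_steps` and `eps` are illustrative choices.

```python
import torch
import torch.nn.functional as F

def tv_smooth(x: torch.Tensor, n_steps: int = 3, eps: float = 1e-2) -> torch.Tensor:
    """Edge-aware smoothing of feature maps x of shape (N, C, H, W)."""
    for _ in range(n_steps):
        # forward differences along width and height, zero at the far border
        dx = F.pad(x[:, :, :, 1:] - x[:, :, :, :-1], (0, 1))
        dy = F.pad(x[:, :, 1:, :] - x[:, :, :-1, :], (0, 0, 0, 1))
        # Charbonnier-smoothed TV diffusivity: small across strong edges,
        # large in flat regions, which pushes maps toward piecewise constancy
        g = 1.0 / torch.sqrt(dx ** 2 + dy ** 2 + eps ** 2)
        fx, fy = g * dx, g * dy
        # divergence of (g * grad x) via backward differences
        div = (fx - F.pad(fx[:, :, :, :-1], (1, 0))) \
            + (fy - F.pad(fy[:, :, :-1, :], (0, 0, 1, 0)))
        # explicit (forward Euler) step; dt chosen to satisfy the usual
        # stability bound dt * max(g) <= 1/4 for a 5-point stencil
        dt = 0.25 / g.amax().clamp(min=1.0)
        x = x + dt * div
    return x
```

In a quantized network, a step of this kind would be applied to the activations before they are rounded to a low-precision grid, so that the values to be quantized have fewer outliers and a more piecewise-constant structure.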


Data availability

Data sharing is not applicable to this article, as all data sets used in this study are publicly available online. The code for reproducing the results is available at https://github.com/BGUCompSci/CNNQuantizationThroughPDEs.

Notes

  1. We assume that the ReLU activation function is used between convolution operators, so that the activation maps are non-negative and can be quantized with an unsigned scheme. If an activation function that can take negative values is used instead, such as \(\tanh\), signed quantization should be used (a minimal sketch of both schemes appears after these notes).

  2. https://github.com/VainF/DeepLabV3Plus-Pytorch
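As a rough illustration of the distinction drawn in note 1, the sketch below implements standard uniform "fake" quantization in unsigned and signed flavours. The function names and the per-tensor, max-based scale are our own illustrative choices and not necessarily the exact scheme used in the paper.

```python
import torch

def fake_quantize_unsigned(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Uniform quantization of non-negative (e.g. post-ReLU) activations."""
    qmax = 2 ** bits - 1                      # e.g. levels 0..15 for 4 bits
    scale = x.max().clamp(min=1e-8) / qmax    # per-tensor scale from the dynamic range
    q = torch.round(x / scale).clamp_(0, qmax)
    return q * scale                          # dequantize back to float ("fake" quantization)

def fake_quantize_signed(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Uniform symmetric quantization for activations that may be negative (e.g. tanh)."""
    qmax = 2 ** (bits - 1) - 1                # e.g. integer range [-8, 7] for 4 bits
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.round(x / scale).clamp_(-qmax - 1, qmax)
    return q * scale
```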


Author information


Corresponding author

Correspondence to Ido Ben-Yair.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research reported in this paper was supported by the Israel Innovation Authority through the Avatar consortium and by Grant No. 2018209 from the United States–Israel Binational Science Foundation (BSF), Jerusalem, Israel. M.E. is supported by the Kreitman High-Tech scholarship. The authors thank the Lynn and William Frankel Center for Computer Science at BGU.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ben-Yair, I., Ben Shalom, G., Eliasof, M. et al. Quantized convolutional neural networks through the lens of partial differential equations. Res Math Sci 9, 58 (2022). https://doi.org/10.1007/s40687-022-00354-y



  • DOI: https://doi.org/10.1007/s40687-022-00354-y
