
Methodologies of Compressing a Stable Performance Convolutional Neural Networks in Image Classification

Abstract

Deep learning has driven a revolution in embedded computing, and the convolutional neural network (CNN) has proven a reliable fit for many emerging problems. The next step is to strengthen the role of CNNs on embedded devices, in both implementation and performance. On such devices, storage and computational resources are limited and constrained, which raises key issues that must be addressed. Compressing (i.e., quantizing) the CNN is a valuable solution. In this paper, our main goals are memory compression and complexity reduction (in both operations and cycles) of CNNs, using methods, including quantization and pruning, that do not require retraining and can therefore be applied in mobile systems or robots. We also explore further quantization techniques for additional complexity reduction. To achieve these goals, we compress the layers of a CNN model (i.e., parameters and outputs) into suitable precision formats using several quantization methodologies. First, we describe a pruning approach that reduces the required storage and computation cycles on embedded devices; such an enhancement can drastically reduce power consumption and resource requirements. Second, we present a hybrid quantization approach with automatic tuning for network compression. Third, we apply a K-means quantization approach. With only minor degradation relative to floating-point performance, the presented pruning and quantization methods produce stable fixed-point reduced networks. The quantization process accounts for precise fixed-point calculations of coefficients, input/output signals, and accumulators.
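The three families of techniques the abstract names can be illustrated in generic form. The sketch below is not the paper's implementation; it is a minimal NumPy illustration of magnitude-based weight pruning, fixed-point rounding of weights, and K-means weight sharing, with all function names and parameters chosen here for illustration.

```python
import numpy as np


def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (no retraining)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned


def quantize_fixed_point(weights, frac_bits):
    """Round weights onto a signed fixed-point grid with frac_bits fractional bits."""
    scale = 2.0 ** frac_bits
    return np.round(weights * scale) / scale


def kmeans_quantize(weights, n_clusters, n_iter=20, seed=0):
    """Replace each weight by its nearest 1-D K-means centroid (weight sharing)."""
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    # initialize centroids from distinct observed weight values
    centroids = rng.choice(flat, size=n_clusters, replace=False)
    for _ in range(n_iter):
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            members = flat[assign == c]
            if members.size:
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(weights.shape)
```

After K-means quantization a layer needs to store only the small codebook of centroids plus a low-bit index per weight, which is where the memory compression comes from; pruning additionally lets sparse storage and computation skip the zeroed entries.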



Author information

Corresponding author

Correspondence to Mo’taz Al-Hami.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Al-Hami, M., Pietron, M., Casas, R. et al. Methodologies of Compressing a Stable Performance Convolutional Neural Networks in Image Classification. Neural Process Lett 51, 105–127 (2020). https://doi.org/10.1007/s11063-019-10076-y

Keywords

  • Convolutional neural network (CNN)
  • Fixed-point
  • Quantization
  • Pruning
  • Clustering
  • K-means
  • Hybrid quantization
  • Incremental pruning
  • Partial quantization
  • Histogram