Towards Efficient Forward Propagation on Resource-Constrained Systems

  • Günther Schindler
  • Matthias Zöhrer
  • Franz Pernkopf
  • Holger Fröning
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)


In this work we present key elements of DeepChip, a framework that bridges recent trends in machine learning with applicable forward propagation on resource-constrained devices. The main objective of this work is to reduce compute and memory requirements by removing redundancy from neural networks. DeepChip features a flexible quantizer that reduces the bit width of activations to 8-bit fixed-point and of weights to an asymmetric ternary representation. In combination with novel algorithms and data compression, we leverage reduced precision and sparsity for efficient forward propagation on a wide range of processor architectures. We validate our approach on a set of convolutional neural networks and datasets: ConvNet on SVHN, ResNet-44 on CIFAR10 and AlexNet on ImageNet. Compared to single-precision floating point, memory requirements are reduced by factors of 43, 22 and 10, respectively, and computations are accelerated by factors of 5.2, 2.8 and 2.0 on a mobile processor, without a loss in classification accuracy. DeepChip also allows trading accuracy for efficiency: for instance, tolerating about 2% loss in classification accuracy further reduces memory requirements by factors of 88, 29 and 13 and speeds up computations by factors of 6.0, 4.3 and 5.0. Code related to this paper is available at:
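As an illustration of the quantization scheme summarized in the abstract (this is not the authors' released code), the following minimal Python sketch shows one way 8-bit fixed-point activations and asymmetric ternary weights could fit together. It assumes that "asymmetric" means separate scaling factors for positive and negative weights, that the ternarization threshold is fixed by hand, and that the scales are simple means of the selected weights. The function names, the `threshold` parameter, and the `frac_bits` fixed-point format are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def quantize_activations_q8(x: List[float], frac_bits: int = 4) -> List[int]:
    """Round activations to signed 8-bit fixed-point values.

    `frac_bits` (number of fractional bits) is an assumed format,
    not one taken from the paper.
    """
    scale = 1 << frac_bits
    # Saturate to the int8 range [-128, 127].
    return [max(-128, min(127, round(v * scale))) for v in x]


def ternarize_asymmetric(
    w: List[float], threshold: float
) -> Tuple[List[int], float, float]:
    """Map weights to codes in {-1, 0, +1} plus two separate scales.

    Weights with magnitude at or below `threshold` become 0 (sparsity).
    The positive and negative scales are the mean magnitudes of the
    respective groups (an assumption made for this sketch).
    """
    codes = [1 if v > threshold else -1 if v < -threshold else 0 for v in w]
    pos = [v for v, c in zip(w, codes) if c == 1]
    neg = [-v for v, c in zip(w, codes) if c == -1]
    w_pos = sum(pos) / len(pos) if pos else 0.0
    w_neg = sum(neg) / len(neg) if neg else 0.0
    return codes, w_pos, w_neg


def ternary_dot(
    codes: List[int],
    w_pos: float,
    w_neg: float,
    acts_q: List[int],
    frac_bits: int = 4,
) -> float:
    """Dot product that only adds integers in the inner loops.

    Zero codes are skipped entirely; the two scale multiplications
    happen once, after accumulation.
    """
    acc_pos = sum(a for a, c in zip(acts_q, codes) if c == 1)
    acc_neg = sum(a for a, c in zip(acts_q, codes) if c == -1)
    return (w_pos * acc_pos - w_neg * acc_neg) / (1 << frac_bits)


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.02, 0.7, -0.35]
    activations = [1.0, 2.0, 3.0, 0.5, -1.0]
    codes, w_pos, w_neg = ternarize_asymmetric(weights, threshold=0.1)
    acts_q = quantize_activations_q8(activations)
    print(codes, w_pos, w_neg, ternary_dot(codes, w_pos, w_neg, acts_q))
```

The sketch is meant to make two points from the abstract concrete: ternary codes turn most multiplications into integer additions, and zero codes can be skipped, which is where sparsity would pay off. How DeepChip actually chooses thresholds and scales, and how it compresses the resulting codes, is not specified in this section.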



We gratefully acknowledge the valuable contributions of Andreas Kugel and Andreas Melzer. We also acknowledge funding by the German Research Foundation (DFG) under the project number FR3273/1-1 and the Austrian Science Fund (FWF) under the project number I2706-N31.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Günther Schindler (1), corresponding author
  • Matthias Zöhrer (2)
  • Franz Pernkopf (2)
  • Holger Fröning (1)
  1. Institute of Computer Engineering, Ruprecht Karls University, Heidelberg, Germany
  2. Signal Processing and Speech Communication Laboratory, Graz University of Technology, Graz, Austria
