Heterogeneous Computing System for Deep Learning

  • Mihaela Maliţa
  • George Vlǎduţ Popescu
  • Gheorghe M. Ştefan
Part of the Studies in Computational Intelligence book series (SCI, volume 866)


Various forms of Deep Neural Network (DNN) architectures are used as Deep Learning tools for neurally inspired computational systems. The computational power, bandwidth, and energy required by current developments in the domain are very high, and the solutions offered by the current architectural environment are far from efficient. We propose a hybrid computational system for efficiently running DNN training and inference algorithms. The system is more energy-efficient than current solutions and achieves a higher ratio of actual performance to peak performance. The accelerator part of our heterogeneous system is a programmable many-core system with a Map-Scan/Reduce architecture. The chapter describes and evaluates the proposed accelerator on the main computationally intensive components of a DNN: the fully connected layer, the convolution layer, the pooling layer, and the softmax layer.
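The four DNN components named above can be sketched as reference kernels. This is a minimal illustration in NumPy of the mathematical operations each layer performs, not the accelerator's implementation; function names, shapes, and the valid-mode/stride-1 convolution are assumptions made for clarity. Note that the softmax follows the map-then-reduce pattern (an elementwise exponential, then a sum reduction) that matches the Map-Scan/Reduce style of the proposed architecture.

```python
import numpy as np

def fully_connected(x, w, b):
    # Dense layer: matrix-vector product followed by a bias add.
    return w @ x + b

def conv2d(img, kernel):
    # Valid-mode 2-D convolution (no padding, stride 1).
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Each output pixel reduces an elementwise product (map + reduce).
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(img, size=2):
    # Non-overlapping max pooling over size x size windows.
    h, w = img.shape[0] // size, img.shape[1] // size
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = img[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

def softmax(z):
    # Map: shifted exponential (for numerical stability);
    # Reduce: sum, used to normalize into a probability distribution.
    e = np.exp(z - z.max())
    return e / e.sum()
```

On the accelerator these kernels would be distributed across the many-core array; the sketch only fixes the input/output semantics each layer must preserve.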


Keywords: Deep neural network · Parallel computing · Heterogeneous computing · Accelerators



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Mihaela Maliţa (1)
  • George Vlǎduţ Popescu (2)
  • Gheorghe M. Ştefan (2)
  1. Saint Anselm College, Manchester, USA
  2. Politehnica University of Bucharest, Bucharest, Romania
