
Analysis of large deviations behavior of multi-GPU memory access in deep learning

Abstract

The unpredictable nature of irregular memory accesses in mixed-memory applications such as deep learning poses many communication challenges. A multi-GPU node handling a large number of simultaneous memory requests typically spends almost 80% of its processing time on memory mapping. This calls for a characterization of mixed regular and irregular memory accesses so that memory divergence can be simplified and performance improved. In this paper, using the large deviations principle, it is shown that mixed regular and irregular memory accesses can be viewed as a combination of continuous and discrete functions. This viewpoint is shown to yield better performance through a characterization of memory divergence in a multi-GPU node using the sub-additivity property. Further, a detection test procedure based on a quenched large deviations model is proposed, which generates threshold values for optimizing memory mapping in data-intensive applications and hence improves performance.
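As a rough illustration of the kind of large-deviations threshold test the abstract describes (a sketch, not the authors' actual procedure), the snippet below estimates an empirical rate function from simulated memory-access latencies and derives a Chernoff-style detection threshold via Cramér's theorem. The latency distribution, grid of tilting parameters, batch size, and tail probability are all hypothetical choices made for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical per-request latencies (arbitrary units): mostly "regular"
# (fast, tightly clustered) accesses with occasional "irregular" (slow,
# divergent) ones -- a stand-in for mixed memory-access behavior.
samples = [random.gauss(1.0, 0.1) if random.random() < 0.9 else random.gauss(5.0, 0.5)
           for _ in range(2_000)]

def cumulant(theta):
    """Empirical cumulant generating function  Lambda(theta) = log E[exp(theta*X)]."""
    return math.log(sum(math.exp(theta * x) for x in samples) / len(samples))

# Precompute Lambda on a coarse grid of tilting parameters theta.
thetas = [i * 0.05 for i in range(1, 40)]
lambdas = [cumulant(th) for th in thetas]

def rate(a):
    """Legendre transform  I(a) = sup_theta [theta*a - Lambda(theta)] over the grid."""
    return max(th * a - lam for th, lam in zip(thetas, lambdas))

# Cramer's theorem:  P(batch mean of n samples > a) ~ exp(-n * I(a)).
# Find the smallest threshold a whose estimated tail probability is below alpha.
n, alpha = 64, 1e-3
target = -math.log(alpha) / n        # require I(a) >= -ln(alpha)/n
a = sum(samples) / len(samples)      # start the search at the empirical mean
while rate(a) < target:
    a += 0.01
print(f"flag request batches whose mean latency exceeds {a:.2f}")
```

Batches of requests whose mean latency exceeds the computed threshold would then be treated as dominated by irregular accesses and remapped accordingly; this mirrors the abstract's idea of thresholds derived from a large deviations model, though the paper's quenched formulation is more involved than this i.i.d. sketch.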




Author information

Correspondence to P. S. Tamizharasan.


About this article


Cite this article

Tamizharasan, P.S., Ramasubramanian, N. Analysis of large deviations behavior of multi-GPU memory access in deep learning. J Supercomput 74, 2199–2212 (2018). https://doi.org/10.1007/s11227-018-2246-4


Keywords

  • Multi-GPUs
  • Large deviations
  • Memory divergence
  • Deep learning