Exploiting architectural features of a computer vision platform towards reducing memory stalls

  • Original Research Paper
  • Journal of Real-Time Image Processing

Abstract

Computer vision applications are increasingly popular in embedded systems such as drones, robots, tablets, and mobile devices. These applications are both compute and memory intensive, with memory-bound stalls (MBS) accounting for a significant part of their execution time. To reduce memory stalls as far as possible, compilers need to consider the architectural details of a platform and use its hardware components efficiently. In this paper, we propose a compiler optimization for a vision-processing system that reduces MBS by classifying memory references. Because the proposed optimization is based on the architectural features of a specific platform, Myriad 2, it applies only to other platforms with similar architectural features. The optimization consists of two steps: affinity analysis and affinity-aware instruction scheduling. We suggest two approaches to affinity analysis: source code annotation and automated analysis. We implement the proposed optimization in the LLVM compiler infrastructure. Applying the annotation-based approach to a memory-intensive program reduces stall cycles by 67.44%, leading to a 25.61% improvement in execution time. We evaluate the automated analysis approach on 11 image-processing benchmarks. Experimental results show that classification of memory references reduces stall cycles by 69.83% on average. Because all benchmarks are both compute and memory intensive, execution time improves by up to 30%, with a more modest average of 5.79%.
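The two figures reported for the annotation-based experiment can be related with a short back-of-the-envelope calculation. The sketch below is not taken from the paper. It assumes that the execution-time improvement is a relative reduction in total cycles and that all of it comes from removed stall cycles. Under those assumptions, the fraction of baseline execution time spent in memory stalls is the time improvement divided by the stall reduction. The function name `implied_stall_share` is hypothetical.

```python
def implied_stall_share(stall_reduction: float, time_improvement: float) -> float:
    """Estimate the fraction of baseline execution time spent in stalls.

    Assumption (not stated in the abstract): every cycle saved is a removed
    stall cycle, so time_improvement = stall_share * stall_reduction.
    Both arguments are fractions in (0, 1], e.g. 0.6744 for 67.44%.
    """
    if not 0.0 < stall_reduction <= 1.0:
        raise ValueError("stall_reduction must be in (0, 1]")
    if not 0.0 <= time_improvement <= stall_reduction:
        raise ValueError("time_improvement must be in [0, stall_reduction]")
    return time_improvement / stall_reduction


# Figures quoted in the abstract for the annotation-based approach.
share = implied_stall_share(stall_reduction=0.6744, time_improvement=0.2561)
print(f"Implied stall share of baseline execution time: {share:.1%}")
```

Under these assumptions, roughly 38% of the baseline execution time of the memory-intensive program would be stall cycles. The same division should not be applied to the benchmark averages, because an average of per-benchmark ratios is not the ratio of the averages.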



Acknowledgements

This work is supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 687698 and by a Ph.D. scholarship from the Higher Education Commission (HEC) of Pakistan awarded to Naveed Ul Mustafa.

Author information

Corresponding author

Correspondence to Naveed Ul Mustafa.

Appendix: A critical part of source code for benchmarks

See Table 4

Table 4 Source code of benchmarks

Cite this article

Ul Mustafa, N., O’Riordan, M.J., Rogers, S. et al. Exploiting architectural features of a computer vision platform towards reducing memory stalls. J Real-Time Image Proc 17, 853–870 (2020). https://doi.org/10.1007/s11554-018-0830-8

  • DOI: https://doi.org/10.1007/s11554-018-0830-8
