Exploiting architectural features of a computer vision platform towards reducing memory stalls

Ul Mustafa, Naveed; O’Riordan, Martin J.; Rogers, Stephen; Ozturk, Ozcan

doi:10.1007/s11554-018-0830-8

Original Research Paper
Published: 09 October 2018

Volume 17, pages 853–870, (2020)

Journal of Real-Time Image Processing

Naveed Ul Mustafa ORCID: orcid.org/0000-0002-0650-3464¹,
Martin J. O’Riordan²,
Stephen Rogers² &
Ozcan Ozturk¹

Abstract

Computer vision applications are increasingly common in embedded systems such as drones, robots, tablets, and mobile devices. These applications are both compute and memory intensive, with memory-bound stalls (MBS) accounting for a significant part of their execution time. To reduce memory stalls as far as possible, compilers need to consider the architectural details of a platform and use its hardware components efficiently. In this paper, we propose a compiler optimization for a vision-processing system that reduces MBS by classifying memory references. Because the proposed optimization is based on the architectural features of a specific platform, Myriad 2, it can be applied only to other platforms with similar architectural features. The optimization consists of two steps: affinity analysis and affinity-aware instruction scheduling. We suggest two approaches to affinity analysis: source code annotation and automated analysis. We implement the proposed optimization in the LLVM compiler infrastructure. Applying the annotation-based approach to a memory-intensive program reduces stall cycles by 67.44%, leading to a 25.61% improvement in execution time. We evaluate the automated analysis approach on 11 image-processing benchmarks. Experimental results show that classification of memory references reduces stall cycles by 69.83% on average. As all benchmarks are both compute and memory intensive, we achieve an improvement in execution time of up to 30%, with a modest average of 5.79%.
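The two figures reported for the annotation-based approach can be related by simple arithmetic. The sketch below is not taken from the paper: it assumes a simplified model in which execution time is the sum of non-stall cycles and stall cycles, and in which the optimization changes only the stall cycles. Under that assumption, the improvement in execution time equals the stall share of the baseline multiplied by the stall reduction. The function names are hypothetical and chosen for illustration.

```python
def time_improvement(stall_fraction: float, stall_reduction: float) -> float:
    """Fractional execution-time improvement under the simplified model.

    Assumption (not from the paper): only stall cycles change, so the
    time saved is the stall share of the baseline times the stall reduction.
    """
    return stall_fraction * stall_reduction


def implied_stall_fraction(stall_reduction: float, improvement: float) -> float:
    """Invert the model: the baseline stall share that would produce the
    observed improvement for a given stall reduction."""
    if stall_reduction <= 0:
        raise ValueError("stall_reduction must be positive")
    return improvement / stall_reduction


if __name__ == "__main__":
    # Figures quoted in the abstract for the annotation-based approach.
    reduction = 0.6744    # stall cycles reduced by 67.44%
    improvement = 0.2561  # execution time improved by 25.61%

    share = implied_stall_fraction(reduction, improvement)
    print(f"implied baseline stall share: {share:.4f}")
    print(f"round-trip improvement: {time_improvement(share, reduction):.4f}")
```

Under this model, the memory-intensive program would have spent roughly 38% of its baseline execution time in memory stalls. The same inversion should not be applied to the averages over the 11 benchmarks (69.83% and 5.79%), because an average of per-benchmark ratios does not in general equal the ratio of the averages.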

Acknowledgements

This work was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 687698 and by a Ph.D. scholarship from the Higher Education Commission (HEC) of Pakistan awarded to Naveed Ul Mustafa.

Author information

Authors and Affiliations

Department of Computer Engineering, Bilkent University, Ankara, Turkey
Naveed Ul Mustafa & Ozcan Ozturk
Movidius-Intel, Dublin, Ireland
Martin J. O’Riordan & Stephen Rogers

Corresponding author

Correspondence to Naveed Ul Mustafa.

Appendix: A critical part of source code for benchmarks

See Table 4

Table 4 Source code of benchmarks

About this article

Cite this article

Ul Mustafa, N., O’Riordan, M.J., Rogers, S. et al. Exploiting architectural features of a computer vision platform towards reducing memory stalls. J Real-Time Image Proc 17, 853–870 (2020). https://doi.org/10.1007/s11554-018-0830-8

Received: 24 September 2017
Accepted: 01 October 2018
Published: 09 October 2018
Issue Date: August 2020
DOI: https://doi.org/10.1007/s11554-018-0830-8
