Abstract
Computer vision applications are becoming increasingly popular in embedded systems such as drones, robots, tablets, and mobile devices. These applications are both compute and memory intensive, with memory-bound stalls (MBS) accounting for a significant part of their execution time. To maximize the reduction in memory stalls, compilers need to consider the architectural details of a platform and use its hardware components efficiently. In this paper, we propose a compiler optimization for a vision-processing system that reduces MBS by classifying memory references. Because the proposed optimization is based on the architectural features of a specific platform, Myriad 2, it can be applied only to other platforms with similar architectural features. The optimization consists of two steps: affinity analysis and affinity-aware instruction scheduling. We suggest two approaches to affinity analysis: source code annotation and automated analysis. We use the LLVM compiler infrastructure to implement the proposed optimization. Applying the annotation-based approach to a memory-intensive program reduces stall cycles by 67.44%, leading to a 25.61% improvement in execution time. We use 11 image-processing benchmarks to evaluate the automated analysis approach. Experimental results show that classification of memory references reduces stall cycles by 69.83% on average. As all benchmarks are both compute and memory intensive, we achieve an improvement in execution time of up to 30%, with a modest average of 5.79%.
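The relation between the two figures reported for the annotation-based approach can be checked with a short back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes that the whole execution-time improvement comes from removed stall cycles and that non-stall cycles are unchanged, and the function name `implied_stall_fraction` is hypothetical. Under that assumption, the two percentages imply that memory stalls made up roughly 38% of the baseline execution time of the memory-intensive program.

```python
def implied_stall_fraction(stall_reduction_pct: float,
                           exec_time_improvement_pct: float) -> float:
    """Estimate the share of baseline execution time spent in memory stalls.

    Assumption (not stated in the abstract): all of the execution-time
    improvement comes from eliminated stall cycles, so
        improvement = stall_fraction * stall_reduction.
    """
    if stall_reduction_pct <= 0:
        raise ValueError("stall reduction must be positive")
    return exec_time_improvement_pct / stall_reduction_pct


# Figures reported for the annotation-based approach.
fraction = implied_stall_fraction(67.44, 25.61)
print(f"Implied baseline stall share: {fraction:.1%}")
```

The same estimate should not be applied to the averaged benchmark figures (69.83% and 5.79%), since a ratio of averages is not the average of per-benchmark ratios.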
Acknowledgements
This work is supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 687698 and by a Ph.D. scholarship from the Higher Education Commission (HEC) of Pakistan awarded to Naveed Ul Mustafa.
Appendix: A critical part of source code for benchmarks
See Table 4
Cite this article
Ul Mustafa, N., O’Riordan, M.J., Rogers, S. et al. Exploiting architectural features of a computer vision platform towards reducing memory stalls. J Real-Time Image Proc 17, 853–870 (2020). https://doi.org/10.1007/s11554-018-0830-8
DOI: https://doi.org/10.1007/s11554-018-0830-8