A Halide-based Synergistic Computing Framework for Heterogeneous Systems

Journal of Signal Processing Systems

Abstract

New programming models have been developed for contemporary heterogeneous machines, each of which may contain several types of processors, e.g., CPUs, GPUs, FPGAs, and ASICs. Conventional models use a separate programming scheme for each processor type, e.g., OpenMP for the CPU and CUDA for the GPU, whereas the new models tend to offer a unified programming model that abstracts the details of the heterogeneous computing engines. One such programming model is Halide, which is designed for high-performance image processing. Halide programmers map data and computation to either the CPUs or the GPUs through high-level C++ functions, which the Halide compiler converts to various code targets, including x86, ARM, CUDA, and OpenCL. Nevertheless, writing a Halide program for cooperative computation on both the CPU and the GPU is complex. In this work, we propose a synergistic computing framework that extends Halide to improve program execution performance. Several key issues are tackled, including data coherence, workload partitioning, job dispatching, and communication/synchronization, so that Halide programmers can take advantage of the heterogeneous computing engines through two C++ classes that we developed: one for static workload partitioning/dispatching and the other for its dynamic counterpart. Furthermore, optimizations are developed to improve performance by generating suitable CPU code and eliminating extra memory copies. We characterize and discuss the performance of two image processing programs and of our framework on heterogeneous platforms, i.e., an Android Nexus 7 smartphone and x86-based computers. Our results show that a significant performance gain can be achieved when the CPU and GPU execute a program synergistically with the proposed framework.
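The difference between static and dynamic workload partitioning can be illustrated with a small sketch. This is not the framework's API and does not use Halide: the names `RowRange`, `split_static`, and `simulate_dynamic`, as well as the per-row cost model, are hypothetical and introduced only for illustration. The static variant fixes the split of image rows between two devices up front; the dynamic variant is a single-threaded simulation in which each chunk of rows goes to whichever device becomes idle first.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open range of image rows [begin, end). Hypothetical helper type.
struct RowRange {
    std::size_t begin;
    std::size_t end;
};

// Static partitioning (assumed scheme): the first `first_fraction` of the
// rows goes to one device, the rest to the other. The split never changes.
inline std::pair<RowRange, RowRange> split_static(std::size_t rows,
                                                  double first_fraction) {
    first_fraction = std::clamp(first_fraction, 0.0, 1.0);
    const auto cut = static_cast<std::size_t>(
        std::llround(static_cast<double>(rows) * first_fraction));
    return {RowRange{0, cut}, RowRange{cut, rows}};
}

// Dynamic dispatching, simulated without threads: chunks of `chunk` rows are
// handed out in order to the device that becomes idle earliest.
// `cost_per_row[d]` is a made-up time that device d needs per row.
// Returns the number of rows each device ended up processing.
inline std::vector<std::size_t> simulate_dynamic(
    std::size_t rows, std::size_t chunk,
    const std::vector<double>& cost_per_row) {
    std::vector<std::size_t> rows_done(cost_per_row.size(), 0);
    if (cost_per_row.empty()) return rows_done;
    chunk = std::max<std::size_t>(chunk, 1);  // avoid a zero-sized chunk

    std::vector<double> busy_until(cost_per_row.size(), 0.0);
    for (std::size_t begin = 0; begin < rows; begin += chunk) {
        const std::size_t n = std::min(chunk, rows - begin);
        // Earliest-idle device; ties go to the lowest index.
        const auto d = static_cast<std::size_t>(
            std::min_element(busy_until.begin(), busy_until.end()) -
            busy_until.begin());
        busy_until[d] += static_cast<double>(n) * cost_per_row[d];
        rows_done[d] += n;
    }
    return rows_done;
}
```

With a static split the programmer has to guess a good ratio in advance, while in the dynamic simulation the faster device ends up with the larger share of the rows without the ratio being specified.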


Notes

  1. The initialization overhead is estimated from the points where the CPU-only and GPU-only lines intersect the y-axis of the figure, i.e., the y-axis values when the percentage of the workload is zero. The overhead may come from the Halide library and/or from the program's own environment setup.


Acknowledgements

This work is supported by the Ministry of Science and Technology, Taiwan, and MediaTek Inc., under the grant MOST 103-2622-E-002-034.

Author information

Corresponding author

Correspondence to Chia-Heng Tu.

About this article

Cite this article

Liao, SW., Kuang, SY., Kao, CL. et al. A Halide-based Synergistic Computing Framework for Heterogeneous Systems. J Sign Process Syst 91, 219–233 (2019). https://doi.org/10.1007/s11265-017-1283-1

  • DOI: https://doi.org/10.1007/s11265-017-1283-1
