Understanding Data Partition for Applications on CPU-GPU Integrated Processors
Integrating a GPU with the CPU on the same chip is increasingly common in current high-performance processor architectures. The CPU and GPU share the on-chip network, the last-level cache, and memory, so they do not need to copy data back and forth as a discrete GPU requires. Shared virtual memory, memory coherence, and system-wide atomics have been introduced to heterogeneous architectures and programming models to enable fine-grained collaboration between the CPU and GPU. Programming models such as OpenCL 2.0, CUDA 8.0, and C++ AMP support these heterogeneous architecture features. Data partition is one of the collaboration patterns, and balancing the data processed between the CPU and GPU is essential for improving performance and energy efficiency. In this paper, we first demonstrate that, for one application, the optimal allocation of data to the CPU and GPU can provide 20% higher performance than a fixed ratio of 20%. Second, we evaluate another five heterogeneous applications covering the latest architecture features and identify the relation between data partitioning and performance.
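To illustrate why the partition ratio matters, the following is a minimal sketch under a deliberately simple assumption that is not taken from the paper: each device processes elements at a constant rate, and both devices run concurrently on disjoint slices of the same shared buffer. The function names (`split_point`, `modeled_makespan`, `balanced_gpu_fraction`) and the rate parameters are hypothetical and exist only for this illustration; real applications rarely scale this linearly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of leading elements assigned to the GPU
// when a fraction gpu_fraction (in [0, 1]) of n elements goes to the GPU.
std::size_t split_point(std::size_t n, double gpu_fraction) {
    assert(gpu_fraction >= 0.0 && gpu_fraction <= 1.0);
    return static_cast<std::size_t>(gpu_fraction * static_cast<double>(n));
}

// Assumed model: each device handles its slice at a constant rate
// (elements per unit time) and both run concurrently, so the total
// time is the time of whichever device finishes last.
double modeled_makespan(std::size_t n, double gpu_fraction,
                        double cpu_rate, double gpu_rate) {
    assert(cpu_rate > 0.0 && gpu_rate > 0.0);
    const std::size_t gpu_items = split_point(n, gpu_fraction);
    const std::size_t cpu_items = n - gpu_items;
    const double gpu_time = static_cast<double>(gpu_items) / gpu_rate;
    const double cpu_time = static_cast<double>(cpu_items) / cpu_rate;
    return std::max(gpu_time, cpu_time);
}

// Under the same assumed model, both devices finish together when each
// receives work in proportion to its rate.
double balanced_gpu_fraction(double cpu_rate, double gpu_rate) {
    assert(cpu_rate > 0.0 && gpu_rate > 0.0);
    return gpu_rate / (cpu_rate + gpu_rate);
}
```

With hypothetical rates of 1 (CPU) and 3 (GPU) elements per unit time, `balanced_gpu_fraction` returns 0.75, and for 1000 elements the modeled makespan is 250 time units at that split versus 800 when only 20% of the data goes to the GPU. These numbers come from the toy model alone and say nothing about the measurements reported in the paper.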
Keywords: Data partition, GPU, Heterogeneous architectures
This work is partially supported by the National Natural Science Foundation of China under Grants No. 61202076 and No. 61202062.