Study of parallel processing area extraction and data transfer number reduction for automatic GPU offloading of IoT applications

  • Yoji Yamato


To overcome the high cost of developing IoT (Internet of Things) services by vertically integrating devices and services, Open IoT has been developed, which enables various IoT services to be built by integrating horizontally separated devices and services. For Open IoT, we have proposed Tacit Computing technology, which discovers on demand the devices that can provide the data users need and uses them dynamically. We have also proposed an automatic GPU (graphics processing unit) offloading method as an elementary technology of Tacit Computing. However, our previous GPU offloading method improves the performance of only a limited number of applications because it optimizes only the extraction of parallelizable loop statements. Therefore, in this paper, to automatically improve the performance of more applications, we propose an improved GPU offloading method that reduces the number of data transfers between the CPU and GPU. We evaluate the proposed method by applying it to Darknet and Fourier Transform, which are large general-purpose CPU applications, and find that, within 10 hours of tuning, it processes them 3 times and 5 times as quickly, respectively, as using only CPUs.
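The idea summarized above, searching for which loop statements to offload while charging for CPU-GPU data transfers, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `LOOPS` table, the `TRANSFER_COST` constant, the `run_time` cost model, and the `evolve` routine are hypothetical, and a real tuning run would presumably measure compiled code rather than use a toy estimate. Each gene marks one loop as offloaded (1) or kept on the CPU (0), and `batch_transfers` switches between transferring variables at every offloaded loop and transferring each variable once.

```python
import random

# Hypothetical loops: (CPU time, GPU time, variables the loop touches).
LOOPS = [
    (8.0, 1.0, {"a", "b"}),
    (6.0, 1.0, {"b", "c"}),
    (1.0, 3.0, {"d"}),
    (9.0, 2.0, {"a", "c"}),
]
TRANSFER_COST = 1.5  # assumed cost of moving one variable between CPU and GPU


def run_time(genes, batch_transfers):
    """Toy estimate of run time for an offload pattern (1 = loop runs on GPU)."""
    total = 0.0
    moved = set()
    for bit, (cpu, gpu, variables) in zip(genes, LOOPS):
        if not bit:
            total += cpu
            continue
        total += gpu
        if batch_transfers:
            moved |= variables  # each variable is transferred once overall
        else:
            total += TRANSFER_COST * len(variables)  # transferred at every loop
    return total + TRANSFER_COST * len(moved)


def evolve(batch_transfers, generations=30, size=8, seed=0):
    """Small genetic algorithm over offload patterns (illustrative only)."""
    rng = random.Random(seed)
    n = len(LOOPS)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(size)]

    def fitness(genes):
        return run_time(genes, batch_transfers)

    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: size // 2]  # keep the faster half unchanged
        children = []
        while len(parents) + len(children) < size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p[:cut] + q[cut:]
            if rng.random() < 0.2:  # occasional single-bit mutation
                child[rng.randrange(n)] ^= 1
            children.append(child)
        population = parents + children
    return min(population, key=fitness)


if __name__ == "__main__":
    for batched in (False, True):
        best = evolve(batched)
        print(batched, best, run_time(best, batched))
```

In this toy model, a pattern that offloads several loops sharing variables becomes cheaper once transfers are batched, which is the kind of effect the transfer reduction is meant to capture.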


Keywords: Open IoT; GPGPU; Tacit computing; Data transfer optimization; Genetic algorithm; Automatic offloading




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. NTT Network Service Systems Laboratories, NTT Corporation, Musashino-shi, Japan
