Proposal of Parallel Processing Area Extraction and Data Transfer Number Reduction for Automatic GPU Offloading of IoT Applications

  • Yoji Yamato
  • Hirofumi Noguchi
  • Misao Kataoka
  • Takuma Isoda
  • Tatsuya Demizu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11344)


IoT (Internet of Things) technologies have advanced in recent years. To overcome the high cost of developing IoT services by vertically integrating devices and services, Open IoT enables various IoT services to be developed by integrating horizontally separated devices and services. For Open IoT, we have proposed Tacit Computing technology, which discovers on demand the devices that hold the data users need and uses them dynamically, and an automatic GPU (graphics processing unit) offloading technology as an elementary technology of Tacit Computing. However, that offloading technology improves only a limited set of applications because it optimizes only the extraction of parallelizable loop statements. Therefore, in this paper, to automatically improve the performance of more applications, we propose an improved method that also reduces the number of data transfers between the CPU and GPU. We expect this to improve the performance of many IoT applications. We evaluate the proposed GPU offloading method by applying it to Darknet, a large general-purpose CPU application, and find that the offloaded version processes three times as fast as the CPU-only version, with a tuning time of 10 h.
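The abstract does not describe how the transfer reduction is implemented. As a rough illustration only, the toy Python model below counts CPU-GPU transfers under two policies: copying every variable to and from the GPU around each offloaded loop, versus keeping variables resident on the GPU across consecutive offloaded loops and moving them only when CPU code needs them. The `Stage` type, both functions, and the example program are hypothetical and are not the authors' implementation; the model also assumes, conservatively, that every stage may both read and write every variable it touches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One loop of a hypothetical program: where it runs and which arrays it touches."""
    on_gpu: bool
    variables: frozenset


def naive_transfers(stages):
    """Copy each variable host->device before, and device->host after, every GPU loop."""
    return sum(2 * len(s.variables) for s in stages if s.on_gpu)


def resident_transfers(stages):
    """Keep variables on the GPU across GPU loops; copy back only when the CPU needs them."""
    on_device = set()
    count = 0
    for s in stages:
        if s.on_gpu:
            missing = s.variables - on_device
            count += len(missing)   # host -> device copies for variables not yet on the GPU
            on_device |= missing
        else:
            needed = s.variables & on_device
            count += len(needed)    # device -> host copies the CPU stage depends on
            on_device -= needed     # CPU may modify them, so treat the GPU copy as stale
    count += len(on_device)         # final copy-back of variables still on the GPU
    return count


example = [
    Stage(True, frozenset({"a", "b"})),
    Stage(True, frozenset({"a", "b"})),
    Stage(True, frozenset({"a", "c"})),
    Stage(False, frozenset({"c"})),
    Stage(True, frozenset({"a"})),
]

print(naive_transfers(example), resident_transfers(example))  # 14 6
```

For this example program the naive policy needs 14 transfers and the resident policy needs 6; the gap grows with the number of consecutive offloaded loops that share variables.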


Keywords: Open IoT, GPGPU, Automatic offloading



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. NTT Network Service Systems Laboratories, NTT Corporation, Tokyo, Japan
