International Journal of Parallel Programming, Volume 45, Issue 6, pp 1592–1624

Towards Parallelism Extraction for Heterogeneous Multicore Android Devices

  • Miguel Angel Aguilar
  • Juan Fernando Eusse
  • Projjol Ray
  • Rainer Leupers
  • Gerd Ascheid
  • Weihua Sheng
  • Prashant Sharma


Modern Android mobile devices are enabled by complex heterogeneous MPSoC platforms. To exploit the full potential of these hardware platforms, computationally intensive parts of applications have to be properly parallelized. However, the current practice involves several manual steps, which is a cumbersome task for programmers. In this paper, we present an automated approach to extract multiple forms of parallelism from native C code within Android applications, targeting heterogeneous multicore devices. We show the effectiveness of our approach by parallelizing a set of benchmarks on a Nexus 7 tablet, which is based on a Snapdragon MPSoC that features a quad-core Krait CPU cluster and an Adreno 320 GPU.


Keywords: Parallelization, Android, MPSoC, Mobile GPUs


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Aachen, Germany
  2. Silexica GmbH, Cologne, Germany
  3. Samsung R&D Institute, Staines, UK
