Journal of Signal Processing Systems, Volume 77, Issue 1–2, pp 95–115

Hardware Acceleration of Red-Black Tree Management and Application to Just-In-Time Compilation

  • Alexandre Carbon
  • Yves Lhuillier
  • Henri-Pierre Charles

Abstract

Driven by sustained consumer demand for more complex applications, embedded systems have grown in both complexity and heterogeneity. Their architectures often combine several kinds of computing resources (DSPs, GPUs, etc.). As a consequence, software designers face significant performance and portability issues when targeting these devices. Software increasingly relies on virtualization technologies to maximize application portability. To balance portability and performance, most virtualization technologies use Just-In-Time (JIT) compilation to generate runtime-optimized code from portable code. Nevertheless, the efficiency of JIT compilation depends on its ability to offset its own overhead with the execution speedup of the generated code. While most research efforts limit the overhead of JIT compilation phases by reducing how often they occur, this paper investigates opportunities to speed up JIT compilation itself. First, we present a performance analysis of different JIT compilation technologies in order to identify hardware and software optimization opportunities. Second, we propose a solution based on a dedicated processor with specialized instructions for critical functions of JIT compilers. These specialized instructions provide an average 5× speedup on associative-array manipulation and dynamic memory allocation. Using the LLVM framework, we show a 15% overall speedup in the code generator’s execution time. Because our specialized instructions are hidden behind standard libraries, we also argue that they can be transparently reused by a wider range of applications.

Keywords

  • Hardware acceleration
  • Red-Black trees
  • Associative arrays
  • JIT compilation
  • Virtualization
  • Embedded systems

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Alexandre Carbon (1)
  • Yves Lhuillier (1)
  • Henri-Pierre Charles (2, 3)

  1. CEA, LIST, Embedded Computing Laboratory, Gif-sur-Yvette, France
  2. Univ. Grenoble Alpes, Grenoble, France
  3. CEA, LIST, MINATEC Campus, Grenoble, France
