Hardware Support for Efficient Resource Utilization in Manycore Processor Systems

  • A. Herkersdorf
  • A. Lankes
  • M. Meitinger
  • R. Ohlendorf
  • S. Wallentowitz
  • T. Wild
  • J. Zeppenfeld


Effective utilization of the available processing resources in current multi- and manycore systems depends primarily on the skill of the application programmer. This chapter analyses opportunities and suggests approaches for making proper use of parallel resources through a holistic, cross-layer and interdisciplinary optimization of application, middleware and architecture aspects. Using heterogeneous network processors as an example, we show how application-specific architecture optimizations from this processor domain can be adapted to benefit designs of homogeneous general-purpose manycore systems. In addition, methods that have been applied successfully to HPC and scientific computing over the past decades are assessed and scaled down for use in manycores. Finally, we show how bio-inspired principles (i.e., self-organization and self-adaptation) can be adopted in both application-specific and general-purpose manycores, for example to provide self-optimization of processor parameters and workload utilization. In summary, we present a set of suggestions for architectural improvements and building blocks that, from our perspective, will help future manycores better exploit their available parallel processing resources.


Manycore; Multicore; Hardware Support; Network Processing; Bio-Inspired; Self-Organization; Learning Classifier; Platform Optimization; Processing Efficiency; Hardware Accelerators; Supercomputing; Network-On-Chip; High Performance Computing



Particular thanks go to the German Research Foundation (DFG), the State of Bavaria and Infineon Technologies for supporting our work as part of the Priority Programmes “1148: Reconfigurable Computing” and “1183: Organic Computing”, the “Munich Centre for Advanced Computing” (Project B4, MAPCO) and the BMBF Collaborative industry project “RapidMPSoC” (grant BMBF 01M3085).



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • A. Herkersdorf (1)
  • A. Lankes
  • M. Meitinger
  • R. Ohlendorf
  • S. Wallentowitz
  • T. Wild
  • J. Zeppenfeld
  1. Institute for Integrated Systems, TU München, München, Germany
