The Agamid design-space exploration framework

Task-accurate simulation of hardware-enhanced run-time management for many-core

Abstract

The emergence of many-core processors raises novel demands to system design. Power-limitations and abundant parallelism require for efficient and scalable run-time management. The integration of dedicated hardware to enhance the performance of the run-time management system is gaining an increasing importance. But the design of a run-time manager for many-core generally suffers from exhaustive evaluation time. Previous works do not address for the required flexibility or do not address for reasonable evaluation time of the simulation framework. We propose the novel simulation framework Agamid to foster the development and evaluation of hardware enhanced run-time management for many-core. Our transaction-level framework performs design point evaluation of hardware enhanced run-time management for many-core at the timescale of seconds. We use a hybrid simulation approach considering the run-time management and the user application at different levels of abstraction. The framework provides a generic run-time manager to compare arbitrary management systems and HW/SW partitionings. The implementation of the run-time manager facilitates direct execution at the host machine and a detailed synchronization model. Agamid applies user application workloads by means of transaction-based task graphs. An extendable system-call interface allows arbitrary interaction between the user application and the run-time management system. The thorough calibration of the RTM timing model enables reasonable approximations of the management overhead. Our evaluation considers the accuracy, wall-time and design space exploration capabilities of Agamid. Our findings substantiate the usefulness to integrate the modeling of the run-time management, hardware architecture and user application into a single transaction-level framework.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

References

  1. 1.

    Ahn JH, Li S, Seongil O, Jouppi NP (2013) Mcsima+: a manycore simulator with application-level+ simulation and detailed microarchitecture modeling. In: 2013 IEEE international symposium on performance analysis of systems and software (ISPASS), IEEE, pp 74–85

  2. 2.

    Bergamaschi R, Nair I, Dittmann G, Patel H, Janssen G, Dhanwada N, Buyuktosunoglu A, Acar E, Nam GJ, Kucar D, et al (2007) Performance modeling for early analysis of multi-core systems. In: Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis, ACM, pp 209–214

  3. 3.

    Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S et al (2011) The gem5 simulator. ACM SIGARCH Comput Arch News 39(2):1–7

    Article  Google Scholar 

  4. 4.

    Cai L, Gajski D (2003) Transaction level modeling: an overview. In: Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, ACM, pp 19–24

  5. 5.

    Cain HW, Lepak KM, Schwartz BA, Lipasti MH (2002) Precise and accurate processor simulation. In: Workshop on computer architecture evaluation using commercial workloads, HPCA, vol 8

  6. 6.

    Carvalho E, Calazans N, Moraes F (2007) Heuristics for dynamic task mapping in noc-based heterogeneous MPSOCS. In: 18th IEEE/IFIP international workshop on rapid system prototyping, 2007. RSP 2007, IEEE, pp 34–40

  7. 7.

    Cho S, Demetriades S, Evans S, Jin L, Lee H, Lee K, Moeng M (2008) TPTS: a novel framework for very fast manycore processor architecture simulation. In: 37th international conference on parallel processing, ICPP’08, IEEE, pp 446–453

  8. 8.

    Cosnard M, Loi M (1995) Automatic task graph generation techniques. In: Proceedings of the Twenty-Eighth Hawaii international conference on system sciences, IEEE, vol 2, pp 113–122

  9. 9.

    Dick RP, Rhodes DL, Wolf W (1998) TGFF: task graphs for free. In: Proceedings of the 6th international workshop on Hardware/software codesign, IEEE Computer Society, pp. 97–101

  10. 10.

    Esmaeilzadeh H, Blem E, Amant RS, Sankaralingam K, Burger D (2011) Dark silicon and the end of multicore scaling. In: 38th annual international symposium on computer architecture (ISCA), IEEE, pp. 365–376

  11. 11.

    Fraboulet A, Risset T, Scherrer A (2004) Cycle accurate simulation model generation for soc prototyping. In: International workshop on embedded computer systems, Springer, pp. 453–462

  12. 12.

    Gailliard G (2010) Towards a common hardware/software specification and implementation approach for distributed, rel time and embedded systems, based on middlewares and object-oriented components. Ph.D. thesis, Université de Cergy Pontoise

  13. 13.

    Gerstlauer A, Haubelt C, Pimentel AD, Stefanov TP, Gajski DD, Teich J (2009) Electronic system-level synthesis methodologies. IEEE Trans Comput Aided Des Integr Circuits Syst 28(10):1517–1530

    Article  Google Scholar 

  14. 14.

    Girkar M, Polychronopoulos CD (1994) The hierarchical task graph as a universal intermediate representation. Int J Parallel Program 22(5):519–551

    Article  Google Scholar 

  15. 15.

    Grama A (2003) Introduction to parallel computing. Pearson Education, London

    Google Scholar 

  16. 16.

    Gregorek D, Garcia-Ortiz A (2014) A transaction-level framework for design-space exploration of hardware-enhanced operating systems. In: International symposium on system-on-chip (SOC 2014). IEEE

  17. 17.

    Gregorek D, Garcia-Ortiz A (2015) The DRACON embedded many-core: hardware-enhanced run-time management using a network of dedicated control nodes. In: International symposium on VLSI (ISVLSI)

  18. 18.

    Gregorek D, Schmidt R, García-Ortiz A (2015) Transaction level analysis for a clustered and hardware-enhanced task manager on homogeneous many-core systems. In: HIP3ES. arXiv:1502.02852

  19. 19.

    Grötker T, Liao S, Martin G, Swan S (2002) System design with systemC\(^{{\rm TM}}\). Springer, Berlin

    Google Scholar 

  20. 20.

    Gupta N, Mandal S, Malave J, Mandal A, Mahapatra R (2010) A hardware scheduler for real time multiprocessor system on chip. In: 23rd international conference on VLSI design, 2010. VLSID’10, IEEE, pp 264–269

  21. 21.

    Haririan P, Garcia-Ortiz A (2014) Non-intrusive DVFS emulation in GEM5 with application to self-aware architectures. In: 2014 9th international symposium on reconfigurable and communication-centric systems-on-chip (ReCoSoC), IEEE, pp 1–7

  22. 22.

    IEEE Design Automation Standards Committee (2011) IEEE std 1666-2011, IEEE standard for standard systemc\(\textregistered \) language reference manual

  23. 23.

    Keutzer K, Rabaey JM, Sangiovanni-Vincentelli A et al (2000) System-level design: orthogonalization of concerns and platform-based design. IEEE Trans Comput Aided Des Integr Circuits Syst 19(12):1523–1543

    Article  Google Scholar 

  24. 24.

    Kinsy MA, Pellauer M, Devadas S (2013) Heracles: a tool for fast RTL-based design space exploration of multicore processors. In: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays, ACM, pp. 125–134

  25. 25.

    Kuz I, Anderson Z, Shinde P, Roscoe T (2011) Multicore os benchmarks: we can do better. In: Proceedings of the 13th USENIX conference on Hot topics in operating systems, USENIX Association, pp 10

  26. 26.

    Kwok YK, Ahmad I (1999) Benchmarking and comparison of the task graph scheduling algorithms. J Parallel Distrib Comput 59(3):381–422

    Article  Google Scholar 

  27. 27.

    Lee J, Nicopoulos C, Lee HG, Panth S, Lim SK, Kim J (2013) Isonet: hardware-based job queue management for many-core architectures. IEEE Trans Very Large Scale Integr (VLSI) Syst 21(6):1080–1093. https://doi.org/10.1109/TVLSI.2012.2202699

    Article  Google Scholar 

  28. 28.

    Leupers R, Temam O (2010) Processor and system-on-chip simulation. Springer, Berlin

    Book  Google Scholar 

  29. 29.

    Lindh L (1991) Fastchart-a fast time deterministic CPU and hardware based real-time-kernel. In: Proceedings of Euromicro’91 workshop on real time systems, IEEE, pp 36–40

  30. 30.

    Liu W, Xu J, Wu X, Ye Y, Wang X, Zhang W, Nikdast M, Wang Z (2011) A NOC traffic suite based on real applications. In: IEEE computer society annual symposium on VLSI (ISVLSI), IEEE, pp 66–71

  31. 31.

    Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM sigplan notices, ACM, vol 40, pp 190–200

    Article  Google Scholar 

  32. 32.

    Mariani G, Palermo G, Zaccaria V, Silvano C (2012) Evaluating run-time resource management policies for multi-core embedded platforms with the EMME evaluation framework. In: ARCS workshops (ARCS), IEEE, pp 1–6

  33. 33.

    Miller JE, Kasture H, Kurian G, Gruenwald III C, Beckmann N, Celio C, Eastep J, Agarwal A (2010) Graphite: a distributed parallel simulator for multicores. In: 2010 IEEE 16th international symposium on high performance computer architecture (HPCA), IEEE, pp 1–12

  34. 34.

    Nollet V, Verkest D, Corporaal H (2010) A safari through the MPSOC run-time management jungle. J Signal Process Syst 60(2):251–268

    Article  Google Scholar 

  35. 35.

    Perez JM, Badia RM, Labarta J (2008) A dependency-aware task-based programming environment for multi-core architectures. In: 2008 IEEE international conference on cluster computing, IEEE, pp 142–151

  36. 36.

    Podobas A, Brorsson M (2010) A comparison of some recent task-based parallel programming models. In: MULTIPROG’2010, Jan 2010, Pisa

  37. 37.

    Rhoads S (2006) Plasma-most MIPS i (tm) opcodes: overview. Internet: http://opencores.org/project, plasma, 2 May 2012

  38. 38.

    Rosenblum M, Herrod S, Witchel E, Gupta A et al (1995) Complete computer system simulation: the simos approach. IEEE Parallel Distrib Technol Syst Appl 3(4):34–43

    Article  Google Scholar 

  39. 39.

    Sanchez D, Kozyrakis C (2013) ZSIM: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH computer architecture news, ACM, vol 41, pp 475–486

    Article  Google Scholar 

  40. 40.

    Sinnen O (2007) Task scheduling for parallel systems, vol 60. Wiley, New York

    Book  Google Scholar 

  41. 41.

    Tobita T, Kasahara H (2002) A standard task graph set for fair evaluation of multiprocessor scheduling algorithms. J Sched 5(5):379–394

    MathSciNet  Article  Google Scholar 

  42. 42.

    Wawrzynek J, Patterson D, Oskin M, Lu SL, Kozyrakis C, Hoe JC, Chiou D, Asanović K (2007) Ramp: research accelerator for multiple processors. IEEE Micro 2:46–57

    Article  Google Scholar 

  43. 43.

    Weichslgartner A, Heisswolf J, Zaib A, Wild T, Herkersdorf A, Becker J, Teich J (2015) Position paper: towards hardware-assisted decentralized mapping of applications for heterogeneous NOC architectures. In: ARCS 2015-The 28th international conference on proceedings of architecture of computing systems, VDE, pp 1–4

  44. 44.

    Wenisch TF, Wunderlich RE, Ferdman M, Ailamaki A, Falsafi B, Hoe JC (2006) Simflex: statistical sampling of computer system simulation. IEEE MICRO Spec Issue Comput Arch Simul Model 26(PARSA–ARTICLE–2007–001):19–31

    Google Scholar 

  45. 45.

    Wild T, Herkersdorf A, Lee GY (2005) TAPES—trace-based architecture performance evaluation with systemc. Des Autom Embed Syst 10(2–3):157–179

    Article  Google Scholar 

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to Daniel Gregorek.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Gregorek, D., Garcia-Ortiz, A. The Agamid design-space exploration framework. Des Autom Embed Syst 22, 293–314 (2018). https://doi.org/10.1007/s10617-018-9214-3

Download citation

Keywords

  • Many-core
  • Run-time management
  • Dedicated hardware
  • Design space exploration