Cache-conscious off-line real-time scheduling for multi-core platforms: algorithms and implementation

  • Viet Anh Nguyen
  • Damien Hardy
  • Isabelle Puaut
Article

Abstract

Most schedulability analysis techniques for multi-core architectures assume a single worst-case execution time (WCET) per task, valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local instruction or data caches, where the WCET of a task depends on the cache contents at the beginning of its execution, which in turn depend on the tasks executed immediately before it. In this paper, we propose two scheduling techniques for multi-core architectures equipped with local instruction and data caches. Both techniques schedule a parallel application modeled as a task graph and generate a static partitioned non-preemptive schedule that takes advantage of cache reuse between pairs of consecutive tasks. We propose an exact method, using an integer linear programming formulation, as well as a heuristic method based on list scheduling. The efficiency of the techniques is demonstrated through an implementation of these cache-conscious schedules on real multi-core hardware: a 16-core cluster of the Kalray MPPA-256, Andey generation. We point out issues that arise when implementing the schedules on this particular platform, and we propose strategies to adapt the schedules to the identified implementation factors. An experimental evaluation reveals that our proposed scheduling methods significantly reduce the length of schedules compared to cache-agnostic scheduling methods. Furthermore, our experiments show that, among the identified implementation factors, shared bus contention has the most impact.
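To make the idea of a predecessor-dependent WCET concrete, the following is a minimal sketch of a greedy list-scheduling pass in Python. It is not the heuristic or the ILP formulation of the paper: the function name, the `wcet_after` table, and the tie-breaking rule are all assumptions made for illustration. The sketch assumes tasks are given in a topological order of the task graph, that each task has a default WCET, and that a reduced WCET may be known when a task runs directly after a specific task on the same core.

```python
from typing import Dict, List, Optional, Tuple


def cache_conscious_list_schedule(
    tasks: List[str],
    preds: Dict[str, List[str]],
    wcet: Dict[str, int],
    wcet_after: Dict[Tuple[Optional[str], str], int],
    n_cores: int,
) -> Tuple[Dict[str, Tuple[int, int, int]], int]:
    """Greedy non-preemptive, partitioned schedule (illustrative only).

    tasks      : task names in a topological order (assumption)
    preds      : predecessors of each task in the task graph
    wcet       : default WCET of each task (no cache reuse assumed)
    wcet_after : hypothetical table; (previous task on core, task) -> reduced WCET
    Returns (task -> (core, start, end), schedule length).
    """
    core_free = [0] * n_cores  # time at which each core becomes idle
    core_last: List[Optional[str]] = [None] * n_cores  # last task run on each core
    finish: Dict[str, int] = {}
    schedule: Dict[str, Tuple[int, int, int]] = {}

    for t in tasks:
        # A task may start only after all its predecessors have finished.
        ready = max((finish[p] for p in preds.get(t, [])), default=0)
        best: Optional[Tuple[int, int, int]] = None
        for c in range(n_cores):
            start = max(ready, core_free[c])
            # Use the reduced WCET if the task directly follows a task
            # whose cache contents it can reuse; otherwise the default.
            cost = wcet_after.get((core_last[c], t), wcet[t])
            end = start + cost
            # Keep the core with the earliest finish time (lowest index on ties).
            if best is None or end < best[0]:
                best = (end, start, c)
        end, start, c = best
        schedule[t] = (c, start, end)
        finish[t] = end
        core_free[c] = end
        core_last[c] = t

    return schedule, max(finish.values(), default=0)


# Hypothetical fork-join graph: A -> {B, C} -> D, on two cores.
tasks = ["A", "B", "C", "D"]
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
wcet = {"A": 10, "B": 10, "C": 10, "D": 10}
reuse = {("A", "B"): 6, ("B", "D"): 7}  # made-up reduced WCETs

aware, aware_len = cache_conscious_list_schedule(tasks, preds, wcet, reuse, 2)
agnostic, agnostic_len = cache_conscious_list_schedule(tasks, preds, wcet, {}, 2)
print(aware_len, agnostic_len)
```

Passing an empty `wcet_after` table gives the cache-agnostic baseline, so the two calls compare schedule lengths with and without cache reuse on the same made-up graph. The sketch ignores the implementation factors the abstract mentions, such as shared bus contention.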

Keywords

Real-time scheduling · Cache-conscious schedules · Schedule implementation · Multi-core architectures · ILP · Static list scheduling

Notes

Acknowledgements

The authors would like to thank Byron Hawkins and the anonymous reviewers for their useful comments on this paper. This work was partially funded by the European Union's Horizon 2020 research and innovation program under Grant Agreement No. 688131, Project Argo (http://www.argo-project.eu/), and by PIA project CAPACITES (Calcul Parallèle pour Applications Critiques en Temps et Sûreté), Reference P3425-146781.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Univ Rennes, Inria, CNRS, IRISA, Rennes, France
  2. IRT Saint Exupéry, Toulouse, France
