Skip to main content
Log in

A Parallelization Approach for Hard Real-Time Systems and Its Application on Two Industrial Programs

Strategy and Two Case Studies for the Parallelization of Hard Real-Time Systems

International Journal of Parallel Programming Aims and scope Submit manuscript

Cite this article


Applications in industry often have grown and improved over many years. Since their performance demands increase, they also need to benefit from the availability of multi-core processors. However, a reimplementation from scratch and even a restructuring of these industrial applications is very expensive, often due to high certification efforts. Therefore, a strategy for a systematic parallelization of legacy code is needed. We present a parallelization approach for hard real-time systems, which ensures a high reusage of legacy code and preserves timing analysability. To show its applicability, we apply it on the core algorithm of an avionics application as well as on the control program of a large construction machine. We create models of the legacy programs showing the potential of parallelism, optimize them and change the source codes accordingly. The parallelized applications are placed on a predictable multi-core processor with up to 18 cores. For evaluation, we compare the worst case execution times and their speedups. Furthermore, we analyse limitations coming up at the parallelization process.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17

Similar content being viewed by others


  1. Due to e. g. synchronization overheads, some parts of the program might take longer. Therefore, it is important to keep an eye on periods and deadlines and to check if everything still works fine.

  2. For example, the meta-pattern in OPL (see requires the specification of name, problem, context, forces, solution, invariants, an example, known uses, related patterns, references, and author.

  3. See online version:

  4. Homepage:

  5. However, the XML example input file of our speedup approximation and parameter optimization tool gives a clue of the complete APD. It can be found at

  6. Open-source, Homepage:

  7. Part of the Rapita Verification Suite (RVS), Homepage:

  8. Besides Figs. 2 and 13 these pins are never shown to keep the diagrams compact.

  9. A configuration specifies which program parts should be executed in parallel, how many cores are utilized and which variables have to be synchronized.

  10. The tool is open source and can be downloaded at

  11. Our Timing-analyzable Algorithmic Skeleton (TAS) library is open source (LGPLv3 licence) and can be downloaded at

  12. Utilizing our PDPs, this can be realized with the Data Parallelism PDP, e. g. two threads, each doing SUM on half of the matrices. Finally, one thread would have to do the final SUM of the two resulting matrices.

  13. Original Homepage: parMERASA simulator (open source under BSD licence):

  14. For more than 1 core, it always has to be assumed that all memory requests of all other cores are processed by the memory controller before the own request is handled. This results in worst case memory access times of 54 cycles for 4 cores, 96 cycles for 8 cores and 138 cycles for 12 cores.

  15. At the pipeline PDP, data is moved to the next stage when all stages have finished their work. Therefore, every time the largest stage is finished, one result matrix comes out of the pipeline.

  16. In the sequential version, all activities have to be processed to get one result matrix. Then the next set of input data can be processed.

  17. Many components check the armrest switch because for security reasons they are not allowed to run when there is no driver sitting in the cab.

  18. Interrupts take place every 1 ms because this is the smallest period of periodic tasks.

  19. They have a high degree of independence–however, the control application of the foundation crane contains no components which are completely independent of all others since they all share the same data structures, e. g. for accessing sensors and actuators.

  20. Homepage:

  21. Homepage:

  22. Unfortunately, it was not possible to determine WCETs for tasks in the scheduler since RapiTime supports only analyzing functions in the program flow and OTAWA did not work on the TriCore platform because no detailed timing model is available.

  23. Nevertheless, we are aware that the OETs may be different on the target platform. Our results show that two cores are nearly filled by the periodic tasks now because of lower clock frequency and synchronization overheads.

  24. Download link:

  25. Fetch and increment barriers.

  26. Alternatively a get method can return a copy of the structure which can be kept locally for reading operations. However, if the structure is modified, consistency can be an issue when the local copy is written back.

  27. Available as open-source software:

  28. Our tool can be downloaded at

  29. The real WCET cannot be estimated, only a safe upper bound, see [59].

  30. Configurations with more cores usually have more parallel parts requiring more shared variables to be synchronized.

  31. There is also one speedup of 5.0—Kempf et al. describe that this is a benchmark testing different components of the system. The parallelized version tests all components simultanously instead of successively.


  1. Abella, J., Hardy, D., Puaut, I., Quiñones, E., Cazorla, F.: On the comparison of deterministic and probabilistic WCET estimation techniques. In: 26th Euromicro Conference on Real-Time Systems (ECRTS), pp. 266–275 (2014). doi:10.1109/ECRTS.2014.16

  2. Altmeyer, S., Cucu-Grosjean, L., Davis, R.: Static probabilistic timing analysis for real-time systems using random replacement caches. Real Time Syst. 51(1), 77–123 (2015). doi:10.1007/s11241-014-9218-4

    Article  MATH  Google Scholar 

  3. Amdahl, G.M.: Validity of the single processor approach to achieving large-scale computing capabilities. AFIPS Conference Proceedings, vol 30, pp. 483–485 (1967). doi:10.1145/1465482.1465560

  4. Audsley, N., Tindell, K., Burns, A.: The end of the line for static cyclic scheduling? In: Fifth Euromicro Workshop on Real-Time Systems. Proceedings, pp. 36–41 (1993). doi:10.1109/EMWRT.1993.639042

  5. Ballabriga, C., Cassé, H., Rochange, C., Sainrat, P.: OTAWA: an open toolbox for adaptive WCET analysis. In: Software Technologies for Embedded and Ubiquitous Systems (Lecture Notes in Computer Science), vol 6399, pp. 35–46. Springer, Berlin (2011). doi:10.1007/978-3-642-16256-5_6

  6. BAUER Maschinen GmbH: MC 128 foundation crane datasheet (2013).

  7. Benoit, A., Cole, M., Gilmore, S., Hillston, J.: Flexible skeletal programming with eSkel. In: Euro-Par 2005 Parallel Processing (Lecture Notes in Computer Science), vol 3648, pp. 761–770. Springer, Berlin (2005). doi:10.1007/11549468_83

  8. Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46(5), 720–748 (1999). doi:10.1145/324133.324234

    Article  MathSciNet  MATH  Google Scholar 

  9. Bonenfant, A., Broster, I., Ballabriga, C., Bernat, G., Cassé, H., Houston, M., Merriam, N., de Michiel, M., Rochange, C., Sainrat, P.: Coding Guidelines for WCET Analysis Using Measurement-Based and Static Analysis Techniques. Technical Report IRIT/RR-2010-8-FR, IRIT-Institut de recherche en informatique de Toulouse (2010)

  10. Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004). doi:10.1016/j.parco.2003.12.002

    Article  Google Scholar 

  11. Cordes, D., Marwedel, P.: Multi-objective aware extraction of task-level parallelism using genetic algorithms. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, pp. 394–399 (2012). doi:10.1109/DATE.2012.6176503

  12. Cordes, D., Marwedel, P., Mallik, A.: Automatic parallelization of embedded software using hierarchical task graphs and integer linear programming. In: 2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pp. 267–276 (2010). doi:10.1145/1878961.1879009

  13. Cucu-Grosjean, L., Santinelli, L., Houston, M., Lo, C., Vardanega, T., Kosmidis, L., Abella, J., Mezzetti, E., Quiñones, E., Cazorla, F.: Measurement-based probabilistic timing analysis for multi-path programs. In: 24th Euromicro Conference on Real-Time Systems (ECRTS), pp. 91–101 (2012). doi:10.1109/ECRTS.2012.31

  14. Falcou, J., Sérot, J., Chateau, T., Lapresté, J.T.: QUAFF: efficient C++ design for parallel skeletons. Parallel Comput. 32(7), 604–615 (2006). doi:10.1016/j.parco.2006.06.001

    Article  Google Scholar 

  15. Foster, I.: Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison-Wesley Longman Publishing Co., Inc., Boston (1995)

    MATH  Google Scholar 

  16. Fraboulet, A., Risset, T., Scherrer, A.: Computer systems: architectures, modeling, and simulation: third and fourth international workshops, SAMOS 2004, Samos, Greece, July 21–23, 2004 and July 19–21, 2004. In: Proceedings, Chap. Cycle Accurate Simulation Model Generation for SoC Prototyping, pp. 453–462. Springer, Berlin (2004). doi:10.1007/978-3-540-27776-7_47

  17. Gebhard, G., Cullmann, C., Heckmann, R.: Software structure and WCET predictability. In: Bringing Theory to Practice: Predictability and Performance in Embedded Systems, vol 18, pp. 1–10. Dagstuhl, Germany (2011). doi:10.4230/OASIcs.PPES.2011.1

  18. Gerdes, M., Jahr, R., Ungerer, T.: parMERASA Pattern Catalogue: Timing Predictable Parallel Design Patterns. Technical Report 2013-11, University of Augsburg (2013)

  19. Gerdes, M., Kluge, F., Ungerer, T., Rochange, C., Sainrat, P.: Time analysable synchronisation techniques for parallelised hard real-time applications. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, pp. 671–676 (2012). doi:10.1109/DATE.2012.6176555

  20. Gerdes, M., Wolf, J., Guliashvili, I., Ungerer, T., Houston, M., Bernat, G., Schnitzler, S., Regler, H.: Large drilling machine control code—Parallelisation and WCET speedup. In: 6th IEEE International Symposium on Industrial Embedded Systems (SIES), pp. 91–94 (2011). doi:10.1109/SIES.2011.5953688

  21. González-Vélez, H., Leyton, M.: A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers. Software: Practice and Experience 40(12), 1135–1160 (2010). doi:10.1002/spe.1026

  22. Gustavsson, A., Gustafsson, J., Lisper, B.: Toward static timing analysis of parallel software. In: 12th International Workshop on Worst-Case Execution Time Analysis, vol 23, pp. 38–47. Dagstuhl, Germany (2012). doi:10.4230/OASIcs.WCET.2012.38

  23. Herlihy, M.: Wait-free synchronization. ACM Trans. Progr. Lang. Syst.: TOPLAS 13(1), 124–149 (1991). doi:10.1145/114005.102808

    Article  Google Scholar 

  24. Herlihy, M., Luchangco, V., Moir, M.: Obstruction-free synchronization: double-ended queues as an example. In: Proceedings of the 23rd International Conference on Distributed Computing Systems, pp. 522–529. IEEE (2003). doi:10.1109/ICDCS.2003.1203503

  25. Infineon: AURIX—TC27x B-Step, 32-bit Single-Chip Micro-controller. User’s Manual, v14.1

  26. Jahr, R., Frieb, M., Gerdes, M., Ungerer, T.: Model-based parallelization and optimization of an industrial control code. In: Dagstuhl-Workshop MBEES: Modellbasierte Entwicklung eingebetteter Systeme X, Schloss Dagstuhl, Germany, 2014, Tagungsband Modellbasierte Entwicklung eingebetteter Systeme, pp. 63–72. fortiss GmbH, München, Schloss Dagstuhl (2014).

  27. Jahr, R., Gerdes, M., Ungerer, T.: On efficient and effective model-based parallelization of hard real-time applications. In: Dagstuhl-Workshop MBEES: Modellbasierte Entwicklung eingebetteter Systeme IX, Schloss Dagstuhl, Germany, 2013, Tagungsband Modellbasierte Entwicklung eingebetteter Systeme, pp. 50–59. Fortiss GmbH, München, Schloss Dagstuhl (2013).

  28. Jahr, R., Gerdes, M., Ungerer, T.: A pattern-supported parallelization approach. In: Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM ’13, pp. 53–62. ACM, New York (2013). doi:10.1145/2442992.2442998

  29. Jahr, R., Gerdes, M., Ungerer, T., Ozaktas, H., Rochange, C., Zaykov, P.: Effects of structured parallelism by parallel design patterns on embedded hard real-time systems. In: IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 1–10 (2014). doi:10.1109/RTCSA.2014.6910546

  30. Jahr, R., Stegmeier, A., Kiefhaber, R., Frieb, M., Ungerer, T.: User Manual for the Optimization and WCET Analysis of Software with Timing Analyzable Algorithmic Skeletons. Technical Report no. 2014-05, University of Augsburg (2014)

  31. Kehr, S., Quiñones, E., Böddeker, B., Schäfer, G.: Parallel execution of AUTOSAR legacy applications on multicore ECUs with timed implicit communication. In: 52nd ACM/EDAC/IEEE Design Automation Conference (DAC). San Francisco, USA (2015). doi:10.1145/2744769.2744889

  32. Kempf, S., Veldema, R., Philippsen, M.: Is there hope for automatic parallelization of legacy industry automation applications? In: Gesellschaft für Informatik e.V. (ed.) Proceedings of the 24th Workshop on Parallel Systems and Algorithms (PARS), pp. 80–89 (2011).

  33. Keutzer, K., Massingill, B.L., Mattson, T.G., Sanders, B.A.: A design pattern language for engineering (parallel) software: merging the PLPP and OPL projects. In: Proceedings of the 2010 Workshop on Parallel Programming Patterns, ParaPLoP ’10, pp. 9:1–9:8. ACM, New York (2010). doi:10.1145/1953611.1953620

  34. Liu, X., Zhou, J., Zhang, D., Shen, Y., Guo, M.: A parallel skeleton library for embedded multicores. In: 39th International Conference on Parallel Processing Workshops (ICPPW), pp. 65–73. IEEE (2010). doi:10.1109/ICPPW.2010.21

  35. Lukas, R.G.: Dynamic compaction. Geotechnical Engineering Circular No. 1(FHWA-SA-95-037), 1–97 (1995).

  36. Massingill, B.L., Mattson, T.G., Sanders, B.A.: Patterns for parallel application programs. In: Proceedings of the Sixth Pattern Languages of Programs Workshop (PLoP), Allerton Park in Monticello, IL (1999).

  37. Mattson, T.G., Sanders, B.A., Massingill, B.L.: Patterns for Parallel Programming, 1st edn. Addison-Wesley Professional, Boston, MA (2004)

  38. Meade, A., Buckley, J., Collins, J.J.: Challenges of evolving sequential to parallel code: an exploratory review. In: Proceedings of the 12th International Workshop on Principles of Software Evolution and the 7th Annual ERCIM Workshop on Software Evolution, IWPSE-EVOL ’11, pp. 1–5. ACM, New York (2011). doi:10.1145/2024445.2024447

  39. Metzlaff, S., Mische, J., Ungerer, T.: A real-time capable many-core model. In: Proceedings of 32nd IEEE Real-Time Systems Symposium: Work-in-Progress Session, pp. 21–24. Vienna, Austria (2011).

  40. OMG Unified Modeling Language™(OMG UML), Version 2.5. Standardization Document (2015).

  41. Ozaktas, H., Rochange, C., Sainrat, P.: Automatic WCET analysis of real-time parallel applications. In: 13th International Workshop on Worst-Case Execution Time Analysis, vol 30, pp. 11–20. Dagstuhl, Germany (2013). doi:10.4230/OASIcs.WCET.2013.11

  42. Panić, M., Kehr, S., Quiñones, E., Böddecker, B., Abella, J., Cazorla, F.J.: RunPar: an allocation algorithm for automotive applications exploiting runnable parallelism in multicores. In: 2014 ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). New Delhi, India (2014). doi:10.1145/2656075.2656096

  43. Predictable parMERASA Multicore Processor. Deliverable 5.3 of the parMERASA Project (2013).

  44. Puschner, P., Schoeberl, M.: On composable system timing, task timing, and WCET analysis. In: R. Kirner (ed.) 8th International Workshop on Worst-Case Execution Time Analysis (WCET’08) (OpenAccess Series in Informatics (OASIcs)), vol 8. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl (2008). doi:10.4230/OASIcs.WCET.2008.1662. Also published in print by Austrian Computer Society (OCG) with ISBN 978-3-85403-237-3

  45. Quinton, S., Bone, T.T., Hennig, J., Neukirchner, M., Negrean, M., Ernst, R.: Typical worst case response-time analysis and its use in automotive network design. In: Proceedings of the 51st Annual Design Automation Conference, DAC ’14, pp. 44:1–44:6. ACM, New York (2014). doi:10.1145/2593069.2602977

  46. Quinton, S., Hanke, M., Ernst, R.: Formal analysis of sporadic overload in real-time systems. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 515–520 (2012). doi:10.1109/DATE.2012.6176523

  47. Rapita Systems: RapiTime explained. White Paper MC-WP-001-17,

  48. Report on support of tools for case studies. Deliverable 3.12 of the parMERASA Project (2014).

  49. Rochange, C., Bonenfant, A., Sainrat, P., Gerdes, M., Wolf, J., Ungerer, T., Petrov, Z., Mikulu, F.: WCET analysis of a parallel 3D multigrid solver executed on the MERASA multi-core. In: 10th International Workshop on Worst-Case Execution Time Analysis (WCET 2010), vol 15, pp. 90–100. Dagstuhl, Germany (2010). doi:10.4230/OASIcs.WCET.2010.90

  50. Saifullah, A., Agrawal, K., Lu, C., Gill, C.: Multi-core real-time scheduling for generalized parallel task models. In: IEEE 32nd Real-Time Systems Symposium (RTSS), pp. 217–226 (2011). doi:10.1109/RTSS.2011.27

  51. Schlingmann, S., Garbade, A., Weis, S., Ungerer, T.: Connectivity-sensitive algorithm for task placement on a many-core considering faulty regions. In: 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 417–422 (2011). doi:10.1109/PDP.2011.58

  52. Sensortechnik Wiedemann GmbH: ESX-3XL. Data Sheet. (2014)

  53. Sérot, J., Ginhac, D.: Skeletons for parallel image processing: an overview of the SKIPPER project. Parallel Comput. 28(12), 1685–1708 (2002). doi:10.1016/S0167-8191(02)00189-8

    Article  MATH  Google Scholar 

  54. Stegmeier, A., Frieb, M., Jahr, R., Ungerer, T.: Algorithmic skeletons for parallelization of embedded real-time systems. In: 3rd Workshop on High-Performance and Real-time Embedded Systems (HiRES) (2015).

  55. Ungerer, T., Bradatsch, C., Frieb, M., Kluge, F., Mische, J., Stegmeier, A., Jahr, R., Gerdes, M., Zaykov, P., Matusova, L., Li, Z.J.J., Petrov, Z., Böddeker, B., Kehr, S., Regler, H., Hugl, A., Rochange, C., Ozaktas, H., Cassé, H., Bonenfant, A., Sainrat, P., Lay, N., George, D., Broster, I., Quiñones, E., Panić, M., Abella, J., Hernandez, C., Cazorla, F., Uhrig, S., Rohde, M., Pyka, A.: Parallelizing industrial hard real-time applications for the parMERASA multi-core. Trans. Embed. Comput. Syst.: TECS (2016) (To appear)

  56. Ungerer, T., Bradatsch, C., Gerdes, M., Kluge, F., Jahr, R., Mische, J., Fernandes, J., Zaykov, P., Petrov, Z., Böddeker, B., Kehr, S., Regler, H., Hugl, A., Rochange, C., Ozaktas, H., Casse, H., Bonenfant, A., Sainrat, P., Broster, I., Lay, N., George, D., Quiñones, E., Panić, M., Abella, J., Cazorla, F., Uhrig, S., Rohde, M., Pyka, A.: parMERASA—multi-core execution of parallelised hard real-time applications supporting analysability. In: 2013 Euromicro Conference on Digital System Design (DSD), pp. 363–370 (2013). doi:10.1109/DSD.2013.46

  57. Ungerer, T., Bradatsch, C., Gerdes, M., Kluge, F., Jahr, R., Mische, J., Stegmeier, A., Frieb, M., Fernandes, J., Zaykov, P., Petrov, Z., Böddeker, B., Kehr, S., Regler, H., Hugl, A., Rochange, C., Ozaktas, H., Casse, H., Bonenfant, A., Sainrat, P., Broster, I., Lay, N., George, D., Quiñones, E., Panić, M., Abella, J., Cazorla, F., Uhrig, S., Rohde, M., Pyka, A.: Experiences and results of parallelisation of industrial hard real-time applications for the parMERASA multi-core. In: 3rd Workshop on High-Performance and Real-Time Embedded Systems (HiRES) (2015).

  58. Wang, Z., O’Boyle, M.F.: Mapping parallelism to multi-cores: a machine learning based approach. In: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’09, pp. 75–84. ACM, New York (2009). doi:10.1145/1504176.1504189

  59. Wilhelm, R., Engblom, J., Aandreas, E., Holsti, N., Thesing, S., Whalley, D., Bernat, G., Ferdinand, C., Heckmann, R., Mitra, T., Mueller, F., Puaut, I., Puschner, P., Staschulat, J., Stenström, P.: The worst-case execution time problem-overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst. 7(3), 36:1–36:53 (2008). doi:10.1145/1347375.1347389

    Article  Google Scholar 

Download references


The research leading to these results has received funding from the European Union Seventh Framework Programme under the name parMERASA and Grant Agreement No. 287519.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Martin Frieb.

Rights and permissions

Reprints and Permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Frieb, M., Jahr, R., Ozaktas, H. et al. A Parallelization Approach for Hard Real-Time Systems and Its Application on Two Industrial Programs. Int J Parallel Prog 44, 1296–1336 (2016).

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: