Abstract
Estimating execution time is a crucial task during the development of safety-critical embedded systems. Processor simulation or emulation tools on various abstraction levels offer a trade-off between accuracy and runtime. Building such tools typically requires detailed knowledge of the processor architecture and considerable manual effort to construct adequate models. In this paper, we explore how deep learning may be used as an alternative approach for building processor performance models. First, we describe how to obtain training data from recorded execution traces. Next, we evaluate various neural network architectures and hyperparameter values. Finally, the accuracy of the best network variants is compared with two simple baseline models and a mechanistic model based on the QEMU emulator. The evaluation identifies a model based on the WaveNet architecture that outperforms all other approaches, achieving a mean absolute percentage error of only 1.63%.
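For readers unfamiliar with the accuracy metric named in the abstract, the following is a minimal sketch of how a mean absolute percentage error (MAPE) is commonly computed. The function name and the cycle counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' exact evaluation procedure may differ.

```python
def mean_absolute_percentage_error(measured, predicted):
    """Return the MAPE in percent between measured and predicted values."""
    if len(measured) != len(predicted):
        raise ValueError("inputs must have the same length")
    if not measured:
        raise ValueError("inputs must not be empty")
    # Relative error of each prediction with respect to its measured value.
    relative_errors = [abs(m - p) / abs(m) for m, p in zip(measured, predicted)]
    # Average the relative errors and scale to percent.
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical cycle counts (assumption, not data from the paper).
measured_cycles = [1000, 2000, 4000]
predicted_cycles = [1010, 1980, 4100]
mape = mean_absolute_percentage_error(measured_cycles, predicted_cycles)
```

Because each error is normalised by its measured value, short and long program runs contribute equally to the score, which is one reason the metric suits runtime estimates that span several orders of magnitude.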
Acknowledgement
We would like to express our gratitude to Elektronische Fahrwerksysteme GmbH for supporting this work. Furthermore, we are grateful to Lauterbach GmbH for their loan of an Off-chip Serial Trace device. This project would not have been feasible without such a device.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fricke, F. et al. (2022). Application Runtime Estimation for AURIX Embedded MCU Using Deep Learning. In: Orailoglu, A., Reichenbach, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2022. Lecture Notes in Computer Science, vol 13511. Springer, Cham. https://doi.org/10.1007/978-3-031-15074-6_15
DOI: https://doi.org/10.1007/978-3-031-15074-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15073-9
Online ISBN: 978-3-031-15074-6
eBook Packages: Computer Science (R0)