Memory Analysis and Performance Modeling for HPC Applications on Embedded Hardware via Instruction Accurate Simulation

  • Alexander Ditter
  • Dominik Schoenwetter
  • Anton Kuzmin
  • Dietmar Fey
  • Vadym Aizinger
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 511)

Abstract

The efficient use and development of embedded multi- and many-core systems has been an important field of research for years, if not decades. In the last decade, utilizing embedded and especially low-power architectures for high performance computing (HPC) applications has become an important research topic. The reasons are the constantly increasing energy costs along with a growing awareness of energy consumption in general. As suitable low-power HPC architectures are not yet available at a larger scale, simulation of new and appropriate architectures becomes an important factor for the success of low-power systems and clusters. To speed up simulation at the cost of accuracy, different levels of abstraction have been introduced. Currently, the class of instruction accurate simulations seems to yield the best trade-off between speed and precision, especially when simulating complex multi- and many-core systems. In this paper we investigate the accuracy of the state-of-the-art instruction accurate embedded multi- and many-core simulation environment Open Virtual Platforms (OVP) in comparison to an actual embedded hardware system from Altera. Our investigations run the same operating system on both the target hardware and the simulation, and cover serial as well as parallel software benchmarks. We analyze the current accuracy of the simulation environment with respect to a performance model based on the execution time of the simulation and of the actual embedded hardware system. We assess how well the instruction accurate simulation technology from OVP is suited for the simulation of embedded/low-power HPC hardware architectures and applications. Furthermore, we point out first steps towards a better performance model by means of our simple memory model.
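
As a rough illustration of the kind of comparison the abstract describes, the sketch below estimates a run time from an instruction count and then measures its deviation from a run time observed on real hardware. It is not the authors' model: the function names, the nominal MIPS rate, and the flat per-access memory penalty are all hypothetical assumptions chosen for illustration, and the numbers in the usage note are made up.

```python
def estimate_time(instructions: int, mips: float,
                  mem_accesses: int = 0, penalty_s: float = 0.0) -> float:
    """Hypothetical estimate of execution time in seconds.

    Assumes every instruction retires at a fixed nominal rate (mips, in
    million instructions per second) and that each counted memory access
    adds a flat penalty. Both assumptions are illustrative only.
    """
    if mips <= 0:
        raise ValueError("mips must be positive")
    return instructions / (mips * 1e6) + mem_accesses * penalty_s


def relative_error(simulated_s: float, measured_s: float) -> float:
    """Signed relative deviation of the simulated from the measured time."""
    if measured_s <= 0:
        raise ValueError("measured time must be positive")
    return (simulated_s - measured_s) / measured_s
```

For example, with made-up values, `relative_error(estimate_time(1_500_000, 1.0), 2.0)` compares an estimate of 1.5 s with a measurement of 2.0 s; a negative result means the estimate is too optimistic. Passing a non-zero `mem_accesses` and `penalty_s` shows how even a crude memory term moves the estimate.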

Keywords

Memory Access · High Performance Computing · Discontinuous Galerkin · Real Hardware · Many Integrate Core
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Alexander Ditter (1)
  • Dominik Schoenwetter (1)
  • Anton Kuzmin (1)
  • Dietmar Fey (1)
  • Vadym Aizinger (2)

  1. Chair of Computer Science 3 (Computer Architecture), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
  2. Chair of Applied Mathematics (AM1), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany