Abstract
Performance analysis of parallel applications is commonly based on execution traces, which can be investigated through visualization techniques. These techniques scale poorly when traces get larger in both time (many recorded events) and space (many processing elements), a very common situation for current large-scale HPC applications. In this paper we present an approach that tackles such scenarios in order to give a correct overview of the behavior recorded in very large traces. Two configurable and controlled aggregation-based techniques are presented: one based exclusively on temporal aggregation, and another that consists of a spatiotemporal aggregation algorithm. The paper also details the implementation and evaluation of these techniques in Ocelotl, a performance analysis and visualization tool that overcomes current graphical and interpretation limitations by providing a concise overview of the behavior recorded in traces. The experimental results show that Ocelotl helps detect anomalies quickly and accurately in 8 GB traces containing up to 200 million events.
Notes
1. This number is configurable through the GUI in Ocelotl.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Dosimont, D., Corre, Y., Schnorr, L.M., Huard, G., Vincent, JM. (2015). Ocelotl: Large Trace Overviews Based on Multidimensional Data Aggregation. In: Niethammer, C., Gracia, J., Knüpfer, A., Resch, M., Nagel, W. (eds) Tools for High Performance Computing 2014. Springer, Cham. https://doi.org/10.1007/978-3-319-16012-2_7
DOI: https://doi.org/10.1007/978-3-319-16012-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16011-5
Online ISBN: 978-3-319-16012-2
eBook Packages: Mathematics and Statistics (R0)