Abstract
Tracing provides a low-impact, high-resolution way to observe the execution of a system. As the amount of parallelism in traced systems increases, so does the volume of data generated by the trace. Most trace analysis tools work in a single thread, which limits their performance as the scale of data increases. In this paper, we explore parallelization as an approach to speed up system trace analysis. We propose a solution that uses inherent aspects of the Common Trace Format (CTF) to create balanced and parallelizable workloads. Our solution takes into account key factors of parallelization, such as good load balancing, low synchronization overhead, and efficient resolution of data dependencies. We also propose an algorithm to detect and resolve data dependencies during trace analysis with minimal locking and synchronization. Using this approach, we implement three trace analysis programs (event counting, CPU usage analysis, and I/O usage analysis) to assess scalability in terms of parallel efficiency. The parallel implementations achieve a parallel efficiency above 56% with 32 cores, which corresponds to a speedup of about 18 times over the serial version, when running the parallel trace analyses on trace data stored on consumer-grade solid-state storage devices. We also show the scalability and potential of our approach by measuring the effect of future improvements to trace decoding on parallel efficiency.
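As a quick check on how the efficiency and speedup figures relate, the arithmetic can be written out under the usual textbook definitions of speedup and parallel efficiency. The symbols $T_1$, $T_p$, $S(p)$, $E(p)$ and $p$ are notation assumed here for illustration and are not taken from the abstract.

```latex
% Assumed standard definitions: T_1 is the serial run time,
% T_p the run time on p cores.
\[
  S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
\]
% With p = 32 cores and an efficiency of about 0.56:
\[
  S(32) = E(32) \cdot 32 \approx 0.56 \times 32 = 17.92 \approx 18
\]
```

Because the reported efficiency is a lower bound ("above 56%"), the value of roughly 18 should likewise be read as an approximate lower bound on the speedup at 32 cores.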
Acknowledgements
The financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and Ericsson Software Research is gratefully acknowledged. We would also like to thank Francis Giraldeau for his advice and comments, as well as Geneviève Bastien and Julien Desfossez for their help.
About this article
Cite this article
Reumont-Locke, F., Ezzati-Jivan, N. & Dagenais, M.R. Efficient Methods for Trace Analysis Parallelization. Int J Parallel Prog 47, 951–972 (2019). https://doi.org/10.1007/s10766-019-00631-4