Efficient Methods for Trace Analysis Parallelization


Tracing provides a low-impact, high-resolution way to observe the execution of a system. As the amount of parallelism in traced systems increases, so does the volume of data generated by the trace. Most trace analysis tools run in a single thread, which limits their performance as the data grows. In this paper, we explore parallelization as an approach to speed up system trace analysis. We propose a solution that uses the inherent aspects of the Common Trace Format (CTF) to create balanced and parallelizable workloads. Our solution takes into account key factors of parallelization, such as good load balancing, low synchronization overhead and efficient resolution of data dependencies. We also propose an algorithm to detect and resolve data dependencies during trace analysis with minimal locking and synchronization. Using this approach, we implement three trace analysis programs (event counting, CPU usage analysis and I/O usage analysis) to assess scalability in terms of parallel efficiency. With trace data stored on consumer-grade solid-state storage devices, the parallel implementations achieve a parallel efficiency above 56% on 32 cores, which corresponds to a speedup of about 18 times over the serial version. We also show the scalability and potential of our approach by measuring the effect of future improvements to trace decoding on parallel efficiency.
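To make the event-counting case and the efficiency figure concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a trace has already been decoded into independent chunks, modelled here as plain lists of event names. The functions `split_balanced`, `count_events`, `parallel_event_count` and `speedup_from_efficiency` are hypothetical names introduced for illustration. Event counting carries no state across chunks, so the dependency-resolution algorithm mentioned above is deliberately left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_balanced(chunks, n_workers):
    """Greedy load balancing: give each chunk, largest first, to the least-loaded worker."""
    bins = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    for chunk in sorted(chunks, key=len, reverse=True):
        i = loads.index(min(loads))  # index of the worker with the smallest load so far
        bins[i].append(chunk)
        loads[i] += len(chunk)
    return bins


def count_events(worker_chunks):
    """Map step: count event names in one worker's chunks, touching no shared state."""
    counts = Counter()
    for chunk in worker_chunks:
        counts.update(chunk)
    return counts


def parallel_event_count(chunks, n_workers=4):
    """Run the map step per worker, then merge the partial counts in a single pass."""
    bins = split_balanced(chunks, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_events, bins))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def speedup_from_efficiency(efficiency, n_cores):
    """Parallel efficiency is speedup / cores, so speedup = efficiency * cores."""
    return efficiency * n_cores
```

Under these assumptions, `speedup_from_efficiency(0.56, 32)` gives 17.92, which matches the roughly 18-fold speedup quoted above. Threads are used only to keep the sketch self-contained; in CPython, a CPU-bound analysis would likely need separate processes or a native implementation to see a real speedup.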









The financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and Ericsson Software Research is gratefully acknowledged. We would also like to thank Francis Giraldeau for his advice and comments, as well as Geneviève Bastien and Julien Desfossez for their help.

Author information



Corresponding author

Correspondence to Naser Ezzati-Jivan.


Cite this article

Reumont-Locke, F., Ezzati-Jivan, N. & Dagenais, M.R. Efficient Methods for Trace Analysis Parallelization. Int J Parallel Prog 47, 951–972 (2019). https://doi.org/10.1007/s10766-019-00631-4

Keywords


  • Tracing
  • Trace analysis
  • Parallel computing