State of the art of network protocol reverse engineering tools

  • Julien Duchêne
  • Colas Le Guernic
  • Eric Alata
  • Vincent Nicomette
  • Mohamed Kaâniche
Original Paper


Communication protocols enable structured information exchanges between different entities. A description of a protocol, at varying levels of detail, is necessary for many applications, such as interoperability or security audits. When such a description is not available, one can resort to protocol reverse engineering to infer the format of exchanged messages or a model of the protocol. During the past 12 years, several tools have been developed to automate, entirely or partially, the protocol inference process. Each of these tools was developed with a specific application goal for the inferred model, leading to specific needs and thus to different strengths and limitations. After identifying key challenges, the paper presents a survey of protocol reverse engineering tools developed in the last decade. We consider tools focusing on the inference of the format of individual messages or of the grammar of sequences of messages. Finally, we propose a classification of these tools according to several criteria. The classification is intended to give insight into the techniques each tool uses, relative to the other tools, for classifying messages, inferring their format, or inferring the grammar of the protocol. It also makes it possible to identify technical areas that have not yet been sufficiently explored and that require further development.


Reverse engineering · Protocol inference · Data structure inference · Network trace analysis · Binary application analysis



Copyright information

© Springer-Verlag France 2017

Authors and Affiliations

  • Julien Duchêne (1, 3)
  • Colas Le Guernic (1, 2)
  • Eric Alata (3)
  • Vincent Nicomette (3)
  • Mohamed Kaâniche (3)

  1. DGA Maîtrise de l’Information, Rennes, France
  2. LHS, TAMIS, Inria, Rennes, France
  3. LAAS-CNRS, Université de Toulouse, CNRS, INSA, Toulouse, France
