Abstract

Communication protocols enable structured information exchanges between different entities. A description of a protocol, at various levels of detail, is necessary for many applications, such as interoperability or security audits. When such a description is not available, one can resort to protocol reverse engineering to infer the format of the exchanged messages or a model of the protocol. Over the past 12 years, several tools have been developed to automate, entirely or partially, the protocol inference process. Each of these tools was developed with a specific application goal for the inferred model, which leads to specific needs and thus to different strengths and limitations. After identifying key challenges, the paper surveys the protocol reverse engineering tools developed in the last decade. We consider tools that focus on inferring either the format of individual messages or the grammar of sequences of messages. Finally, we propose a classification of these tools according to several criteria. This classification aims to provide insight into the techniques each tool uses, relative to the other tools, for classifying messages, inferring their format, or inferring the grammar of the protocol. It also helps identify technical areas that have not been sufficiently explored so far and that require further development.
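Trace-based message format inference often borrows sequence alignment from bioinformatics, in the spirit of Beddoe's Protocol Informatics work (references 3 and 4) and the Needleman-Wunsch algorithm (reference 38). The following is a minimal sketch of global pairwise alignment over raw message bytes, not the implementation of any surveyed tool. The scoring scheme (match +1, mismatch -1, gap -1) and the function name `needleman_wunsch` are illustrative assumptions. Real tools tune their own scoring and typically combine pairwise alignment with clustering or multiple alignment.

```python
from typing import List, Optional, Tuple


def needleman_wunsch(
    a: bytes, b: bytes, match: int = 1, mismatch: int = -1, gap: int = -1
) -> Tuple[List[Optional[int]], List[Optional[int]], int]:
    """Globally align two messages byte by byte (illustrative scoring).

    Returns the two aligned sequences (None marks a gap) and the alignment score.
    """
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two bytes
                score[i - 1][j] + gap,            # gap inserted in b
                score[i][j - 1] + gap,            # gap inserted in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a: List[Optional[int]] = []
    out_b: List[Optional[int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


def render(seq: List[Optional[int]]) -> str:
    # Show an aligned message as text, with '-' for gaps
    return "".join("-" if c is None else chr(c) for c in seq)


if __name__ == "__main__":
    x, y, s = needleman_wunsch(b"GET /a HTTP/1.0", b"GET /abc HTTP/1.0")
    print(render(x))
    print(render(y))
    print("score:", s)
```

Columns where both aligned messages carry the same byte are candidate constant (keyword or delimiter) content, while gaps and mismatches hint at variable-length or variable-value fields.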

Fig. 1
Fig. 2

Notes

  1. Model generalization offers the capability to define the format of a class of messages instead of a single instance of a message.

  2. Available at http://www.4tphi.net/~awalters/PI/PI.html.

  3. Available at https://github.com/tammok/PRISMA.

  4. Available at https://github.com/jasantunes/reverx.

  5. Available at https://www.netzob.org/.

  6. Available at https://github.com/Grindland/Fuzzgrind.

  7. Available at http://lcamtuf.coredump.cx/afl/.
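Note 1 describes model generalization: deriving the format of a class of messages rather than of a single instance. The following is a minimal sketch, assuming the instances of one message class have already been aligned to the same length (for example by a pairwise or multiple alignment step), with gaps marked as None. The static/dynamic field split and the helper names `from_text` and `generalize` are hypothetical illustrations, not the format model of any particular surveyed tool.

```python
from typing import List, Optional, Tuple

# (kind, constant value for static fields or None, width in aligned columns)
Field = Tuple[str, Optional[bytes], int]


def from_text(s: str, gap: str = "-") -> List[Optional[int]]:
    """Hypothetical helper: turn an already-aligned text message into byte
    values, using '-' to mark alignment gaps (None)."""
    return [None if ch == gap else ord(ch) for ch in s]


def generalize(aligned: List[List[Optional[int]]]) -> List[Field]:
    """Merge aligned instances of one message class into a single format.

    Runs of columns where every instance carries the same byte become
    "static" fields holding that constant; any other run becomes a
    "dynamic" field whose value is left unspecified (None).
    """
    if any(len(msg) != len(aligned[0]) for msg in aligned):
        raise ValueError("all messages must be aligned to the same length")
    fields: List[list] = []
    for column in zip(*aligned):
        is_static = column[0] is not None and len(set(column)) == 1
        kind = "static" if is_static else "dynamic"
        if fields and fields[-1][0] == kind:
            # Extend the current field by one column
            fields[-1][2] += 1
            if is_static:
                fields[-1][1] += bytes([column[0]])
        else:
            fields.append([kind, bytes([column[0]]) if is_static else None, 1])
    return [(kind, const, width) for kind, const, width in fields]
```

For the three aligned requests "GET /a-- HTTP/1.0", "GET /abc HTTP/1.0" and "GET /xyz HTTP/1.0", the sketch yields a static "GET /" prefix, a three-column dynamic field, and a static " HTTP/1.0" suffix: one format describing the whole class instead of three separate instances.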

References

  1. Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987). doi:10.1016/0890-5401(87)90052-6

  2. Antunes, J., Neves, N., Verissimo, P.: Reverse engineering of protocols from network traces. In: 2011 18th Working Conference on Reverse Engineering (WCRE), pp. 169–178. IEEE, New York, NY (2011). doi:10.1109/WCRE.2011.28

  3. Beddoe, M.: Network Protocol Analysis using Bioinformatics Algorithms. (2004). http://www.4tphi.net/~awalters/PI/pi.pdf

  4. Beddoe, M.: Protocol Informatics Project. (2004). http://www.4tphi.net/~awalters/PI/PI.html

  5. Bohlin, T., Jonsson, B.: Regular Inference for Communication Protocol Entities. Technical Report 2008-024, Department of Information Technology, Uppsala University, Sweden (2008)

  6. Bossert, G.: Exploiting Semantic for the Automatic Reverse Engineering of Communication Protocols. PhD thesis, Supelec (2014)

  7. Bossert, G., Guihery, F., Hiet, G.: Towards automated protocol reverse engineering using semantic information. In: Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, pp. 51–62. ACM, Kyoto (2014). doi:10.1145/2590296.2590346

  8. Bossert, G., Hiet, G., Henin, T.: Modelling to simulate botnet command and control protocols for the evaluation of network intrusion detection systems. In: 2011 Conference on Network and Information Systems Security (SAR-SSI), pp. 1–8. IEEE, La Rochelle (2011). doi:10.1109/SAR-SSI.2011.5931397

  9. Caballero, J., Grieco, G., Marron, M., Lin, Z., Urbina, D.: ARTISTE: Automatic Generation of Hybrid Data Structure Signatures from Binary Code Executions. Technical Report TR-IMDEA-SW-2012-001, IMDEA Software Institute, Madrid (2012)

  10. Caballero, J., Poosankam, P., Kreibich, C., Song, D.: Dispatcher: enabling active botnet infiltration using automatic protocol reverse-engineering. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS ’09, pp. 621–634. ACM, New York, NY (2009). doi:10.1145/1653662.1653737

  11. Caballero, J., Song, D.: Rosetta: Extracting Protocol Semantics Using Binary Analysis with Applications to Protocol Replay and NAT Rewriting. Technical Report CMU-CyLab-07-014, Carnegie Mellon University, Pittsburgh (2007)

  12. Caballero, J., Song, D.: Automatic protocol reverse-engineering: message format extraction and field semantics inference. Comput. Netw. 57(2), 451–474 (2013). doi:10.1016/j.comnet.2012.08.003

  13. Caballero, J., Yin, H., Liang, Z., Song, D.: Polyglot: automatic extraction of protocol message format using dynamic binary analysis. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS ’07, pp. 317–329. ACM, New York, NY (2007). doi:10.1145/1315245.1315286

  14. Caballero Bayerri, J.: Grammar and Model Extraction for Security Applications Using Dynamic Program Binary Analysis. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA (2010)

  15. Campana, G.: Fuzzgrind: un outil de fuzzing automatique. In: Symposium sur la Sécurité des Technologies de l’Information et de la Communication, SSTIC. SSTIC, Rennes (2009)

  16. Campana, G.: Fuzzgrind: an automatic fuzzing tool. In: Hack.lu, Luxembourg (2009)

  17. Cho, C.Y., Babić, D., Shin, E.C.R., Song, D.: Inference and analysis of formal models of botnet command and control protocols. In: Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS ’10, pp. 426–439. ACM, New York, NY (2010). doi:10.1145/1866307.1866355

  18. Cho, C.Y., Babić, D., Poosankam, P., Chen, K.Z., Wu, E.X., Song, D.: MACE: model-inference-assisted concolic exploration for protocol and vulnerability discovery. In: Proceedings of the 20th USENIX Conference on Security, SEC’11, p. 19. USENIX Association, Berkeley, CA (2011)

  19. Chow, J.: Understanding Data Lifetime. Ph.D. thesis, Stanford University, Stanford, CA (2006)

  20. Comparetti, P., Wondracek, G., Kruegel, C., Kirda, E.: Prospex: protocol specification extraction. In: 2009 30th IEEE Symposium on Security and Privacy, pp. 110–125. IEEE, Berkeley (2009). doi:10.1109/SP.2009.14

  21. Cui, W., Kannan, J., Wang, H.J.: Discoverer: automatic protocol reverse engineering from network traces. In: Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, SS’07, pp. 14:1–14:14. USENIX Association, Berkeley, CA (2007)

  22. Cui, W., Paxson, V., Weaver, N., Katz, R.H.: Protocol-independent adaptive replay of application dialog. In: Proceedings of the 13th Annual Network and Distributed System Security Symposium (NDSS). Internet Society, San Diego (2006). http://research.microsoft.com/apps/pubs/default.aspx?id=153197

  23. Cui, W., Peinado, M., Chen, K., Wang, H.J., Irun-Briz, L.: Tupni: automatic reverse engineering of input formats. In: Proceedings of the 15th ACM Conference on Computer and Communications Security, CCS ’08, pp. 391–402. ACM, New York, NY (2008). doi:10.1145/1455770.1455820

  24. Cui, W., Peinado, M., Wang, H., Locasto, M.: ShieldGen: automatic data patch generation for unknown vulnerabilities with informed probing. In: IEEE Symposium on Security and Privacy, 2007. SP ’07, pp. 252–266. IEEE, Oakland (2007). doi:10.1109/SP.2007.34

  25. Guihery, F., Bossert, G.: The future of protocol reversing and simulation applied on ZeroAccess. In: 29C3: 29th Chaos Communication Congress ’12. C-3, Hamburg (2012)

  26. Guihery, F., Bossert, G.: Netzob: un outil pour la rétro-conception de protocoles de communication. In: Symposium sur la Sécurité des Technologies de l’Information et de la Communication, SSTIC. SSTIC, Rennes (2012)

  27. de la Higuera, C.: Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, New York, NY (2010)

  28. Krueger, T., Gascon, H., Krämer, N., Rieck, K.: Learning stateful models for network honeypots. In: Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence, AISec ’12, pp. 37–48. ACM, New York, NY (2012). doi:10.1145/2381896.2381904

  29. Krueger, T., Krämer, N., Rieck, K.: ASAP: automatic semantics-aware analysis of network payloads. In: Dimitrakakis, C., Gkoulalas-Divanis, A., Mitrokotsa, A., Verykios, V.S., Saygin, Y. (eds.) Privacy and Security Issues in Data Mining and Machine Learning, No. 6549 in Lecture Notes in Computer Science, pp. 50–63. Springer, Berlin (2010). doi:10.1007/978-3-642-19896-0_5

  30. Leita, C.: SGNET: Automated Protocol Learning for the Observation of Malicious Threats. Ph.D. thesis, Université de Nice (2008). http://www.eurecom.fr/publication/2709

  31. Leita, C., Mermoud, K., Dacier, M.: ScriptGen: an automated script generation tool for Honeyd. In: Computer Security Applications Conference, 21st Annual, pp. 12–214. IEEE, Tucson (2005). doi:10.1109/CSAC.2005.49

  32. Li, X., Chen, L.: A survey on methods of automatic protocol reverse engineering. In: 2011 Seventh International Conference on Computational Intelligence and Security (CIS), pp. 685–689. IEEE, Hainan (2011). doi:10.1109/CIS.2011.156

  33. Lim, J., Reps, T., Liblit, B.: Extracting output formats from executables. In: 13th Working Conference on Reverse Engineering, 2006. WCRE ’06, pp. 167–178. IEEE, Benevento (2006). doi:10.1109/WCRE.2006.29

  34. Lin, Z.: Reverse Engineering of Data Structures from Binary. Ph.D. thesis, Purdue University, West Lafayette, IN (2011)

  35. Lin, Z., Jiang, X., Xu, D., Zhang, X.: Automatic protocol format reverse engineering through context-aware monitored execution. In: Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS). Internet Society, San Diego (2008)

  36. Lin, Z., Zhang, X., Xu, D.: Automatic reverse engineering of data structures from binary execution. In: Proceedings of the 17th Annual Network and Distributed System Security Symposium (NDSS). Internet Society, San Diego (2010)

  37. Narayan, J., Shukla, S.K., Clancy, T.C.: A survey of automatic protocol reverse engineering tools. ACM Comput. Surv. (CSUR) 48(3), 40 (2015)

  38. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970). doi:10.1016/0022-2836(70)90057-4

  39. Nei, M., Tajima, F., Tateno, Y.: Accuracy of estimated phylogenetic trees from molecular data. J. Mol. Evol. 19(2), 153–170 (1983)

  40. Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. In: Ferrante, J., McKinley, K.S. (eds.) Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, CA, June 10–13, 2007, pp. 89–100. ACM (2007). doi:10.1145/1250734.1250746

  41. Newsome, J., Brumley, D., Franklin, J., Song, D.: Replayer: automatic protocol replay by binary analysis. In: Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS ’06, pp. 311–321. ACM, New York, NY (2006). doi:10.1145/1180405.1180444

  42. Samba Team: Opening Windows to a Wider World. http://www.samba.org

  43. Slowinska, A., Stancescu, T., Bos, H.: Dynamic Data Structure Excavation. Technical Report IR-CS-55, Vrije Universiteit Amsterdam, Amsterdam (2010)

  44. Slowinska, A., Stancescu, T., Bos, H.: Howard: a dynamic excavator for reverse engineering data structures. In: Proceedings of the 18th Annual Network and Distributed System Security Symposium (NDSS). Internet Society, San Diego (2011)

  45. Wang, R., Wang, X., Zhang, K., Li, Z.: Towards automatic reverse engineering of software security configurations. In: Proceedings of the 15th ACM Conference on Computer and Communications Security, CCS ’08, pp. 245–256. ACM, Limerick (2008). doi:10.1145/1455770.1455802

  46. Wang, Y., Zhang, Z., Guo, L.: Inferring protocol state machine from real-world trace. In: Jha, S., Sommer, R., Kreibich, C. (eds.) Recent Advances in Intrusion Detection, No. 6307 in Lecture Notes in Computer Science, pp. 498–499. Springer, Berlin (2010). doi:10.1007/978-3-642-15512-3_32

  47. Wang, Y., Zhang, Z., Yao, D.D., Qu, B., Guo, L.: Inferring protocol state machine from network traces: a probabilistic approach. In: Lopez, J., Tsudik, G. (eds.) Applied Cryptography and Network Security, No. 6715 in Lecture Notes in Computer Science, pp. 1–18. Springer, Berlin (2011)

  48. Wang, Z., Jiang, X., Cui, W., Wang, X., Grace, M.: ReFormat: automatic reverse engineering of encrypted messages. In: Backes, M., Ning, P. (eds.) Computer Security – ESORICS 2009, No. 5789 in Lecture Notes in Computer Science, pp. 200–215. Springer, Berlin (2009)

  49. Wondracek, G., Comparetti, P.M., Krügel, C., Kirda, E.: Automatic network protocol analysis. In: Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS). Internet Society, San Diego (2008). http://www.isoc.org/isoc/conferences/ndss/08/papers/13_automatic_network_protocol.pdf

  50. Zalewski, M.: American Fuzzy Lop. http://lcamtuf.coredump.cx/afl/technical_details.txt

  51. Zeng, J., Lin, Z.: Towards automatic inference of kernel object semantics from binary code. In: 18th International Symposium, RAID 2015, vol. 9404, pp. 538–561. Springer, Kyoto (2015). doi:10.1007/978-3-319-26362-5

Author information

Correspondence to Julien Duchêne.

About this article

Cite this article

Duchêne, J., Le Guernic, C., Alata, E. et al. State of the art of network protocol reverse engineering tools. J Comput Virol Hack Tech 14, 53–68 (2018). https://doi.org/10.1007/s11416-016-0289-8

  • DOI: https://doi.org/10.1007/s11416-016-0289-8
