Abstract
Communication protocols enable structured information exchanges between different entities. A description of a protocol, at varying levels of detail, is necessary for many applications, such as interoperability or security audits. When such a description is not available, one can resort to protocol reverse engineering to infer the format of exchanged messages or a model of the protocol. During the past 12 years, several tools have been developed to automate, entirely or partially, the protocol inference process. Each of those tools has been developed with a specific application goal for the inferred model, leading to specific needs, and thus different strengths and limitations. After identifying key challenges, the paper presents a survey of protocol reverse engineering tools developed in the last decade. We consider tools focusing on the inference of the format of individual messages or of the grammar of sequences of messages. Finally, we propose a classification of these tools according to different criteria. It is aimed at providing insight into the techniques each tool uses, and how they compare with those of other tools, for the classification of messages, the inference of their format, or the inference of the grammar of the protocol. This classification also makes it possible to identify technical areas that have not been sufficiently explored so far and that require further development.
Notes
Model generalization offers the capability to define the format of a class of messages instead of a single instance of a message.
Available at http://www.4tphi.net/~awalters/PI/PI.html.
Available at https://github.com/tammok/PRISMA.
Available at https://github.com/jasantunes/reverx.
Available at https://www.netzob.org/.
Available at https://github.com/Grindland/Fuzzgrind.
Available at http://lcamtuf.coredump.cx/afl/.
Cite this article
Duchêne, J., Le Guernic, C., Alata, E. et al. State of the art of network protocol reverse engineering tools. J Comput Virol Hack Tech 14, 53–68 (2018). https://doi.org/10.1007/s11416-016-0289-8
DOI: https://doi.org/10.1007/s11416-016-0289-8