Abstract
Binary authorship attribution is the process of discovering information about the author(s) of anonymous binary code from stylometric characteristics extracted from that code. In practice, however, authorship attribution for binary code still requires considerable manual, error-prone reverse engineering, which is a daunting task given the sheer volume and complexity of today’s malware. In this chapter, we propose BinAuthor, a novel, and the first, compiler-agnostic method for identifying the authors of program binaries. BinAuthor first filters out unrelated functions (compiler and library functions) to isolate user-related functions, then converts those functions into a canonical form to eliminate compiler and compilation effects. Finally, it leverages a set of features based on collections of choices that authors make during coding; these features capture an author’s coding habits.
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Alrabaee, S. et al. (2020). Authorship Attribution. In: Binary Code Fingerprinting for Cybersecurity. Advances in Information Security, vol 78. Springer, Cham. https://doi.org/10.1007/978-3-030-34238-8_9
DOI: https://doi.org/10.1007/978-3-030-34238-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34237-1
Online ISBN: 978-3-030-34238-8
eBook Packages: Computer Science (R0)