Abstract
This chapter presents an approach to fingerprinting free open-source software (FOSS) packages. FOSS package identification is crucial for several important security applications, e.g., digital forensics, software license infringement, and malware detection. However, existing function identification approaches are insufficient for this purpose because practical data mining and database search methods are difficult to apply, especially when the source code is inaccessible. Moreover, automated detection of FOSS packages is further complicated by obfuscation techniques, the use of different compilers and compilation settings, and software refactoring techniques.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Alrabaee, S. et al. (2020). Free Open-Source Software Fingerprinting. In: Binary Code Fingerprinting for Cybersecurity. Advances in Information Security, vol 78. Springer, Cham. https://doi.org/10.1007/978-3-030-34238-8_7
DOI: https://doi.org/10.1007/978-3-030-34238-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34237-1
Online ISBN: 978-3-030-34238-8
eBook Packages: Computer Science, Computer Science (R0)