Free Open-Source Software Fingerprinting

Binary Code Fingerprinting for Cybersecurity

Abstract

This chapter presents an approach to fingerprint free open-source software (FOSS) packages. FOSS package identification is crucial for several important security applications, e.g., digital forensics, software license infringement, and malware detection. However, existing function identification approaches are insufficient for this purpose due to various challenges in applying practical methods of data mining and database searching, especially when the source code is inaccessible. Moreover, the task of automated detection of FOSS packages becomes more complicated with the introduction of obfuscation techniques, the use of different compilers and compilation settings, and software refactoring techniques.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Alrabaee, S. et al. (2020). Free Open-Source Software Fingerprinting. In: Binary Code Fingerprinting for Cybersecurity. Advances in Information Security, vol 78. Springer, Cham. https://doi.org/10.1007/978-3-030-34238-8_7

  • DOI: https://doi.org/10.1007/978-3-030-34238-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34237-1

  • Online ISBN: 978-3-030-34238-8

  • eBook Packages: Computer Science (R0)
