L(a)ying in (Test)Bed

How Biased Datasets Produce Impractical Results for Actual Malware Families’ Classification

  • Conference paper
Information Security (ISC 2019)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11723)

Abstract

The number of malware variants released daily has made manual analysis impractical. Although potentially faster, automated analysis techniques (e.g., static and dynamic) have shortcomings that malware authors exploit to thwart each of them, i.e., to prevent malicious software from being detected or correctly classified. Researchers have therefore turned to traditional machine learning algorithms to try to produce efficient, effective classification methods, but the resulting models are also prone to errors and attacks. Novel representations of the “subject”, such as malware textures, were proposed to overcome previous limitations. In this paper, our initial proposal was to evaluate the application of texture analysis to malware classification using samples collected in the wild and to compare our results with the state of the art. During our tests, we discovered that texture analysis may be infeasible for the task at hand if we use the same malware representation employed by other authors. Furthermore, we discovered that naive premises associated with the selection of samples in the datasets introduced biases that, in the end, produced unrealistic results. Finally, our tests with a broader, unfiltered dataset show that texture analysis may be impractical for correct malware classification in a real-world scenario, in which there is a great variety of families and some of them use quite sophisticated obfuscation techniques.

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Notes

  1. Additional information about samples will be available after acceptance, so as not to violate the conference blindness requirement.

Author information

Corresponding author

Correspondence to André Grégio.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Beppler, T., Botacin, M., Ceschin, F.J.O., Oliveira, L.E.S., Grégio, A. (2019). L(a)ying in (Test)Bed. In: Lin, Z., Papamanthou, C., Polychronakis, M. (eds) Information Security. ISC 2019. Lecture Notes in Computer Science, vol 11723. Springer, Cham. https://doi.org/10.1007/978-3-030-30215-3_19

  • DOI: https://doi.org/10.1007/978-3-030-30215-3_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30214-6

  • Online ISBN: 978-3-030-30215-3

  • eBook Packages: Computer Science, Computer Science (R0)
