MalCommunity: A Graph-Based Evaluation Model for Malware Family Clustering

Chen, Yihang; Liu, Fudong; Shan, Zheng; Liang, Guanghui

doi:10.1007/978-981-13-2203-7_21

Part of the book series: Communications in Computer and Information Science (CCIS, volume 901)

Included in the following conference series:

International Conference of Pioneering Computer Scientists, Engineers and Educators

Abstract

Malware clustering plays an important role in large-scale malware homology analysis. However, the way the ground truth data is generated is usually overlooked. Labels from anti-virus (AV) engines are the most commonly used source, but some of them are inaccurate or inconsistent. To overcome this drawback, many researchers build ground truth data with a voting mechanism such as AVclass, but this approach makes it difficult to evaluate clustering results of different granularity. Graph-based methods such as VAMO are more robust but require maintaining a large database. In this paper, we propose a novel evaluation model named MalCommunity, based on a graph we call the Malware Relation Graph. Unlike VAMO, constructing the graph does not require a large database; it needs only the AV label information of the samples in the test set. We apply the Fast Newman community detection algorithm to partition the sample set and use the modularity measure to score the target clustering results. The experimental results indicate that our model is robust to the noise that AV labels introduce through inconsistent family classification and inconsistent granularity. Our model is also convenient for evaluating clustering methods of different granularity at different heights.
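
The pipeline described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not say how edges of the Malware Relation Graph are weighted, so the sketch assumes that the weight between two samples is the number of AV engines that give both the same label. The sample ids, engine names, and function names (`build_relation_graph`, `modularity`, `fast_newman`) are hypothetical, and the community detection is a simplified greedy agglomeration in the spirit of Fast Newman that stops at the first merge that no longer raises modularity.

```python
from collections import defaultdict
from itertools import combinations


def build_relation_graph(av_labels):
    """Build a weighted sample graph from per-engine AV labels.

    av_labels maps sample id -> {engine name: family label}.
    ASSUMPTION (not from the paper): edge weight = number of engines
    that assign the same label to both samples.
    """
    weights = {}
    for a, b in combinations(sorted(av_labels), 2):
        shared = sum(1 for engine, label in av_labels[a].items()
                     if av_labels[b].get(engine) == label)
        if shared:
            weights[(a, b)] = shared
    return weights


def modularity(weights, clusters):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ]."""
    total = sum(weights.values())  # m: total edge weight
    if total == 0:
        return 0.0
    member = {node: idx for idx, group in enumerate(clusters) for node in group}
    internal = defaultdict(float)  # L_c: edge weight inside cluster c
    degree = defaultdict(float)    # d_c: summed node strength of cluster c
    for (a, b), w in weights.items():
        degree[member[a]] += w
        degree[member[b]] += w
        if member[a] == member[b]:
            internal[member[a]] += w
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)


def fast_newman(nodes, weights):
    """Simplified greedy agglomeration: merge the pair that raises Q most."""
    clusters = [{n} for n in nodes]
    best_q = modularity(weights, clusters)
    while len(clusters) > 1:
        trials = []
        for i, j in combinations(range(len(clusters)), 2):
            rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
            trial = rest + [clusters[i] | clusters[j]]
            trials.append((modularity(weights, trial), i, j, trial))
        q, _, _, trial = max(trials, key=lambda t: t[:3])
        if q <= best_q:  # no merge improves Q any further
            break
        clusters, best_q = trial, q
    return clusters, best_q


# Hypothetical test set: engine C disagrees with A and B on sample s3.
av_labels = {
    "s1": {"A": "zbot", "B": "zeus", "C": "zbot"},
    "s2": {"A": "zbot", "B": "zeus", "C": "zbot"},
    "s3": {"A": "zbot", "B": "zeus", "C": "virut"},
    "s4": {"A": "virut", "B": "virut", "C": "virut"},
    "s5": {"A": "virut", "B": "virut", "C": "virut"},
}
graph = build_relation_graph(av_labels)
communities, q_reference = fast_newman(sorted(av_labels), graph)

# Score two candidate clusterings produced by some clustering method.
good = [{"s1", "s2", "s3"}, {"s4", "s5"}]
mixed = [{"s1", "s4"}, {"s2", "s5"}, {"s3"}]
print(sorted(sorted(c) for c in communities))
print(modularity(graph, good), modularity(graph, mixed))
```

In this toy example the single inconsistent label on `s3` does not pull it out of its community, and the candidate clustering that agrees with the label structure receives a higher modularity than the one that mixes families. The exhaustive pair search is chosen for readability and would be too slow for realistic sample sets.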

Author information

Authors and Affiliations

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, 450001, Henan, China
Yihang Chen, Fudong Liu, Zheng Shan & Guanghui Liang

Corresponding author

Correspondence to Fudong Liu.

Editor information

Editors and Affiliations

Zhengzhou University, Zhengzhou, Henan, China
Qinglei Zhou
Zhengzhou University of Light Industry, Zhengzhou, Henan, China
Yong Gan
Northeast Forestry University, Harbin, China
Weipeng Jing
Harbin University of Science and Technology, Harbin, China
Xianhua Song
Zhengzhou Institute of Technology, Zhengzhou, China
Yan Wang
National Academy of Guo Ding Institute of Data Science, Beijing, China
Zeguang Lu

About this paper

Cite this paper

Chen, Y., Liu, F., Shan, Z., Liang, G. (2018). MalCommunity: A Graph-Based Evaluation Model for Malware Family Clustering. In: Zhou, Q., Gan, Y., Jing, W., Song, X., Wang, Y., Lu, Z. (eds) Data Science. ICPCSEE 2018. Communications in Computer and Information Science, vol 901. Springer, Singapore. https://doi.org/10.1007/978-981-13-2203-7_21

DOI: https://doi.org/10.1007/978-981-13-2203-7_21
Published: 09 September 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2202-0
Online ISBN: 978-981-13-2203-7
eBook Packages: Computer Science, Computer Science (R0)
