
Dynamic malware detection and phylogeny analysis using process mining

  • Regular Contribution
International Journal of Information Security

Abstract

In recent years, mobile phones have become essential communication and productivity tools, used daily to access business services and exchange sensitive data. Consequently, they have also become one of the biggest targets of malware attacks. New malware is created every day, most of it generated as variants of existing malware by reusing its malicious code. This paper proposes an approach to malware detection and phylogeny analysis based on dynamic analysis using process mining. The approach exploits process mining techniques to identify relationships and recurring execution patterns in the system call traces gathered from a mobile application in order to characterize its behavior. The recovered characterization is expressed as a set of declarative constraints between system calls and represents a run-time fingerprint of the application. Comparing the fingerprint of a given application with those of known malware makes it possible to verify: (1) whether the application is malware or trusted, (2) in the case of malware, which family it belongs to, and (3) how it differs from other known variants of the same malware family. An empirical study conducted on a dataset of 1200 trusted and malicious applications across ten malware families has shown that the approach exhibits very good discrimination ability, which can be exploited for malware detection and for studying malware evolution. The study has also shown that the approach is robust to the code obfuscation techniques increasingly used by modern malware.
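The fingerprint comparison described above can be illustrated with a minimal sketch. It assumes a fingerprint is a set of (template, system call, system call) constraints and uses Jaccard similarity with a fixed threshold; the similarity measure, the threshold, the family names, and the constraints below are all hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# A constraint is (Declare template, first system call, second system call).
Constraint = Tuple[str, str, str]
Fingerprint = FrozenSet[Constraint]


def jaccard(x: Fingerprint, y: Fingerprint) -> float:
    """Share of constraints common to both fingerprints (assumed measure)."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def closest_family(
    app: Fingerprint,
    families: Dict[str, Fingerprint],
    threshold: float = 0.5,  # hypothetical cut-off between malware and trusted
) -> Tuple[Optional[str], float]:
    """Return the most similar family, or None if no family reaches the threshold."""
    best_name, best_score = None, 0.0
    for name, fingerprint in families.items():
        score = jaccard(app, fingerprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


def variant_diff(app: Fingerprint, known: Fingerprint):
    """Constraints the application adds to, and drops from, a known fingerprint."""
    return app - known, known - app


# Hypothetical fingerprints for two made-up families.
families = {
    "FamilyA": frozenset({
        ("response", "open", "write"),
        ("precedence", "socket", "connect"),
        ("response", "connect", "sendto"),
    }),
    "FamilyB": frozenset({
        ("response", "open", "read"),
        ("precedence", "open", "close"),
    }),
}
app = frozenset({
    ("response", "open", "write"),
    ("precedence", "socket", "connect"),
    ("response", "read", "close"),
})

family, score = closest_family(app, families)
added, missing = variant_diff(app, families["FamilyA"])
```

Here the application shares two of four distinct constraints with the hypothetical FamilyA, so it is assigned to that family with a score of 0.5, and `variant_diff` lists the one constraint it adds and the one it lacks.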

Notes

  1. http://www.win.tue.nl/declare/declare-miner/.

  2. http://www.cs.waikato.ac.nz/ml/weka/.

  3. APK is the file format for an Android executable application. So, in the paper, we use the term APK as a synonym of Android application.

  4. https://developer.android.com/studio/run/emulator.html.

  5. http://www.xes-standard.org/.

  6. Two notions of support can be defined in the Declare language: one based on the percentage of constraint activations that lead to a fulfillment (event-based constraint support), and the other based on the percentage of traces in which the constraint is satisfied (trace-based support). In our context, we considered trace-based support, since we are interested in mining the behavior that is shared by all the traces (having assumed that, for different applications, such behavior models the malicious payload).

  7. An excerpt is available at https://github.com/mlbresearch/syscall-traces-dataset.

  8. https://code.google.com/p/signapk/.

  9. https://github.com/faber03/AndroidMalwareEvaluatingTools.

  10. There are 60 classifiers, since we have 10 malware family classifiers plus a single “catch-all” classifier, multiplied by 6 learning algorithms.
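
The two notions of support distinguished in Note 6 can be made concrete with a small sketch. It assumes a single Declare template, response(a, b) ("every occurrence of a is eventually followed by b"), and treats a trace with no activations as satisfied; the function names, the toy log, and the handling of vacuous traces are illustrative assumptions, not the Declare Miner implementation.

```python
from typing import List, Tuple

Trace = List[str]  # one trace = ordered system call names from one run


def response_counts(trace: Trace, a: str, b: str) -> Tuple[int, int]:
    """Count activations of response(a, b) and how many of them are fulfilled."""
    activations = fulfillments = 0
    for i, call in enumerate(trace):
        if call == a:
            activations += 1
            if b in trace[i + 1:]:  # fulfilled if b occurs later in the trace
                fulfillments += 1
    return activations, fulfillments


def event_based_support(log: List[Trace], a: str, b: str) -> float:
    """Fraction of all activations in the log that lead to a fulfillment."""
    counts = [response_counts(t, a, b) for t in log]
    total_activations = sum(act for act, _ in counts)
    total_fulfillments = sum(ful for _, ful in counts)
    return total_fulfillments / total_activations if total_activations else 0.0


def trace_based_support(log: List[Trace], a: str, b: str) -> float:
    """Fraction of traces in which every activation is fulfilled.

    Assumption: a trace with no activation counts as (vacuously) satisfied.
    """
    if not log:
        return 0.0
    satisfied = 0
    for trace in log:
        activations, fulfillments = response_counts(trace, a, b)
        if activations == fulfillments:
            satisfied += 1
    return satisfied / len(log)


# Toy log with made-up system call traces.
log = [
    ["open", "read", "close"],
    ["open", "read", "open", "close"],
    ["open", "open", "read"],
    ["read", "write"],
]

print(event_based_support(log, "open", "close"))  # 0.6  (3 of 5 activations)
print(trace_based_support(log, "open", "close"))  # 0.75 (3 of 4 traces)
```

The third trace contributes two unfulfilled activations, which lowers event-based support more than trace-based support; this is why the two measures can rank the same constraint differently.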

Acknowledgements

This work has been partially supported by H2020 EU-funded projects NeCS and C3ISP and EIT-Digital Project HII.

Author information

Corresponding author

Correspondence to Marta Cimitile.

About this article

Cite this article

Bernardi, M.L., Cimitile, M., Distante, D. et al. Dynamic malware detection and phylogeny analysis using process mining. Int. J. Inf. Secur. 18, 257–284 (2019). https://doi.org/10.1007/s10207-018-0415-3

  • DOI: https://doi.org/10.1007/s10207-018-0415-3
