
On the Feasibility of Anomaly Detection with Fine-Grained Program Tracing Events

Journal of Network and Systems Management

Abstract

The efficacy of anomaly detection is fundamentally limited by the descriptive power of the input events. Today’s anomaly detection systems are optimized for coarse-grained events of specific types, such as system logs and API traces. An attack can evade detection by avoiding noticeable manifestations in these coarse-grained events. Intuitively, we could close such loopholes by reducing the event granularity, but this raises two obvious challenges. First, fine-grained events may lack the rich semantics needed for feature construction. Second, anomaly detection algorithms may not scale to the volume of fine-grained events. We propose the application profile extractor (APE), which uses compression-based sequential pattern mining to generate compact profiles from fine-grained program traces for anomaly detection algorithms. Because it makes minimal assumptions about event semantics, the profile generation is compatible with a wide variety of program traces. In addition, the compact profiles allow anomaly detection algorithms to scale to the high data rate of fine-grained program tracing. We also outline scenarios that justify the need for anomaly detection with fine-grained program tracing events.
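The abstract does not spell out APE's mining algorithm beyond "compression-based sequential pattern mining". As a rough illustration of the general idea only, the sketch below swaps in a simple Re-Pair-style compressor: it repeatedly merges the most frequent adjacent pair of events into a pattern symbol while doing so shortens the description (encoded trace length plus an assumed cost of 2 symbols per stored pattern). It is not APE's method, and the function names `compress`, `flatten`, and `profile` are hypothetical. The resulting profile counts how often each mined pattern is used, which is one way a long fine-grained trace could be reduced to a compact feature vector.

```python
from collections import Counter
from typing import List, Tuple, Union

# A symbol is either a raw trace event (str) or a merged pattern (nested 2-tuple).
# Assumption: raw events are strings, so they never collide with pattern tuples.
Symbol = Union[str, Tuple]


def replace_pair(seq: List[Symbol], pair: Tuple[Symbol, Symbol]) -> List[Symbol]:
    """Left-to-right, non-overlapping replacement of `pair` by a single pattern symbol."""
    out: List[Symbol] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(pair)  # the pair tuple itself serves as the new pattern symbol
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def compress(trace: List[str], rule_cost: int = 2, max_rules: int = 100) -> List[Symbol]:
    """Greedily merge the most frequent adjacent pair while it reduces description length.

    Description length is approximated as len(encoded trace) + rule_cost per pattern,
    so a merge is kept only if it removes more than `rule_cost` symbols.
    """
    seq: List[Symbol] = list(trace)
    for _ in range(max_rules):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]
        candidate = replace_pair(seq, pair)
        if len(seq) - len(candidate) <= rule_cost:
            break  # greedy heuristic: stop once the top pair gives no net compression
        seq = candidate
    return seq


def flatten(sym: Symbol) -> Tuple[str, ...]:
    """Expand a (possibly nested) pattern symbol back into its raw events."""
    if isinstance(sym, tuple):
        return flatten(sym[0]) + flatten(sym[1])
    return (sym,)


def profile(trace: List[str]) -> Counter:
    """Compact profile: usage count of each mined pattern, keyed by its raw-event tuple."""
    return Counter(flatten(s) for s in compress(trace))


if __name__ == "__main__":
    trace = ["open", "read", "close"] * 5
    print(profile(trace))               # Counter({('open', 'read', 'close'): 5})
    print(profile(trace + ["exec"]))    # Counter({('open', 'read', 'close'): 5, ('exec',): 1})
```

In this toy run, 15 events collapse into one pattern used five times, and an event never seen in the repeated behavior (`exec`) survives as a singleton entry, which is the kind of deviation a downstream anomaly detector could pick up. The encoding is lossless: concatenating the flattened symbols reproduces the original trace.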


Figs. 1–16



Acknowledgements

The work was supported by Ministry of Science and Technology of the Republic of China (Grant Nos: 110-2628-E-A49-004, 109-2221-E-001-019-MY3 and 110-2218-E-001-001-MBK) and Academia Sinica (Grant No: AS-KPQ-109-DSTCP).

Author information

Correspondence to Yu-Sung Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Detailed Results on APE Effectiveness

Note: “Config.” means configuration. “Max F1 Score” is the maximum F1 score among the execution results. The recall and precision at the operating point that attains the max F1 score are also reported (Tables 10, 11, 12, and 13).

Table 10 ROC area, max F1 score, recall, and precision for bash
Table 11 ROC area, max F1 score, recall, and precision for git
Table 12 ROC area, max F1 score, recall, and precision for nginx
Table 13 ROC area, max F1 score, recall, and precision for bwapp
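The tables report ROC area together with the max F1 score and the recall and precision at that operating point. The exact evaluation procedure is not described here, so the sketch below is only an assumption-laden illustration: it treats each execution result as an anomaly score with a binary ground-truth label (1 = anomalous), treats the "operating point" as a score threshold (flag when score >= threshold), and computes ROC area with the rank (Mann-Whitney) formulation. The function names `max_f1_operating_point` and `roc_area` are hypothetical.

```python
from typing import Sequence, Tuple


def max_f1_operating_point(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Return (max F1, recall, precision, threshold) over all score thresholds.

    labels: 1 = anomalous (positive), 0 = benign. A sample is flagged as
    anomalous when its score >= threshold.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    best = (0.0, 0.0, 0.0, float("inf"))
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Tied scores share one threshold, so evaluate only after the last of a tie group.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        precision = tp / (tp + fp)
        recall = tp / total_pos if total_pos else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        if f1 > best[0]:
            best = (f1, recall, precision, score)
    return best


def roc_area(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC area: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    labels = [1, 1, 0, 1, 0, 0]
    print(max_f1_operating_point(scores, labels))  # F1 ≈ 0.857, recall 1.0, precision 0.75, threshold 0.4
    print(roc_area(scores, labels))                # ≈ 0.889 (8 of 9 positive/negative pairs ranked correctly)
```

The quadratic pairwise loop in `roc_area` keeps the definition explicit; for large result sets a sort-based rank computation would be the usual choice.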

About this article

Cite this article

Li, HW., Wu, YS. & Huang, Y. On the Feasibility of Anomaly Detection with Fine-Grained Program Tracing Events. J Netw Syst Manage 30, 28 (2022). https://doi.org/10.1007/s10922-021-09635-3


  • DOI: https://doi.org/10.1007/s10922-021-09635-3
