TapTree: Process-Tree Based Host Behavior Modeling and Threat Detection Framework via Sequential Pattern Mining

Part of the Lecture Notes in Computer Science book series (LNCS,volume 13407)

Abstract

Host behavior modeling is widely deployed in today’s corporate environments to aid in the detection and analysis of cyber attacks. Audit logs containing system-level events are frequently used for behavior modeling because they provide detailed insight into cyber-threat occurrences. However, mapping the low-level system events in audit logs to high-level behaviors remains a major challenge when identifying a host’s contextual behavior in order to detect potential cyber threats, and approaches that rely on domain-expert knowledge for this mapping can be difficult to deploy in practice. This paper presents TapTree, an automated, process-tree based technique that extracts host behavior by compiling the semantic information of system events. After extracting behaviors as system-generated process trees, TapTree integrates event semantics as a representation of those behaviors. To further reduce the analyst’s pattern-matching workload, TapTree aggregates semantically equivalent patterns and optimizes the representative behaviors. In our evaluation on a recent benchmark audit-log dataset (DARPA OpTC), TapTree employs tree pattern queries and sequential pattern mining to deduce the semantics of connected system events, achieving high accuracy in behavior abstraction and, in turn, in Advanced Persistent Threat (APT) attack detection. We also show how the baseline model can be updated gradually online, allowing it to adapt to new log patterns over time.
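The abstract describes extracting behaviors as process trees, mining sequential patterns over their events, and updating a baseline online. The Python sketch below illustrates that general idea under simplifying assumptions of our own; it is not TapTree's implementation. It assumes a hypothetical event format `(pid, parent_pid, action)`, treats each root-to-leaf path of a process tree as one event sequence, counts order-preserving subsequences of a fixed length as patterns (a naive form of sequential pattern support counting), and keeps a baseline of frequent patterns that can be updated incrementally. The names `sequences_from_events`, `pattern_support`, `BehaviorBaseline`, and the parameters `length` and `min_support` are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical audit-event format (an assumption): (pid, parent_pid, action).
Event = Tuple[int, int, str]
Pattern = Tuple[str, ...]


def sequences_from_events(events: Iterable[Event]) -> List[List[str]]:
    """Build process trees from parent links and return one action sequence
    per root-to-leaf path (each process's actions appended after its parent's)."""
    children: Dict[int, List[int]] = defaultdict(list)
    actions: Dict[int, List[str]] = defaultdict(list)
    parent: Dict[int, int] = {}
    for pid, ppid, action in events:
        actions[pid].append(action)
        if pid not in parent:  # record each process's parent link once
            parent[pid] = ppid
            children[ppid].append(pid)
    # A root is a process whose parent never appears as a process in the log.
    roots = [pid for pid, ppid in parent.items() if ppid not in parent]

    sequences: List[List[str]] = []
    stack = [(root, list(actions[root])) for root in roots]
    while stack:  # iterative depth-first traversal of each tree
        node, seq = stack.pop()
        kids = children.get(node, [])
        if not kids:
            sequences.append(seq)
        for kid in kids:
            stack.append((kid, seq + actions[kid]))
    return sequences


def pattern_support(sequences: Iterable[List[str]], length: int) -> Counter:
    """Count how many sequences contain each order-preserving subsequence
    (not necessarily contiguous) of the given length."""
    support: Counter = Counter()
    for seq in sequences:
        # combinations() keeps positional order; set() counts a pattern once per sequence.
        support.update(set(combinations(seq, length)))
    return support


class BehaviorBaseline:
    """Hypothetical baseline of frequent patterns learned from benign activity."""

    def __init__(self, length: int = 2, min_support: int = 2) -> None:
        self.length = length
        self.min_support = min_support
        self.support: Counter = Counter()

    def update(self, sequences: Iterable[List[str]]) -> None:
        # Incremental (online) update: add support counts from newly observed sequences.
        self.support.update(pattern_support(sequences, self.length))

    def frequent(self) -> Set[Pattern]:
        return {p for p, c in self.support.items() if c >= self.min_support}

    def unseen(self, sequences: Iterable[List[str]]) -> Set[Pattern]:
        """Patterns in the new sequences that are not frequent in the baseline."""
        return set(pattern_support(sequences, self.length)) - self.frequent()


benign = [
    (10, 1, "process_create"), (10, 1, "file_read"),
    (11, 10, "process_create"), (11, 10, "file_write"),
    (20, 2, "process_create"), (20, 2, "file_read"),
    (21, 20, "process_create"), (21, 20, "file_write"),
]
suspicious = [
    (30, 3, "process_create"), (30, 3, "network_connect"),
    (31, 30, "process_create"), (31, 30, "file_write"),
]

baseline = BehaviorBaseline(length=2, min_support=2)
baseline.update(sequences_from_events(benign))
print(sorted(baseline.unseen(sequences_from_events(suspicious))))
# prints [('network_connect', 'file_write'), ('network_connect', 'process_create'), ('process_create', 'network_connect')]
```

Enumerating every subsequence with `itertools.combinations` grows combinatorially with sequence length, so a practical miner would more likely use a pruning algorithm such as GSP or PrefixSpan; the brute-force version is kept here only for readability. Calling `update` again with newly confirmed benign sequences is one simple way to let such a baseline adapt to new log patterns over time.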

Keywords

  • Process tree
  • Behavioral anomaly detection
  • Sequential pattern mining
  • APT detection

Acknowledgement

We would like to thank the Communications Security Establishment Canada team, especially Dr. Benoit Hamelin for supporting the project and providing the materials needed for this work. A special thanks to Kevin Shi from the University of Windsor for all the support during his co-op term with NRC.

Author information

Correspondence to Mohammad Mamun.

Copyright information

© 2022 Crown

About this paper

Cite this paper

Mamun, M., Buffett, S. (2022). TapTree: Process-Tree Based Host Behavior Modeling and Threat Detection Framework via Sequential Pattern Mining. In: Alcaraz, C., Chen, L., Li, S., Samarati, P. (eds) Information and Communications Security. ICICS 2022. Lecture Notes in Computer Science, vol 13407. Springer, Cham. https://doi.org/10.1007/978-3-031-15777-6_30

  • DOI: https://doi.org/10.1007/978-3-031-15777-6_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15776-9

  • Online ISBN: 978-3-031-15777-6

  • eBook Packages: Computer Science, Computer Science (R0)