Abstract
The cyberworld is under continuous threat from imposters, which calls for intelligent methods that identify threats while respecting the constraints encountered in practice. Advanced Persistent Threats (APTs) have become an important national issue because they secretly steal information over a long period of time. Depending on the objective, adversaries use different tactics throughout an APT campaign to compromise systems. These attacks need immediate attention because their tactics are interleaved with benign activities and are therefore hard to detect. Moreover, existing solutions for detecting APT attacks are computationally expensive, since keeping track of every system behavior is both costly and challenging. In addition, the performance of existing detection models suffers from data imbalance: malicious events are few compared with the innumerable benign events in a system. In this work, we propose novel machine learning (ML) approaches to classify such attack tactics. More specifically, we convert APT traces into a graph, generate node and, eventually, graph embeddings, and classify them using ML. For classification, we use our proposed approaches to address the class imbalance issue, and we compare them with baseline models to show their effectiveness.
Acknowledgements
The research reported herein was supported in part by NIST award 60NANB20D178, NSF awards DMS-1737978 and DGE-2039542, and an IBM faculty award (Research).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Akbar, K.A., Wang, Y., Islam, M.S., Singhal, A., Khan, L., Thuraisingham, B. (2021). Identifying Tactics of Advanced Persistent Threats with Limited Attack Traces. In: Tripathy, S., Shyamasundar, R.K., Ranjan, R. (eds) Information Systems Security. ICISS 2021. Lecture Notes in Computer Science, vol 13146. Springer, Cham. https://doi.org/10.1007/978-3-030-92571-0_1
DOI: https://doi.org/10.1007/978-3-030-92571-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92570-3
Online ISBN: 978-3-030-92571-0
eBook Packages: Computer Science, Computer Science (R0)