AttacKG: Constructing Technique Knowledge Graph from Cyber Threat Intelligence Reports

  • Conference paper
Computer Security – ESORICS 2022 (ESORICS 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13554))

Abstract

Cyber attacks are becoming more sophisticated and diverse, making attack detection increasingly challenging. To combat these attacks, security practitioners actively summarize and exchange their knowledge about attacks across organizations in the form of cyber threat intelligence (CTI) reports. However, because CTI reports are written in natural language and are not structured for automatic analysis, using them requires tedious manual effort to recover the threat intelligence. Additionally, individual reports typically cover only a limited aspect of attack patterns (e.g., techniques) and are thus insufficient to provide a comprehensive view of attacks with multiple variants.

In this paper, we propose AttacKG to automatically extract structured attack behavior graphs from CTI reports and identify the associated attack techniques. We then aggregate threat intelligence across reports to collect different aspects of techniques and enhance attack behavior graphs into technique knowledge graphs (TKGs).

In our evaluation against real-world CTI reports from diverse intelligence sources, AttacKG effectively identifies 28,262 attack techniques with 8,393 unique Indicators of Compromise (IoCs). To further verify the accuracy of AttacKG in extracting threat intelligence, we run AttacKG on 16 manually labeled CTI reports. Experimental results show that AttacKG accurately identifies attack-relevant entities, dependencies, and techniques with F1-scores of 0.887, 0.896, and 0.789, respectively, outperforming state-of-the-art approaches. Moreover, our TKGs directly benefit downstream security practices built atop attack techniques, e.g., advanced persistent threat detection and cyber attack reconstruction.

Notes

  1. We make AttacKG’s implementation publicly available at https://github.com/li-zhenyuan/Knowledge-enhanced-Attack-Graph.

  2. https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-3.1.0

  3. https://spacy.io/api/entityruler

  4. https://github.com/msg-systems/coreferee

  5. https://github.com/ksatvat/EXTRACTOR

  6. https://github.com/mpurba1/TTPDrill-0.3

References

  1. AlienVault OTX. https://otx.alienvault.com

  2. Atomic Red Team. https://github.com/redcanaryco/atomic-red-team

  3. Cisco Talos Intelligence Group - Comprehensive Threat Intelligence. https://blog.talosintelligence.com/

  4. Extractor. https://github.com/ksatvat/EXTRACTOR

  5. Frankenstein Campaign. https://blog.talosintelligence.com/2019/06/frankenstein-campaign.html

  6. IBM X-Force. https://exchange.xforce.ibmcloud.com/

  7. Introduction to STIX. https://oasis-open.github.io/cti-documentation/stix/intro.html

  8. ioc parser. https://github.com/armbues/ioc_parser

  9. Levenshtein distance. https://en.wikipedia.org/wiki/Levenshtein_distance

  10. mandiant/OpenIOC_1.1. https://github.com/mandiant/OpenIOC_1.1

  11. Microsoft says SolarWinds hackers stole source code for 3 products. https://arstechnica.com/information-technology/2021/02/microsoft-says-solarwinds-hackers-stole-source-code-for-3-products

  12. MITRE ATT&CK®. https://attack.mitre.org/

  13. Multiple Cobalt Personality Disorder. https://blog.talosintelligence.com/2018/07/multiple-cobalt-personality-disorder.html

  14. Security intelligence|Microsoft Security Blog. https://www.microsoft.com/security/blog/security-intelligence/

  15. spaCy. https://spacy.io/

  16. The Top Ten MITRE ATT&CK Techniques. https://www.picussecurity.com/resource/the-top-ten-mitre-attck-techniques

  17. Top MITRE ATT&CK Techniques. https://redcanary.com/threat-detection-report/techniques/

  18. Transparent Computing. https://www.darpa.mil/program/transparent-computing

  19. OceanLotus Campaign (2021). https://www.volexity.com/blog/2020/11/06/oceanlotus-extending-cyber-espionage-operations-through-fake-websites/

  20. Gao, P., Liu, X., Choi, E., et al.: A system for automated open-source threat intelligence gathering and management. In: SIGMOD (2021)

  21. Gao, P., Shao, F., Liu, X., et al.: Enabling Efficient Cyber Threat Hunting With Cyber Threat Intelligence. ICDE (2021)

  22. Ghazi, Y., Anwar, Z., Mumtaz, R., Saleem, S., Tahir, A.: A supervised machine learning based approach for automatically extracting high-level threat intelligence from unstructured sources. In: 2018 International Conference on Frontiers of Information Technology (FIT), pp. 129–134. IEEE (2018)

  23. Hossain, M.N., Sheikhi, S., Sekar, R.: Combating dependence explosion in forensic analysis using alternative tag propagation semantics. In: IEEE S&P (2020)

  24. Husari, G., Al-Shaer, E., Ahmed, M., Chu, B., Niu, X.: TTPDrill: Automatic and accurate extraction of threat actions from unstructured text of CTI Sources. In: ACM International Conference Proceeding Series, vol. Part F1325 (2017)

  25. Husari, G., Niu, X., Chu, B., Al-Shaer, E.: Using entropy and mutual information to extract threat actions from cyber threat intelligence. In: 2018 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 1–6. IEEE (2018)

  26. Kurogome, Y., Otsuki, Y., et al.: Eiger: Automated IOC generation for accurate and interpretable endpoint malware detection. In: ACM ACSAC (2019)

  27. Legoy, V., Caselli, M., Seifert, C., Peter, A.: Automated retrieval of ATT&CK tactics and techniques for cyber threat reports. arXiv preprint arXiv:2004.14322 (2020)

  28. Li, G., Dunn, M., Pearce, P., et al.: Reading the Tea Leaves: A Comparative Analysis of Threat Intelligence. In: Usenix Security Symposium (2019)

  29. Li, Z., Chen, Q.A., Yang, R., Chen, Y.: Threat Detection and Investigation with System-level Provenance Graphs: A Survey. Computers & Security 106 (2021)

  30. Li, Z., Soltani, A., Yusof, A., et al.: Poster: Towards automated and large-scale cyber attack reconstruction with apt reports. In: NDSS’22 Poster Session

  31. Liao, X., Yuan, K., Wang, X., et al.: Acing the IOC game: toward automatic discovery and analysis of open-source cyber threat intelligence. In: CCS (2016)

  32. Milajerdi, S.M., Gjomemo, R., Eshete, B., et al.: Poirot: aligning attack behavior with kernel audit records for cyber threat hunting. In: CCS, November 2019

  33. Mu, D., Cuevas, A., Yang, L., et al.: Understanding the reproducibility of crowd-reported security vulnerabilities. In: Usenix Security Symposium (2018)

  34. Ramnani, R.R., Shivaram, K., Sengupta, S.: Semi-automated information extraction from unstructured threat advisories. In: Proceedings of the 10th Innovations in Software Engineering Conference, pp. 181–187 (2017)

  35. Satvat, K., Gjomemo, R., Venkatakrishnan, V.: Extractor: Extracting attack behavior from threat reports. In: IEEE EuroS&P (2021)

  36. Uetz, R., Hemminghaus, C., Hackländer, L., et al.: Reproducible and adaptable log data generation for sound cybersecurity experiments. In: Annual Computer Security Applications Conference, pp. 690–705 (2021)

  37. Zhu, Z., Dumitras, T.: ChainSmith: Automatically learning the semantics of malicious campaigns by mining threat intelligence reports. In: IEEE European Symposium on Security and Privacy (2018)


Acknowledgement

This paper is supported in part by the National Science Foundation with the Award Number (FAIN) 2148177, and by the National Research Foundation, Prime Minister's Office, Singapore under its National Cybersecurity R&D Program (Award No. NRF-NCL-P2-0001) and administered by the National Cybersecurity R&D Directorate. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.

Corresponding author

Correspondence to Yan Chen.

Appendices

A Selecting the Threshold Value

The threshold values chosen for the node and graph alignment scores affect both the accuracy and the efficiency of AttacKG. For the graph alignment score, too low a threshold can cause premature matching (false positives), while too high a threshold can miss reasonable matches (false negatives). For the node alignment score, too low a threshold leaves unnecessary alignment candidates and lengthens report analysis time, while too high a threshold can lead to false negatives. Choosing threshold values therefore involves trade-offs. To determine them, we measure the F-score and the report analysis time under varying threshold values, as shown in Fig. 5, and select the values (0.65 for node alignment, 0.85 for graph alignment) that perform well on both metrics simultaneously.

Fig. 5. Threshold selection
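
The accuracy side of such a threshold sweep can be sketched as follows. This is a minimal illustration, not AttacKG's implementation: the function names `f1_at_threshold` and `sweep`, the alignment scores, the labels, and the candidate thresholds are all hypothetical, and report analysis time (the second metric in the trade-off) is not modeled.

```python
from typing import Optional, Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], labels: Sequence[bool],
                    threshold: float) -> float:
    """F1-score when every candidate scoring >= threshold is accepted."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sweep(scores: Sequence[float], labels: Sequence[bool],
          thresholds: Sequence[float]) -> Tuple[Optional[float], float]:
    """Return (best_threshold, best_f1); ties go to the earlier threshold."""
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        f1 = f1_at_threshold(scores, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical alignment scores with ground-truth labels (illustrative only).
scores = [0.95, 0.90, 0.88, 0.80, 0.70, 0.60]
labels = [True, True, True, False, True, False]
thresholds = [0.5, 0.6, 0.7, 0.8, 0.9]

best_threshold, best_f1 = sweep(scores, labels, thresholds)
print(best_threshold, round(best_f1, 3))
```

On this toy data, low thresholds admit both false positives and high thresholds drop true matches, so the F1-score peaks at an intermediate value, which mirrors the trade-off described above.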

B Ablation Study of AttacKG

To assess how each component contributes to technique identification, we evaluate several variants of AttacKG. In particular, we first remove part of the attributes in entities, namely the IoC information and the natural language text, yielding the variants \(\text{AttacKG}_{\text{w/o IoC information}}\) and \(\text{AttacKG}_{\text{w/o natural language text}}\), respectively. Note that, unlike Extractor’s practice of merging entities, which may result in information loss, we only remove partial entity attributes without sacrificing the structural information of attack graphs. Moreover, we obtain another variant, \(\text{AttacKG}_{\text{w/o dependencies}}\), by filtering out the dependencies in attack graphs; that is, we predict attack techniques based only on entity sets. Finally, we disable the graph simplification component, yielding \(\text{AttacKG}_{\text{w/o graph simplification}}\).

As different component combinations may affect the distribution of alignment scores, we tune the identification threshold separately for each AttacKG variant and choose the one with the best F1-score. The results are summarized in Table 6. We find that removing any component degrades AttacKG’s performance, which justifies our design choices. In particular, \(\text{AttacKG}_{\text{w/o dependencies}}\) consistently performs the worst across all evaluation metrics, which confirms the substantial influence of graph structure on technique identification.

Table 6. Ablation study of different components used in technique identification.
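
Why dropping dependencies can hurt may be illustrated with a toy example. The sketch below is not AttacKG's alignment algorithm: the `Graph` type, the `overlap` and `alignment_score` functions, the equal weighting of the two terms, and the entity names are all assumptions made for illustration. It only shows that two graphs with identical entity sets become indistinguishable once dependencies are ignored.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Graph:
    """Toy attack graph: entity types plus directed dependencies."""
    entities: FrozenSet[str]
    dependencies: FrozenSet[Tuple[str, str]]


def overlap(candidate: frozenset, template: frozenset) -> float:
    """Fraction of template items that also occur in the candidate."""
    return len(candidate & template) / len(template) if template else 1.0


def alignment_score(report: Graph, template: Graph,
                    use_dependencies: bool = True) -> float:
    """Hypothetical score: mean of entity overlap and dependency overlap."""
    entity_score = overlap(report.entities, template.entities)
    if not use_dependencies:
        # Ablated variant: decide on entity sets only.
        return entity_score
    dependency_score = overlap(report.dependencies, template.dependencies)
    return (entity_score + dependency_score) / 2


# Hypothetical technique template: an executable writes a file and
# opens a network connection.
template = Graph(
    entities=frozenset({"executable", "file", "network"}),
    dependencies=frozenset({("executable", "file"), ("executable", "network")}),
)

# Hypothetical report graph with the same entities but a different structure.
report = Graph(
    entities=frozenset({"executable", "file", "network"}),
    dependencies=frozenset({("file", "network")}),
)

with_deps = alignment_score(report, template)
without_deps = alignment_score(report, template, use_dependencies=False)
print(with_deps, without_deps)
```

With dependencies, the structural mismatch lowers the score to 0.5; without them, the report looks like a perfect match (1.0), which is the kind of false positive an entity-only variant cannot rule out.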

C Efficiency of AttacKG

Settings. We experimentally compared AttacKG’s efficiency with that of TTPDrill and Extractor on the 16 CTI report samples mentioned in Sect. 4.1. All experiments ran on a PC with an AMD Ryzen 7 4800H processor (2.9 GHz, 8 cores) and 16 GB of memory, running Windows 11 Professional (64-bit). The sample reports range from 61 to 1029 words in length, with an average of 278.2 words.

Results. Extractor is the most complex system, consisting of multiple NLP models, and thus has the highest runtime overhead, taking 239.70 s on average to parse a report. Compared to Extractor, AttacKG adopts a simpler CTI report parsing pipeline. On average, it takes 8.9 s for graph extraction and 15.1 s for technique identification, totaling 24.0 s. TTPDrill, on the other hand, uses the simplest model and does not construct attack graphs; it is thus the fastest, taking only 5.9 s on average per report, but at the cost of a high false-positive rate.
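
The relative costs implied by these averages can be checked with a few lines of arithmetic. The variable names below are ours; the numbers are the per-report averages reported above.

```python
# Average per-report runtimes in seconds, as reported above.
extractor_total = 239.70
attackg_graph_extraction = 8.9
attackg_technique_identification = 15.1
ttpdrill_total = 5.9

# AttacKG's total is the sum of its two pipeline stages.
attackg_total = attackg_graph_extraction + attackg_technique_identification

# Ratios between the systems' average runtimes.
speedup_vs_extractor = extractor_total / attackg_total
slowdown_vs_ttpdrill = attackg_total / ttpdrill_total

print(f"AttacKG total: {attackg_total:.1f} s")
print(f"Extractor / AttacKG: {speedup_vs_extractor:.2f}x")
print(f"AttacKG / TTPDrill: {slowdown_vs_ttpdrill:.2f}x")
```

That is, on these samples AttacKG is roughly 10 times faster than Extractor and about 4 times slower than TTPDrill.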

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, Z., Zeng, J., Chen, Y., Liang, Z. (2022). AttacKG: Constructing Technique Knowledge Graph from Cyber Threat Intelligence Reports. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13554. Springer, Cham. https://doi.org/10.1007/978-3-031-17140-6_29

  • DOI: https://doi.org/10.1007/978-3-031-17140-6_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17139-0

  • Online ISBN: 978-3-031-17140-6

  • eBook Packages: Computer Science (R0)
