Abstract
Advanced Persistent Threats (APTs) are among the most sophisticated threats facing critical organizations worldwide. APTs employ specific tactics, techniques, and procedures (TTPs) that make them difficult to detect in comparison to frequent and aggressive attacks. In fact, current network intrusion detection systems struggle to detect APT communications, allowing such threats to persist unnoticed on victims’ machines for months or even years.
In this paper, we present EarlyCrow, an approach to detect APT malware command and control over HTTP(S) using contextual summaries. The design of EarlyCrow is informed by a novel threat model focused on TTPs present in traffic generated by tools recently used as part of APT campaigns. The threat model highlights the importance of the context around the malicious connections, and suggests traffic attributes which help APT detection. EarlyCrow defines a novel multipurpose network flow format called PairFlow, which is leveraged to build the contextual summary of a PCAP capture, representing key behavioral, statistical and protocol information relevant to APT TTPs. We evaluate the effectiveness of EarlyCrow on unseen APTs obtaining a headline macro average F1-score of 93.02% with FPR of 0.74%.
Keywords
- Advanced persistent threats
- Network intrusion detection
- Command and control
Notes
1. EarlyCrow code, datasets, and experiments are publicly available at [1].
References
EarlyCrow github repository. https://github.com/ICL-ml4csec/EarlyCrowAPT
Ahmad, A., Webb, J., Desouza, K.C., Boorman, J.: Strategically-motivated advanced persistent threat: definition, process, tactics and a disinformation model of counterattack. Comput. Secur. 86, 402–418 (2019)
Alageel, A., Maffeis, S.: Hawk-Eye: holistic detection of APT command and control domains. In: ACM SAC, pp. 1664–1673. ACM (2021)
Anderson, B., McGrew, D.: Identifying encrypted malware traffic with contextual flow data. In: ACM AISec, pp. 35–46 (2016)
Arp, D., et al.: Dos and don’ts of machine learning in computer security. In: USENIX Security (2022)
AsSadhan, B., Moura, J.M., Lapsley, D., Jones, C., Strayer, W.T.: Detecting botnets using command and control traffic. In: IEEE NCA, pp. 156–162. IEEE (2009)
Bartos, K., Sofka, M., Franc, V.: Optimized invariant representation of network traffic for detecting unseen malware variants. In: USENIX Security, pp. 807–822 (2016)
Bilge, L., Balzarotti, D., Robertson, W., Kirda, E., Kruegel, C.: Disclosure: detecting botnet command and control servers through large-scale netflow analysis. In: ACSAC, pp. 129–138 (2012)
Bortolameotti, R., et al.: DECANTeR: DEteCtion of anomalous outbouNd HTTP TRaffic by passive application fingerprinting. In: ACSAC, pp. 373–386 (2017)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: IEEE S&P, pp. 39–57. IEEE (2017)
Clements, J., Yang, Y., Sharma, A., Hu, H., Lao, Y.: Rallying adversarial techniques against deep learning for network security. arXiv preprint arXiv:1903.11688 (2019)
Clements, J., Yang, Y., Sharma, A.A., Hu, H., Lao, Y.: Rallying adversarial techniques against deep learning for network security. In: IEEE SSCI, pp. 01–08. IEEE (2021)
The MITRE Corporation: Application layer protocol: web protocols. https://attack.mitre.org/techniques/T1071/001/. Accessed 18 Dec 2021
The MITRE Corporation: Command and control. https://attack.mitre.org/tactics/TA0011/. Accessed 18 Dec 2021
The MITRE Corporation: Data obfuscation: protocol impersonation. https://attack.mitre.org/techniques/T1001/003/. Accessed 18 Dec 2021
The MITRE Corporation: Dynamic DNS. https://attack.mitre.org/techniques/T1568/. Accessed 18 Dec 2021
The MITRE Corporation: Dynamic resolution: fast flux DNS. https://attack.mitre.org/techniques/T1568/001/. Accessed 18 Dec 2021
The MITRE Corporation: Encrypted channel. https://attack.mitre.org/techniques/T1573/. Accessed 18 Dec 2021
The MITRE Corporation: Fallback channel TTP. https://attack.mitre.org/techniques/T1008/. Accessed 18 Dec 2021
The MITRE Corporation: Non-application layer protocol. https://attack.mitre.org/techniques/T1095/. Accessed 18 Dec 2021
The MITRE Corporation: Protocol tunneling. https://attack.mitre.org/techniques/T1572/. Accessed 18 Dec 2021
Farinholt, B., Rezaeirad, M., McCoy, D., Levchenko, K.: Dark matter: uncovering the DarkComet RAT ecosystem. In: WWW, pp. 2109–2120 (2020)
Farinholt, B., et al.: To catch a ratter: monitoring the behavior of amateur DarkComet RAT operators in the wild. In: IEEE S&P, pp. 770–787. IEEE (2017)
FireEye: Highly evasive attacker leverages solarwinds supply chain to compromise multiple global victims with sunburst backdoor, 13 December 2020. https://www.mandiant.com/resources/evasive-attacker-leverages-solarwinds-supply-chain-compromises-with-sunburst-backdoor
Han, D., et al.: Evaluating and improving adversarial robustness of machine learning-based network intrusion detectors. IEEE J. Sel. Areas Commun. 39(8), 2632–2647 (2021)
Hashemi, M.J., Cusack, G., Keller, E.: Towards evaluation of NIDSs in adversarial setting. In: ACM Big-DAMA, pp. 14–21 (2019)
Hashemi, M.J., Keller, E.: Enhancing robustness against adversarial examples in network intrusion detection systems. In: IEEE NFV-SDN, pp. 37–43. IEEE (2020)
Heinemeyer, M.: Fin7.5: the infamous cybercrime rig “FIN7” continues its activities. https://securelist.com/fin7-5-the-infamous-cybercrime-rig-fin7-continues-its-activities/90703//. Accessed 18 July 2021
Homoliak, I., Teknøs, M., Ochoa, M., Breitenbacher, D., Hosseini, S., Hanacek, P.: Improving network intrusion detection classifiers by non-payload-based exploit-independent obfuscations: an adversarial approach. EAI Endorsed Trans. Secur. Saf. 5, 17 (2018)
Hu, X., et al.: BAYWATCH: robust beaconing detection to identify infected hosts in large-scale enterprise networks. In: IEEE/IFIP DSN, pp. 479–490. IEEE (2016)
Hutchins, E.M., Cloppert, M.J., Amin, R.M.: Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains. In: Leading Issues in Information Warfare & Security Research, vol. 1, p. 80 (2011)
Invernizzi, L., et al.: Nazca: detecting malware distribution in large-scale networks. In: NDSS, vol. 14, pp. 23–26 (2014)
Jansen, W.: Abusing cloud services to fly under the radar. https://research.nccgroup.com/2021/01/12/abusing-cloud-services-to-fly-under-the-radar/. Accessed 18 Dec 2021
Ma, J., Saul, L.K., Savage, S., Voelker, G.M.: Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In: KDD, pp. 1245–1254. ACM (2009)
Milajerdi, S.M., Gjomemo, R., Eshete, B., Sekar, R., Venkatakrishnan, V.: HOLMES: real-time APT detection through correlation of suspicious information flows. In: IEEE S&P, pp. 1137–1152. IEEE (2019)
Mirsky, Y., Doitshman, T., Elovici, Y., Shabtai, A.: Kitsune: an ensemble of autoencoders for online network intrusion detection. In: NDSS (2018)
Nelms, T., Perdisci, R., Ahamad, M.: ExecScent: mining for new C&C domains in live networks with adaptive control protocol templates. In: USENIX Security, pp. 589–604 (2013)
Oprea, A., Li, Z., Norris, R., Bowers, K.: MADE: security analytics for enterprise threat detection. In: ACSAC, pp. 124–136 (2018)
Oprea, A., Li, Z., Yen, T.F., Chin, S.H., Alrwais, S.: Detection of early-stage enterprise infection by mining large-scale log data. In: IEEE/IFIP DSN, pp. 45–56. IEEE (2015)
Perdisci, R., Lee, W., Feamster, N.: Behavioral clustering of HTTP-based malware and signature generation using malicious network traces. In: NSDI, vol. 10, p. 14 (2010)
Rezaeirad, M., Farinholt, B., Dharmdasani, H., Pearce, P., Levchenko, K., McCoy, D.: Schrödinger’s RAT: profiling the stakeholders in the remote access trojan ecosystem. In: USENIX Security, pp. 1043–1060 (2018)
Schindler, T.: Anomaly detection in log data using graph databases and machine learning to defend advanced persistent threats. In: GI-Jahrestagung (2017)
Sommer, R., Paxson, V.: Outside the closed world: on using machine learning for network intrusion detection. In: IEEE S&P, pp. 305–316. IEEE (2010)
Stinson, E., Mitchell, J.C.: Towards systematic evaluation of the evadability of bot/botnet detection methods. In: WOOT, vol. 8, pp. 1–9 (2008)
Szegedy, C., et al.: Intriguing properties of neural networks. CoRR abs/1312.6199 (2014)
Tegeler, F., Fu, X., Vigna, G., Kruegel, C.: BotFinder: finding bots in network traffic without deep packet inspection. In: CoNEXT, pp. 349–360 (2012)
Wang, J., Qixu, L., Di, W., Dong, Y., Cui, X.: Crafting adversarial example to bypass flow-&ML-based botnet detector via RL. In: RAID, pp. 193–204 (2021)
Wang, Z.: Deep learning-based intrusion detection with adversaries. IEEE Access 6, 38367–38384 (2018)
Zhou, Y., Kantarcioglu, M., Thuraisingham, B., Xi, B.: Adversarial support vector machine learning. In: KDD, pp. 1059–1067. ACM (2012)
A PairFlow
EarlyCrow defines a novel multipurpose network flow format called PairFlow, which is leveraged to build the contextual summary of a PCAP capture, representing key behavioral, statistical and protocol information relevant to APT TTPs. We discuss the details of each component in the following.
A.1 Tracking
Packets Retrieving. The tracking module identifies all unique pair connections on the network and filters out those using non-IP protocols (Fig. 5). For each unique pair connection, PairFlow tracks, bidirectionally, all packets related to the pair. These packets are assigned an initial Flow ID, which remains unchanged for all packets within the same time window for a given pair connection. Each packet keeps its individual index for the later aggregation step. Packets with the same Flow ID may use different protocols. Therefore, each packet carries a one-hot encoded flag, the Encoding Protocol Flag (EPFLAG), which is used later for further filtering. These flags are named with the prefix EPFLAG_ followed by a protocol from the set {TCP, UDP, DNS, ICMP, HTTP, SSL/TLS}.
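The tracking step can be illustrated with a minimal Python sketch. The packet dictionary keys (index, ts, src, dst, proto), the window handling, and the use of the protocol set as a stand-in for the non-IP filter are assumptions made for illustration and are not taken from the EarlyCrow implementation.

```python
from collections import defaultdict

# Protocol set taken from the text; the flag layout itself is an assumption.
PROTOCOLS = ("TCP", "UDP", "DNS", "ICMP", "HTTP", "SSL/TLS")


def pair_key(src_ip, dst_ip):
    """Order-independent key so both directions map to the same pair."""
    return tuple(sorted((src_ip, dst_ip)))


def epflag(protocol):
    """One-hot flags named EPFLAG_<protocol> for a single packet."""
    return {f"EPFLAG_{p}": int(p == protocol) for p in PROTOCOLS}


def track(packets, window):
    """Group packets by (pair, time window) and tag each with its flags.

    `packets` is an iterable of dicts with hypothetical keys
    index, ts, src, dst, proto; `window` is the window length in seconds.
    """
    flows = defaultdict(list)
    for pkt in packets:
        # Assumption: anything outside PROTOCOLS (e.g. ARP) counts as non-IP.
        if pkt["proto"] not in PROTOCOLS:
            continue
        window_no = int(pkt["ts"] // window)
        key = (pair_key(pkt["src"], pkt["dst"]), window_no)
        flows[key].append({**pkt, **epflag(pkt["proto"])})
    return dict(flows)
```

Keying on the sorted address pair is one simple way to obtain bidirectional tracking: a request and its response fall into the same group regardless of direction.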
DNS Requests and Responses. The tracked packets do not include the DNS requests and responses that resolve the IP address needed to establish a connection, because those packets travel between the host and the DNS server rather than between the host and the destination. Similar to [4], to track these DNS packets, the destination of the current pair is used as a Local PTR to find all DNS response packets in the PCAP repository. Once found, the resource records of the DNS responses are used to find all related DNS requests. All packets belonging to the pair connection are then attached and sorted by arrival time. Packets outside the time window are not included.
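The two-step DNS lookup described above can be sketched as follows. The packet representation (keys ts, kind, qname, answers) is hypothetical, and matching requests to responses by query name is an assumption about how the resource records are used.

```python
def link_dns(dns_packets, dest_ip, window_start, window_end):
    """Return DNS packets related to dest_ip, sorted by arrival time.

    Each packet is a dict with hypothetical keys: ts, kind ("request" or
    "response"), qname, and answers (list of resolved IPs, responses only).
    """
    in_window = [p for p in dns_packets if window_start <= p["ts"] < window_end]
    # Step 1: responses whose resource records contain the destination IP.
    responses = [
        p for p in in_window
        if p["kind"] == "response" and dest_ip in p.get("answers", [])
    ]
    names = {p["qname"] for p in responses}
    # Step 2: requests that asked for any of those names.
    requests = [
        p for p in in_window
        if p["kind"] == "request" and p["qname"] in names
    ]
    return sorted(requests + responses, key=lambda p: p["ts"])
```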
A.2 Aggregation
Header Generation. Besides its individual packet ID from the PCAP, every packet is assigned a Flow ID composed of a ContextualSummary ID (CSID) and a PairFlow ID (PFID). The former is unique for the lifetime of a pair, while the latter is unique for a time window. All packets from the same PairFlow share the same Flow ID. To assign the PFID, the aggregation module checks the ContextualSummary repository to determine whether the pair has been processed before (Fig. 5). If so, the new PFID is the last PFID used for the same pair and CSID, incremented by one. Otherwise, a new and unique ContextualSummary is created, and the PFID starts at zero.
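The Flow ID assignment rule can be sketched in a few lines of Python. The class SummaryRepository is a hypothetical in-memory stand-in for the ContextualSummary repository, and generating CSIDs from a counter is an assumption.

```python
import itertools


class SummaryRepository:
    """Hypothetical in-memory stand-in for the ContextualSummary repository."""

    def __init__(self):
        self._csid_counter = itertools.count()
        self._state = {}  # pair -> [csid, last_pfid]

    def next_flow_id(self, pair):
        """Return (CSID, PFID) for the next time window of `pair`."""
        if pair in self._state:
            # Known pair: keep its CSID and increment the last PFID by one.
            self._state[pair][1] += 1
        else:
            # New pair: create a fresh ContextualSummary, PFID starts at zero.
            self._state[pair] = [next(self._csid_counter), 0]
        csid, pfid = self._state[pair]
        return csid, pfid
```

Calling next_flow_id repeatedly for the same pair keeps the CSID fixed while the PFID increases with each time window.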
Packets Aggregation. The aggregator module creates a PairFlow that stores the PairFlow ID, sorted packet indices, pair connection, time window, EPFLAG, FQDNs, URLs, UAs, SSL/TLS settings, and initial flow-based statistics. The initial flow-based statistics include the number of packets per protocol (i.e., TCP, UDP, ICMP, HTTP, SSL/TLS, and DNS packets), the total (encrypted) bytes, and the total (encrypted) bytes sent and received. Time-based statistics include the packet Time to Live (TTL), the max/min/median of the packet interarrival time deltas, and the flow duration within the time window. Similar to [6], we separate TCP packets into data and control packets for later use in the encapsulation process. Finally, the preprocessed flows are dispatched to the encapsulation step for further processing.
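As an illustration of the initial flow-based and time-based statistics, the following sketch computes them for one PairFlow. The packet keys (ts, length, proto, sent) and the output field names are hypothetical, and TTL and encrypted-byte handling are omitted for brevity.

```python
from statistics import median


def flow_statistics(packets):
    """Compute initial statistics for one PairFlow.

    `packets` is a time-sorted list of dicts with hypothetical keys
    ts, length, proto, and sent (True if sent by the internal host).
    """
    times = [p["ts"] for p in packets]
    # Interarrival deltas between consecutive packets.
    deltas = [b - a for a, b in zip(times, times[1:])]
    counts = {}
    for p in packets:
        counts[p["proto"]] = counts.get(p["proto"], 0) + 1
    return {
        "packet_counts": counts,
        "total_bytes": sum(p["length"] for p in packets),
        "bytes_sent": sum(p["length"] for p in packets if p["sent"]),
        "bytes_received": sum(p["length"] for p in packets if not p["sent"]),
        "iat_max": max(deltas, default=0.0),
        "iat_min": min(deltas, default=0.0),
        "iat_median": median(deltas) if deltas else 0.0,
        "duration": times[-1] - times[0] if times else 0.0,
    }
```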
A.3 Encapsulation
The encapsulation phase explicitly groups the packet behavior, FQDN and URL, HTTP(S), and initial statistical behavior that are implicit in the preprocessed flows, in order to make contextual information readily available (Fig. 5). The data types involved include lists of strings and tuples, Boolean fields, and numeric fields, as shown in Table 5.
Packet Behavior. Packet Behavior encapsulates all packets according to their protocol type (TCP, UDP, and ICMP) in a list of tuples. The first element of each tuple is the packet index, which makes a given packet traceable in the original PCAP for further investigation.
The TCP plane comprises the control and data sub-planes, as shown in Fig. 5. Each packet in the data sub-plane holds the protocol name, whether it is a request or a response and its type, the content type, the timestamp, and the packet length. For example, an HTTP request packet can be described as (460854, ‘HTTP’, ‘Request’, ‘GET’, ‘Empty Content’, 1066.51, 383) and its response as (460895, ‘HTTP’, ‘Response’, 200, ‘text/javascript’, 1066.86, 429). This allows the upper system to work on time-series traffic and monitor anomalies for a given PairFlow. Further packet-level statistical analysis, such as counting GET/POST requests, HTTP response types, and content analysis, can be performed as described in Sect. 3.3.
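To show how the data sub-plane tuples support packet-level analysis, the sketch below counts request methods, response status codes, and content types. The function name summarize_http is hypothetical; the tuple layout follows the example given in the text.

```python
from collections import Counter


def summarize_http(data_plane):
    """Count HTTP methods, status codes, and content types.

    Each entry follows the tuple layout from the text:
    (index, protocol, direction, type, content_type, timestamp, length).
    """
    methods, statuses, content_types = Counter(), Counter(), Counter()
    for _idx, proto, direction, kind, ctype, _ts, _length in data_plane:
        if proto != "HTTP":
            continue
        if direction == "Request":
            methods[kind] += 1   # e.g. 'GET' or 'POST'
        else:
            statuses[kind] += 1  # e.g. 200
        content_types[ctype] += 1
    return methods, statuses, content_types
```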
The control sub-plane captures the behavior of the initial connection before the data exchange begins, the TCP continuation, and the termination of the TCP connection. For example, when TCP establishes a connection with a three-way handshake, the SYN, SYN-ACK, and ACK packets are summarized as follows: (72095, ‘0x02’, 215.73 s, 74 B), (72126, ‘0x12’, 215.78 s, 70 B), (72127, ‘0x10’, 215.78 s, 66 B). A stream of packets with TCP flag 0x10 (ACK) then follows until the connection is terminated with the FIN flag. This is useful for analyzing time-series problems or monitoring discontinuities in a PairFlow, as discussed in Sect. 3.4.
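The hexadecimal flag values in the control sub-plane can be decoded with the standard TCP flag bit positions. The sketch below is illustrative: the helper names are hypothetical, and timestamps and lengths are stored as plain numbers rather than with units.

```python
# Standard TCP flag bits (low six bits of the flags field).
TCP_FLAG_BITS = {
    0x01: "FIN", 0x02: "SYN", 0x04: "RST",
    0x08: "PSH", 0x10: "ACK", 0x20: "URG",
}


def decode_flags(hex_flags):
    """Translate a hex string such as '0x12' into flag names."""
    value = int(hex_flags, 16)
    return [name for bit, name in TCP_FLAG_BITS.items() if value & bit]


def has_handshake(control_plane):
    """True if SYN, SYN-ACK, ACK appear consecutively in the control plane.

    Each entry is (index, hex_flags, timestamp, length).
    """
    flags = [int(entry[1], 16) for entry in control_plane]
    return any(
        flags[i:i + 3] == [0x02, 0x12, 0x10]
        for i in range(len(flags) - 2)
    )
```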
The UDP plane records all UDP-based packets with their protocol name, packet type, timestamp, and packet length. For example, a DNS request and response for a specific domain are summarized as follows: (21160, ‘DNS’, ‘DNS Request’, 141.44 s, 75 B), (21219, ‘DNS’, ‘DNS Response’, 141.54 s, 547 B). The ICMP plane is similar to the UDP plane but covers ICMP packets only, with the type and code fields reporting the ICMP settings of each packet. This plane can be helpful for any classifier detecting ICMP-based attacks.
FQDN and URL. As depicted in Fig. 5, the domain list encapsulates all FQDN-related information in a list of tuples. Each tuple holds an FQDN, its A and NS resource records, and the domain age extracted from the WHOIS file. This helps malicious domain detectors, which often rely on FQDN strings, relative DNS zone, and WHOIS files. URL encapsulates each relevant element of a URL seen during a connection in a tuple, which includes the FQDN, the web page filename, the number of parameters, values, and fragments, and whether the URL contains encoded strings.
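A URL tuple of this kind can be approximated with Python's standard urllib.parse module. The tuple order and the use of a percent sign as the test for encoded strings are assumptions made for this sketch.

```python
import posixpath
from urllib.parse import parse_qsl, urlsplit


def encapsulate_url(url):
    """Return (fqdn, filename, n_params, n_values, n_fragments, encoded)."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    return (
        parts.hostname,
        posixpath.basename(parts.path),      # web page filename, may be ''
        len(params),                         # number of parameters
        sum(1 for _, value in params if value),  # parameters with a value
        1 if parts.fragment else 0,          # number of fragments
        # Assumption: percent-encoding marks an "encoded string".
        "%" in parts.path or "%" in parts.query,
    )
```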
HTTP(S). HTTP encapsulates HTTP-level information for a given connection, in particular the distinct HTTP server names, status codes, content types, and UAs. TLS Protocols summarizes the security settings between a client and a server. Cipher suites for both client and server are stored in a list. Cipher suites include the key exchange/agreement algorithm (e.g., RSA, Elliptic-curve Diffie-Hellman (ECDH)), the authentication algorithm (e.g., RSA, Elliptic Curve Digital Signature Algorithm (ECDSA)), the block/stream cipher (e.g., AES, RC4) with its block cipher mode (e.g., CBC), and the message authentication algorithm (e.g., MD5, SHA-x). Extension types, which summarize cipher suite settings such as extended master secret, session tickets, and Elliptic Curve (EC) point formats, are also listed for each connection. Supported Groups, known as the EC setting (e.g., secp256r1, secp521r1), are also stored.
Initial Statistical Behavior. A few essential fields are summarized statistically. We calculate the max, min, and mean packet TTL, the packet interarrival time deltas, and the duration of a given PairFlow. We also calculate the total (encrypted) bytes and the ratio of sent to received (encrypted) bytes. The max, min, and median of the cipher suite bytes and of the server and client extension bytes are also calculated. In addition, we provide a statistical summary of the number of packets per protocol, such as raw TCP, raw UDP, ICMP, DNS, HTTP, TLS, and SSL. The statistical fields are summarized in Table 5.
A.4 Variants Extraction
PairFlow processing also exports four variant JSON files that can be used by any external classifier (Fig. 5). FQDN.json includes all domains and their hostname lists accessed during a given PairFlow. It also includes resource records such as A and NS, and the domain age extracted from the WHOIS file, which appears to be useful for domain detection [3]. TCP-UDP-ICMP.json is intended for classifiers that use time series for detection [6, 30]. It presents all three planes together with related statistical fields such as packet TTL and packet interarrival time deltas. HTTP.json is intended for detecting malicious HTTP connections [30, 38]. Other classifiers may use HTTPS.json to detect encrypted communications without decrypting the traffic [4]. A detailed study of the other variants is left for future work.
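A minimal sketch of the variant export is given below. Only the four file names come from the text; the PairFlow dictionary keys (flow_id, domain_list, tcp_plane, and so on) and the mapping of fields to files are hypothetical and do not reflect the actual EarlyCrow schema.

```python
import json
from pathlib import Path

# File names come from the text; the field names are hypothetical.
VARIANT_FIELDS = {
    "FQDN.json": ("domain_list",),
    "TCP-UDP-ICMP.json": ("tcp_plane", "udp_plane", "icmp_plane",
                          "ttl", "interarrival"),
    "HTTP.json": ("http",),
    "HTTPS.json": ("tls",),
}


def export_variants(pairflow, out_dir):
    """Write one JSON file per variant and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, fields in VARIANT_FIELDS.items():
        # Every variant keeps the Flow ID so records can be joined later.
        doc = {"flow_id": pairflow["flow_id"]}
        doc.update({f: pairflow[f] for f in fields if f in pairflow})
        path = out_dir / filename
        path.write_text(json.dumps(doc, indent=2))
        written.append(path)
    return written
```

Keeping the Flow ID in every file is a design choice of this sketch: it lets an external classifier correlate results across variants for the same PairFlow.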
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Alageel, A., Maffeis, S. (2022). EarlyCrow: Detecting APT Malware Command and Control over HTTP(S) Using Contextual Summaries. In: Susilo, W., Chen, X., Guo, F., Zhang, Y., Intan, R. (eds) Information Security. ISC 2022. Lecture Notes in Computer Science, vol 13640. Springer, Cham. https://doi.org/10.1007/978-3-031-22390-7_18
DOI: https://doi.org/10.1007/978-3-031-22390-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22389-1
Online ISBN: 978-3-031-22390-7
eBook Packages: Computer Science, Computer Science (R0)