EarlyCrow: Detecting APT Malware Command and Control over HTTP(S) Using Contextual Summaries

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13640)


Advanced Persistent Threats (APTs) are among the most sophisticated threats facing critical organizations worldwide. APTs employ specific tactics, techniques, and procedures (TTPs) which make them difficult to detect in comparison to frequent and aggressive attacks. In fact, current network intrusion detection systems struggle to detect APT communications, allowing such threats to persist unnoticed on victims’ machines for months or even years.

In this paper, we present EarlyCrow, an approach to detect APT malware command and control over HTTP(S) using contextual summaries. The design of EarlyCrow is informed by a novel threat model focused on TTPs present in traffic generated by tools recently used as part of APT campaigns. The threat model highlights the importance of the context around the malicious connections, and suggests traffic attributes which help APT detection. EarlyCrow defines a novel multipurpose network flow format called PairFlow, which is leveraged to build the contextual summary of a PCAP capture, representing key behavioral, statistical and protocol information relevant to APT TTPs. We evaluate the effectiveness of EarlyCrow on unseen APTs obtaining a headline macro average F1-score of 93.02% with FPR of 0.74%.


  • Advanced persistent threats
  • Network intrusion detection
  • Command and control


  1. EarlyCrow code, datasets, and experiments are publicly available at [1].


  1. EarlyCrow GitHub repository.

  2. Ahmad, A., Webb, J., Desouza, K.C., Boorman, J.: Strategically-motivated advanced persistent threat: definition, process, tactics and a disinformation model of counterattack. Comput. Secur. 86, 402–418 (2019)

  3. Alageel, A., Maffeis, S.: Hawk-Eye: holistic detection of APT command and control domains. In: ACM SAC, pp. 1664–1673. ACM (2021)

  4. Anderson, B., McGrew, D.: Identifying encrypted malware traffic with contextual flow data. In: ACM AISec, pp. 35–46 (2016)

  5. Arp, D., et al.: Dos and don’ts of machine learning in computer security. In: USENIX Security (2022)

  6. AsSadhan, B., Moura, J.M., Lapsley, D., Jones, C., Strayer, W.T.: Detecting botnets using command and control traffic. In: IEEE NCA, pp. 156–162. IEEE (2009)

  7. Bartos, K., Sofka, M., Franc, V.: Optimized invariant representation of network traffic for detecting unseen malware variants. In: USENIX Security, pp. 807–822 (2016)

  8. Bilge, L., Balzarotti, D., Robertson, W., Kirda, E., Kruegel, C.: Disclosure: detecting botnet command and control servers through large-scale netflow analysis. In: ACSAC, pp. 129–138 (2012)

  9. Bortolameotti, R., et al.: DECANTeR: DEteCtion of anomalous outbouNd HTTP TRaffic by passive application fingerprinting. In: ACSAC, pp. 373–386 (2017)

  10. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: IEEE S&P, pp. 39–57. IEEE (2017)

  11. Clements, J., Yang, Y., Sharma, A., Hu, H., Lao, Y.: Rallying adversarial techniques against deep learning for network security. arXiv preprint arXiv:1903.11688 (2019)

  12. Clements, J., Yang, Y., Sharma, A.A., Hu, H., Lao, Y.: Rallying adversarial techniques against deep learning for network security. In: IEEE SSCI, pp. 01–08. IEEE (2021)

  13. The MITRE Corporation: Application layer protocol: web protocols. Accessed 18 Dec 2021

  14. The MITRE Corporation: Command and control. Accessed 18 Dec 2021

  15. The MITRE Corporation: Data obfuscation: protocol impersonation. Accessed 18 Dec 2021

  16. The MITRE Corporation: Dynamic DNS. Accessed 18 Dec 2021

  17. The MITRE Corporation: Dynamic resolution: fast flux DNS. Accessed 18 Dec 2021

  18. The MITRE Corporation: Encrypted channel. Accessed 18 Dec 2021

  19. The MITRE Corporation: Fallback channel TTP. Accessed 18 Dec 2021

  20. The MITRE Corporation: Non-application layer protocol. Accessed 18 Dec 2021

  21. The MITRE Corporation: Protocol tunneling. Accessed 18 Dec 2021

  22. Farinholt, B., Rezaeirad, M., McCoy, D., Levchenko, K.: Dark matter: uncovering the DarkComet RAT ecosystem. In: WWW, pp. 2109–2120 (2020)

  23. Farinholt, B., et al.: To catch a ratter: monitoring the behavior of amateur DarkComet RAT operators in the wild. In: IEEE S&P, pp. 770–787. IEEE (2017)

  24. FireEye: Highly evasive attacker leverages solarwinds supply chain to compromise multiple global victims with sunburst backdoor, 13 December 2020.

  25. Han, D., et al.: Evaluating and improving adversarial robustness of machine learning-based network intrusion detectors. IEEE J. Sel. Areas Commun. 39(8), 2632–2647 (2021)

  26. Hashemi, M.J., Cusack, G., Keller, E.: Towards evaluation of NIDSs in adversarial setting. In: ACM Big-DAMA, pp. 14–21 (2019)

  27. Hashemi, M.J., Keller, E.: Enhancing robustness against adversarial examples in network intrusion detection systems. In: IEEE NFV-SDN, pp. 37–43. IEEE (2020)

  28. Heinemeyer, M.: Fin7.5: the infamous cybercrime rig “FIN7” continues its activities. Accessed 18 July 2021

  29. Homoliak, I., Teknøs, M., Ochoa, M., Breitenbacher, D., Hosseini, S., Hanacek, P.: Improving network intrusion detection classifiers by non-payload-based exploit-independent obfuscations: an adversarial approach. EAI Endorsed Trans. Secur. Saf. 5, 17 (2018)

  30. Hu, X., et al.: BAYWATCH: robust beaconing detection to identify infected hosts in large-scale enterprise networks. In: IEEE/IFIP DSN, pp. 479–490. IEEE (2016)

  31. Hutchins, E.M., Cloppert, M.J., Amin, R.M.: Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains. In: Leading Issues in Information Warfare & Security Research, vol. 1, p. 80 (2011)

  32. Invernizzi, L., et al.: Nazca: detecting malware distribution in large-scale networks. In: NDSS, vol. 14, pp. 23–26 (2014)

  33. Jansen, W.: Abusing cloud services to fly under the radar. Accessed 18 Dec 2021

  34. Ma, J., Saul, L.K., Savage, S., Voelker, G.M.: Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In: KDD, pp. 1245–1254. ACM (2009)

  35. Milajerdi, S.M., Gjomemo, R., Eshete, B., Sekar, R., Venkatakrishnan, V.: HOLMES: real-time APT detection through correlation of suspicious information flows. In: IEEE S&P, pp. 1137–1152. IEEE (2019)

  36. Mirsky, Y., Doitshman, T., Elovici, Y., Shabtai, A.: Kitsune: an ensemble of autoencoders for online network intrusion detection. In: NDSS (2018)

  37. Nelms, T., Perdisci, R., Ahamad, M.: ExecScent: mining for new C&C domains in live networks with adaptive control protocol templates. In: USENIX Security, pp. 589–604 (2013)

  38. Oprea, A., Li, Z., Norris, R., Bowers, K.: MADE: security analytics for enterprise threat detection. In: ACSAC, pp. 124–136 (2018)

  39. Oprea, A., Li, Z., Yen, T.F., Chin, S.H., Alrwais, S.: Detection of early-stage enterprise infection by mining large-scale log data. In: IEEE/IFIP DSN, pp. 45–56. IEEE (2015)

  40. Perdisci, R., Lee, W., Feamster, N.: Behavioral clustering of HTTP-based malware and signature generation using malicious network traces. In: NSDI, vol. 10, p. 14 (2010)

  41. Rezaeirad, M., Farinholt, B., Dharmdasani, H., Pearce, P., Levchenko, K., McCoy, D.: Schrödinger’s RAT: profiling the stakeholders in the remote access trojan ecosystem. In: USENIX Security, pp. 1043–1060 (2018)

  42. Schindler, T.: Anomaly detection in log data using graph databases and machine learning to defend advanced persistent threats. In: GI-Jahrestagung (2017)

  43. Sommer, R., Paxson, V.: Outside the closed world: on using machine learning for network intrusion detection. In: IEEE S&P, pp. 305–316. IEEE (2010)

  44. Stinson, E., Mitchell, J.C.: Towards systematic evaluation of the evadability of bot/botnet detection methods. In: WOOT, vol. 8, pp. 1–9 (2008)

  45. Szegedy, C., et al.: Intriguing properties of neural networks. CoRR abs/1312.6199 (2014)

  46. Tegeler, F., Fu, X., Vigna, G., Kruegel, C.: BotFinder: finding bots in network traffic without deep packet inspection. In: CoNEXT, pp. 349–360 (2012)

  47. Wang, J., Qixu, L., Di, W., Dong, Y., Cui, X.: Crafting adversarial example to bypass flow- & ML-based botnet detector via RL. In: RAID, pp. 193–204 (2021)

  48. Wang, Z.: Deep learning-based intrusion detection with adversaries. IEEE Access 6, 38367–38384 (2018)

  49. Zhou, Y., Kantarcioglu, M., Thuraisingham, B., Xi, B.: Adversarial support vector machine learning. In: KDD, pp. 1059–1067. ACM (2012)

Author information

Corresponding author

Correspondence to Almuthanna Alageel.

A PairFlow

EarlyCrow defines a novel multipurpose network flow format called PairFlow, which is leveraged to build the contextual summary of a PCAP capture, representing key behavioral, statistical and protocol information relevant to APT TTPs. We discuss the details of each component in the following.

A.1 Tracking

Packets Retrieving. The tracking module identifies all unique pair connections on the network and filters out those using non-IP protocols (Fig. 5). For each unique pair connection, PairFlow tracks, bidirectionally, all packets related to the pair. These packets are designated with an initial Flow ID, which remains unchanged for all packets of a given pair connection within the same time window. Each packet keeps its individual index for the later aggregation step. Packets with the same Flow ID may use different protocols. Therefore, each packet carries a one-hot encoded flag, called the Encoding Protocol Flag (EPFLAG), which is used later for further filtering. Flag names take the form EPFLAG_Protocol, where Protocol is one of {TCP, UDP, DNS, ICMP, HTTP, SSL/TLS}.
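The tracking step can be illustrated with a minimal Python sketch. It is not the EarlyCrow implementation: the packet dictionaries, the `track` function, the 60-second window, and the use of the protocol label as a stand-in for the non-IP filter are all assumptions made for illustration.

```python
from collections import defaultdict

PROTOCOLS = ("TCP", "UDP", "DNS", "ICMP", "HTTP", "SSL/TLS")

def pair_key(src, dst):
    """Direction-independent key, so both directions map to the same pair."""
    return tuple(sorted((src, dst)))

def epflags(protocol):
    """One-hot EPFLAG_<Protocol> flags for a single packet."""
    return {f"EPFLAG_{p}": int(p == protocol) for p in PROTOCOLS}

def track(packets, window=60.0):
    """Group packets by (pair, time-window index), keeping each packet's index.

    The window length and the packet layout are hypothetical.
    """
    flows = defaultdict(list)
    for index, pkt in enumerate(packets):
        # Stand-in for the non-IP filter: drop anything outside PROTOCOLS.
        if pkt["protocol"] not in PROTOCOLS:
            continue
        key = (pair_key(pkt["src"], pkt["dst"]), int(pkt["time"] // window))
        flows[key].append({"index": index, **pkt, **epflags(pkt["protocol"])})
    return dict(flows)

packets = [
    {"src": "10.0.0.5", "dst": "93.184.216.34", "time": 1.0, "protocol": "TCP"},
    {"src": "93.184.216.34", "dst": "10.0.0.5", "time": 1.2, "protocol": "HTTP"},
    {"src": "10.0.0.5", "dst": "10.0.0.1", "time": 2.0, "protocol": "ARP"},
    {"src": "10.0.0.5", "dst": "93.184.216.34", "time": 75.0, "protocol": "TCP"},
]
flows = track(packets)
```

In this sketch the two directions of the same connection fall under one key, and the packet arriving at 75.0 s lands in the next time window of the same pair.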

DNS Requests and Responses. The tracked packets do not include the DNS requests and responses that resolve the IP address needed to establish a connection. This is because those packets belong to a pair connection between the host and the DNS server, which is different from the destination. Similar to [4], to track these DNS packets, the destination of the present pair is used as a Local PTR to find all DNS response packets in the PCAP repository. Once found, the resource records of the DNS responses are used to find all related DNS requests. All packets belonging to the pair connection are then attached and sorted according to their arrival time. Packets outside the time window are not included.
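The DNS lookup described above can be sketched as follows. The record layout (`kind`, `qname`, `answers`, `time`) and the matching of requests to responses by query name are assumptions for illustration; the source does not specify how requests are matched.

```python
def attach_dns(destination_ip, dns_packets, window_start, window_end):
    """Return the DNS requests and responses that resolved to destination_ip.

    Responses are found first (the destination acts as the lookup key), then
    the requests that asked for the same names. Results are restricted to the
    time window and sorted by arrival time.
    """
    responses = [p for p in dns_packets
                 if p["kind"] == "response" and destination_ip in p["answers"]]
    names = {p["qname"] for p in responses}
    requests = [p for p in dns_packets
                if p["kind"] == "request" and p["qname"] in names]
    related = [p for p in requests + responses
               if window_start <= p["time"] < window_end]
    return sorted(related, key=lambda p: p["time"])

dns_packets = [
    {"kind": "request", "qname": "example.com", "answers": [], "time": 141.44},
    {"kind": "response", "qname": "example.com",
     "answers": ["93.184.216.34"], "time": 141.54},
    {"kind": "request", "qname": "other.test", "answers": [], "time": 142.0},
    {"kind": "response", "qname": "other.test",
     "answers": ["198.51.100.7"], "time": 142.1},
    {"kind": "response", "qname": "example.com",
     "answers": ["93.184.216.34"], "time": 400.0},
]
related = attach_dns("93.184.216.34", dns_packets, 120.0, 180.0)
```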

A.2 Aggregation

Header Generation. Besides the individual packet ID from the PCAP, every packet is also designated with a Flow ID composed of a ContextualSummary ID (CSID) and a PairFlow ID (PFID). The former is unique for the lifetime of a pair, while the latter is unique for a time window. All packets from the same PairFlow share the same Flow ID. To assign the PFID, the aggregation module checks the ContextualSummary repository to determine whether the pair has been processed in the past (Fig. 5). If so, the new PFID is the last PFID used for the same pair and CSID, incremented by one. Otherwise, a new and unique ContextualSummary is created, and the PFID starts at zero.
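The PFID assignment rule can be written as a short sketch. The class name `ContextualSummaryRepository`, its in-memory dictionary, and the sequential CSID counter are hypothetical; only the rule itself (increment the last PFID for a known pair, otherwise create a new ContextualSummary with PFID zero) comes from the text.

```python
import itertools

class ContextualSummaryRepository:
    """In-memory stand-in for the ContextualSummary repository."""

    def __init__(self):
        self._next_csid = itertools.count()
        self._pairs = {}  # pair -> [csid, last_pfid]

    def assign_flow_id(self, pair):
        """Return the Flow ID (CSID, PFID) for the next time window of a pair."""
        if pair in self._pairs:
            entry = self._pairs[pair]
            entry[1] += 1          # known pair: last PFID incremented by one
        else:
            # Unseen pair: new ContextualSummary, PFID starts at zero.
            entry = self._pairs[pair] = [next(self._next_csid), 0]
        return (entry[0], entry[1])

repo = ContextualSummaryRepository()
pair_a = ("10.0.0.5", "93.184.216.34")
pair_b = ("10.0.0.5", "198.51.100.7")
ids = [repo.assign_flow_id(pair_a), repo.assign_flow_id(pair_a),
       repo.assign_flow_id(pair_b), repo.assign_flow_id(pair_a)]
```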

Packets Aggregation. The aggregator module creates a PairFlow to store the PairFlow ID, sorted packet indices, pair connection, time window, EPFLAG, FQDNs, URLs, UAs, SSL/TLS settings, and initial flow-based statistics. The initial flow-based statistics include the number of packets per protocol (i.e., TCP, UDP, ICMP, HTTP, SSL/TLS, and DNS packets), the total (encrypted) bytes, and the total (encrypted) bytes sent and received. Time-based statistics include the packet Time to Live (TTL), the max/min/median delta packet interarrival time, and the flow duration within the same time window. Similar to [6], we separate TCP packets into data and control packets to be used later in the encapsulation process. Finally, preprocessed flows are dispatched to the encapsulation step for further processing.
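A minimal sketch of the byte and time statistics follows. The function name, the packet fields, and the convention that packets whose source is the monitored host count as sent are assumptions for illustration.

```python
from statistics import median

def initial_statistics(packets, host):
    """Byte totals, interarrival deltas, and duration for one PairFlow."""
    times = sorted(p["time"] for p in packets)
    deltas = [later - earlier for earlier, later in zip(times, times[1:])]
    sent = sum(p["length"] for p in packets if p["src"] == host)
    received = sum(p["length"] for p in packets if p["src"] != host)
    return {
        "total_bytes": sent + received,
        "bytes_sent": sent,
        "bytes_received": received,
        "delta_max": max(deltas) if deltas else 0.0,
        "delta_min": min(deltas) if deltas else 0.0,
        "delta_median": median(deltas) if deltas else 0.0,
        "duration": times[-1] - times[0] if times else 0.0,
    }

host = "10.0.0.5"
server = "93.184.216.34"
stats = initial_statistics([
    {"src": host, "time": 0.0, "length": 100},
    {"src": server, "time": 0.5, "length": 400},
    {"src": host, "time": 2.0, "length": 60},
    {"src": server, "time": 2.5, "length": 40},
], host)
```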

Fig. 5. Overview of the PairFlow workflow.

A.3 Encapsulation

The encapsulation phase explicitly groups the packet behavior, FQDN and URL, HTTP(S), and initial statistical behavior that are implicit in the preprocessed flows, in order to make contextual information readily available (Fig. 5). The data types involved include lists of strings, lists of tuples, and Boolean and numeric fields, as shown in Table 5.

Packet Behavior. Packet Behavior encapsulates all packets according to their protocol type (TCP, UDP, and ICMP) in a list of tuples. The first element of each tuple is the packet index, which allows a given packet to be traced back to the original PCAP for further investigation.

The TCP plane involves the control and data sub-planes, as shown in Fig. 5. Each packet in the data sub-plane holds the protocol name, whether it is a request or a response and its type, the content type, the timestamp, and the packet length. For example, an HTTP request packet can be described as (460854, ‘HTTP’, ‘Request’, ‘GET’, ‘Empty Content’, 1066.51, 383) and its response as (460895, ‘HTTP’, ‘Response’, 200, ‘text/javascript’, 1066.86, 429). This helps the upper system work on time-series traffic and monitor anomalies for a given PairFlow. Further packet-level statistical analysis, such as counting GET/POST requests, HTTP response types, and content analysis, can be performed as described in Sect. 3.3.
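As an illustration of the packet-level analysis mentioned above, the following sketch counts request methods, response status codes, and content types from data sub-plane tuples. The two tuples are the examples from the text; the function name and the positional layout of the tuple fields are assumed from those examples.

```python
from collections import Counter

# Tuple layout assumed from the examples in the text:
# (index, protocol, direction, method or status, content type, time, length)
data_plane = [
    (460854, "HTTP", "Request", "GET", "Empty Content", 1066.51, 383),
    (460895, "HTTP", "Response", 200, "text/javascript", 1066.86, 429),
]

def summarize_data_plane(tuples):
    """Count request methods, response status codes, and content types."""
    methods = Counter(t[3] for t in tuples if t[2] == "Request")
    statuses = Counter(t[3] for t in tuples if t[2] == "Response")
    content_types = Counter(t[4] for t in tuples if t[2] == "Response")
    return methods, statuses, content_types

methods, statuses, content_types = summarize_data_plane(data_plane)
```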

The control sub-plane provides the behavior of the initial connections before the data exchange begins, the TCP continuation, or the termination of the TCP connection. For example, when TCP establishes a connection with a three-way handshake, the SYN, SYN-ACK, and ACK packets are summarized as follows: (72095, ‘0x02’, 215.73 s, 74 B), (72126, ‘0x12’, 215.78 s, 70 B), (72127, ‘0x10’, 215.78 s, 66 B). A stream of packets with TCP flag = 0x10 (ACK) then follows until the connection is terminated with the FIN flag. This is useful for analyzing time-series problems or monitoring discontinuities in such a PairFlow, as we see in Sect. 3.4.
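The hexadecimal flag values in the control sub-plane follow the standard TCP flag bits, so they can be decoded mechanically. The sketch below uses the three handshake tuples from the text; the helper names are hypothetical.

```python
# Standard TCP flag bits, lowest bit first.
TCP_FLAG_BITS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
                 (0x08, "PSH"), (0x10, "ACK"), (0x20, "URG")]

def decode_tcp_flags(hex_flags):
    """Turn a flag string such as '0x12' into a list of flag names."""
    value = int(hex_flags, 16)
    return [name for bit, name in TCP_FLAG_BITS if value & bit]

def starts_with_handshake(control_plane):
    """Check whether the first three control packets are SYN, SYN-ACK, ACK."""
    first = [decode_tcp_flags(t[1]) for t in control_plane[:3]]
    return first == [["SYN"], ["SYN", "ACK"], ["ACK"]]

# (index, flags, time in seconds, length in bytes), as in the text.
control_plane = [
    (72095, "0x02", 215.73, 74),
    (72126, "0x12", 215.78, 70),
    (72127, "0x10", 215.78, 66),
]
handshake = starts_with_handshake(control_plane)
```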

The UDP plane records all UDP-based packets with the protocol name, packet type, timestamp, and packet length. For example, a DNS request and response for a specific domain are summarized as follows: (21160, ‘DNS’, ‘DNS Request’, 141.44 s, 75 B), (21219, ‘DNS’, ‘DNS Response’, 141.54 s, 547 B). The ICMP plane is similar to the UDP plane but covers ICMP only, with the type and code fields reporting the ICMP settings of each packet. This plane can be helpful for any classifier detecting ICMP-based attacks.

FQDN and URL. As depicted in Fig. 5, the domain list encapsulates all FQDN-related information in a list of tuples. Each tuple holds an FQDN, its A and NS resource records, and the domain age extracted from the WHOIS file. This helps malicious domain detectors, which often rely on FQDN strings, the related DNS zone, and WHOIS files. URL encapsulates each relevant element of a URL seen during a connection in a tuple, which includes the FQDN, the web page filename, the number of parameters, values and fragments, and whether the URL contains encoded strings.
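A URL tuple of this kind can be derived with Python's standard `urllib.parse` module. This is a sketch under stated assumptions: the field order, the reading of "number of parameters, values and fragments" as three separate counts, and the use of percent-encoding as the test for encoded strings are interpretations, not details confirmed by the text.

```python
import posixpath
import re
from urllib.parse import parse_qsl, urlsplit

PERCENT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")

def url_tuple(url):
    """Summarize a URL as (FQDN, filename, #params, #values, #fragments, encoded)."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    return (
        parts.hostname,
        posixpath.basename(parts.path),        # web page filename
        len(params),                           # number of parameters
        sum(1 for _, value in params if value),  # parameters with a value
        int(bool(parts.fragment)),             # 1 if a fragment is present
        bool(PERCENT_ENCODED.search(url)),     # percent-encoded strings
    )

summary = url_tuple("http://update.example.com/news/index.php?id=42&q=a%20b#top")
plain = url_tuple("https://example.org/")
```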

HTTP(S). HTTP encapsulates HTTP-level information for a given connection, in particular the distinct HTTP server names, status codes, content types, and UAs. TLS Protocols summarizes the security settings between a client and a server. Cipher suites for both client and server are stored in a list. A cipher suite includes the key exchange/agreement (e.g., RSA, Elliptic-curve Diffie-Hellman (ECDH)), authentication (e.g., RSA, Elliptic Curve Digital Signature Algorithm (ECDSA)), block/stream ciphers (e.g., AES, RC4) with their block cipher mode (e.g., CBC), and message authentication (e.g., MD5, SHA-x). Extension types, which summarize cipher suite settings such as extended master secret, session tickets, and Elliptic Curve (EC) point formats, are also listed for each connection. Supported Groups, known as the EC setting (e.g., secp256r1, secp521r1), are also stored.
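The components of a cipher suite can be read off its standard name. The sketch below handles only the common TLS 1.2 naming pattern `TLS_<key exchange>_<authentication>_WITH_<cipher>_<hash>`; the function name and output keys are hypothetical, and TLS 1.3 suite names, which omit the key exchange, are out of scope.

```python
BLOCK_MODES = {"CBC", "GCM", "CCM"}

def parse_cipher_suite(name):
    """Split a TLS 1.2 style cipher suite name into its components."""
    left, right = name.split("_WITH_")
    kx_auth = left.split("_")[1:]            # drop the leading "TLS"
    key_exchange = kx_auth[0]
    # A single token (e.g. RSA) serves as both key exchange and authentication.
    authentication = kx_auth[1] if len(kx_auth) > 1 else kx_auth[0]
    *cipher_tokens, mac = right.split("_")
    mode = cipher_tokens.pop() if cipher_tokens[-1] in BLOCK_MODES else None
    return {
        "key_exchange": key_exchange,
        "authentication": authentication,
        "cipher": "_".join(cipher_tokens),
        "mode": mode,
        # For AEAD modes such as GCM the last token names the PRF hash,
        # not a separate message authentication code.
        "mac": mac,
    }

ecdhe = parse_cipher_suite("TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA")
rc4 = parse_cipher_suite("TLS_RSA_WITH_RC4_128_MD5")
```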

Initial Statistical Behavior. A few essential fields are summarized statistically. We calculate the max, min, and mean packet TTL, the delta packet interarrival time, and the duration for a given PairFlow. We also calculate the total (encrypted) bytes and the ratio of sent/received (encrypted) bytes. The max, min, and median of the cipher suite bytes, and of the server and client extension bytes, are also calculated. We also provide a statistical summary of the number of packets per protocol, such as raw TCP, raw UDP, ICMP, DNS, HTTP, TLS, and SSL. We summarize the statistical fields in Table 5.

Table 5. Summary of PairFlow data fields (B: Boolean, LS: List of Strings, LT: List of Tuples, N: Numerical).

A.4 Variants Extraction

PairFlow processing also exports four variant JSON files that can be used by any external classifier (Fig. 5). FQDN.json includes all domains, with their hostname lists, that have been accessed during a given PairFlow. It also includes resource records such as A and NS, and the domain age extracted from the WHOIS file, which appears to be useful for domain detection [3]. TCP-UDP-ICMP.json is intended for classifiers that use time series for detection [6, 30]. All three planes are presented here, in addition to related statistical fields such as packet TTL and delta packet interarrival time. HTTP.json is intended for those interested in detecting malicious HTTP connections [30, 38]. Other classifiers may use HTTPS.json to detect encrypted communications without deciphering the traffic [4]. A detailed study of the other variants is left for future work.
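The export step can be sketched as follows. The four file names come from the text; the PairFlow dictionary keys, the mapping of keys to files, and all sample values are hypothetical and serve only to make the example runnable.

```python
import json
import tempfile
from pathlib import Path

# File names from the text; the field selection per file is an assumption.
VARIANTS = {
    "FQDN.json": ("domains",),
    "TCP-UDP-ICMP.json": ("tcp", "udp", "icmp", "ttl", "interarrival"),
    "HTTP.json": ("http",),
    "HTTPS.json": ("tls",),
}

def export_variants(pairflow, out_dir):
    """Write one JSON file per variant and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, fields in VARIANTS.items():
        payload = {"flow_id": pairflow["flow_id"]}
        payload.update({field: pairflow.get(field) for field in fields})
        path = out_dir / filename
        path.write_text(json.dumps(payload, indent=2))
        written.append(path)
    return written

# Illustrative values only.
pairflow = {
    "flow_id": [0, 0],
    "domains": [{"fqdn": "example.com", "a": ["93.184.216.34"],
                 "ns": ["ns1.example.com"], "age_days": 9000}],
    "tcp": [[72095, "0x02", 215.73, 74]],
    "udp": [[21160, "DNS", "DNS Request", 141.44, 75]],
    "icmp": [],
    "ttl": {"max": 64, "min": 52},
    "interarrival": {"max": 1.5, "min": 0.5, "median": 0.5},
    "http": {"status_codes": [200], "content_types": ["text/javascript"]},
    "tls": {"supported_groups": ["secp256r1"]},
}
written = export_variants(pairflow, tempfile.mkdtemp())
```

An external classifier would then load only the file it needs, for example `json.loads(written[0].read_text())` for the domain data.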

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Alageel, A., Maffeis, S. (2022). EarlyCrow: Detecting APT Malware Command and Control over HTTP(S) Using Contextual Summaries. In: Susilo, W., Chen, X., Guo, F., Zhang, Y., Intan, R. (eds) Information Security. ISC 2022. Lecture Notes in Computer Science, vol 13640. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22389-1

  • Online ISBN: 978-3-031-22390-7

  • eBook Packages: Computer Science, Computer Science (R0)