Diagnosing bot infections using Bayesian inference

Ashfaq, Ayesha Binte; Abaid, Zainab; Ismail, Maliha; Aslam, Muhammad Umar; Syed, Affan A.; Khayam, Syed Ali

doi:10.1007/s11416-016-0286-y

Original Paper
Published: 30 September 2016

Volume 14, pages 21–38, (2018)

Journal of Computer Virology and Hacking Techniques

Ayesha Binte Ashfaq¹, Zainab Abaid², Maliha Ismail³, Muhammad Umar Aslam³, Affan A. Syed⁴ & Syed Ali Khayam⁴

Abstract

Prior research in botnet detection has used the bot lifecycle to build detection systems. These systems, however, use rule-based decision engines which lack automated adaptability and learning, accuracy tunability, the ability to cope with gaps in training data, and the ability to incorporate local security policies. To counter these limitations, we propose to replace the rigid decision engines in contemporary bot detectors with a more formal Bayesian inference engine. Bottleneck, our prototype implementation, builds confidence in bot infections based on the causal bot lifecycle encoded in a Bayesian network. We evaluate Bottleneck by applying it as a post-processing decision engine on lifecycle events generated by two existing bot detectors (BotHunter and BotFlex) on two independently-collected datasets. Our experimental results show that Bottleneck consistently achieves comparable or better accuracy than the existing rule-based detectors when the test data is similar to the training data. For differing training and test data, Bottleneck, due to its automated learning and inference models, easily surpasses the accuracies of rule-based systems. Moreover, Bottleneck’s stochastic nature allows its accuracy to be tuned with respect to organizational needs. Extending Bottleneck’s Bayesian network into an influence diagram allows for local security policies to be defined within our framework. Lastly, we show that Bottleneck can also be extended to incorporate evidence trustscore for false alarm reduction.
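The abstract describes Bottleneck as a Bayesian inference engine that builds confidence in an infection from lifecycle evidence, with tunable accuracy and, through an influence diagram, room for local security policy. The Python sketch below illustrates those ideas under strong simplifying assumptions: it uses a naive-Bayes structure (a single infection node as the parent of every event node) instead of the causal lifecycle network the paper describes, and every event name, probability, and utility in it is hypothetical, not taken from the paper or from Netica.

```python
# Minimal sketch: posterior infection belief from observed lifecycle events,
# followed by a policy-driven decision. All names and numbers are hypothetical.

PRIOR_INFECTED = 0.01  # hypothetical P(infected) for an arbitrary host

# Hypothetical (P(event | infected), P(event | clean)) per lifecycle stage.
LIKELIHOODS = {
    "inbound_exploit": (0.70, 0.05),
    "egg_download": (0.60, 0.02),
    "cc_communication": (0.80, 0.03),
    "outbound_scan": (0.50, 0.01),
}


def posterior_infected(observed: set[str]) -> float:
    """Return P(infected | evidence) for a naive-Bayes-structured network in
    which the infection node is the only parent of every event node."""
    like_inf = 1.0
    like_clean = 1.0
    for event, (p_inf, p_clean) in LIKELIHOODS.items():
        if event in observed:
            like_inf *= p_inf
            like_clean *= p_clean
        else:
            # An unobserved stage is evidence too: multiply by P(not observed).
            like_inf *= 1.0 - p_inf
            like_clean *= 1.0 - p_clean
    joint_inf = PRIOR_INFECTED * like_inf
    joint_clean = (1.0 - PRIOR_INFECTED) * like_clean
    return joint_inf / (joint_inf + joint_clean)


def is_infected(observed: set[str], threshold: float = 0.5) -> bool:
    """Threshold the posterior; lowering `threshold` trades false alarms for detections."""
    return posterior_infected(observed) >= threshold


# Hypothetical utilities U[action][state] standing in for a local security policy.
UTILITIES = {
    "quarantine": {"infected": 10.0, "clean": -5.0},
    "allow": {"infected": -100.0, "clean": 0.0},
}


def best_action(p_infected: float, utilities=UTILITIES) -> str:
    """Pick the action with maximum expected utility, as one decision node would."""
    def expected_utility(action: str) -> float:
        u = utilities[action]
        return p_infected * u["infected"] + (1.0 - p_infected) * u["clean"]
    return max(utilities, key=expected_utility)


if __name__ == "__main__":
    evidence = {"cc_communication"}
    p = posterior_infected(evidence)
    print(round(p, 4), is_infected(evidence), is_infected(evidence, threshold=0.01), best_action(p))
```

The threshold in `is_infected` is the knob that makes accuracy tunable, and the utility table is where a site-specific policy would live; changing either alters decisions without retraining the probabilities.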

Notes

Reference is not included due to double blind review.
While some rule-based engines soften the impact of these fundamental problems using regression-based weight assignment and soft timers [3], these schemes lack formal rigor and remain susceptible to overfitting.
A few variants of the Conficker botnet copy these DLL-form bot binaries to removable media, but because these events cannot be extracted from the network trace, we do not include them in our lifecycle. Similarly, the bot performs password cracking to copy itself to the ADMIN$ share, updates registry values, and makes other configuration changes; we also exclude these from our lifecycle events because they cannot be extracted from the network trace.
Implies a network of clean hosts. In the SysNet Lab dataset, the benign traffic was collected from hosts in the lab, so it had to be verified as clean and free of any bot traffic to ensure a fair evaluation.
BotHunter’s three conditions are (1) evidence of (local host infection AND outward bot coordination or attack); (2) at least two distinct signs of outward bot coordination or attack; and (3) evidence that a host attempts communication with a confirmed malware site (E8[rb]).
E8[rb] is not a bot lifecycle event.
The steepness of the curve is due to the small size of the SysNet data trace, which includes ten bot infections and benign data from 22 hosts. The trace therefore contains few instances, and they are not uniformly distributed with respect to the threshold, which makes the ROC curve steep.
BotHunter’s three conditions are (1) evidence of (local host infection AND outward bot coordination or attack); (2) at least two distinct signs of outward bot coordination or attack; and (3) evidence that a host attempts communication with a confirmed malware site.
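For contrast with the Bayesian engine, the BotHunter conditions listed in these notes can be sketched as a rule-based check. This minimal Python sketch assumes that a host is flagged when any one condition holds and that "distinct signs" means distinct event types; the event category names are hypothetical rather than BotHunter's own identifiers, and `malware_site_contact` merely stands in for E8[rb].

```python
# Minimal sketch of a BotHunter-style rule; category names are hypothetical.
LOCAL_INFECTION = {"inbound_exploit", "egg_download"}
OUTWARD_COORDINATION = {"cc_communication", "p2p_coordination"}
OUTWARD_ATTACK = {"outbound_scan", "outbound_attack"}
MALWARE_SITE = "malware_site_contact"  # stand-in for E8[rb]


def bothunter_style_alert(events: set[str]) -> bool:
    """Return True if any of the three conditions holds for one host's events."""
    outward = events & (OUTWARD_COORDINATION | OUTWARD_ATTACK)
    # (1) local infection evidence AND outward coordination or attack
    cond1 = bool(events & LOCAL_INFECTION) and bool(outward)
    # (2) at least two distinct signs of outward coordination or attack
    cond2 = len(outward) >= 2
    # (3) an attempt to contact a confirmed malware site
    cond3 = MALWARE_SITE in events
    return cond1 or cond2 or cond3


if __name__ == "__main__":
    print(bothunter_style_alert({"inbound_exploit", "cc_communication"}))
    print(bothunter_style_alert({"cc_communication"}))
```

Each condition is a hard Boolean, so the outcome cannot be tuned or learned from data; this is the rigidity the abstract attributes to rule-based decision engines.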

References

Bencsáth, B., Pék, G., Buttyán, L., Félegyházi, M.: Duqu: analysis, detection, and lessons learned. In: 2012 ACM European Workshop on System Security (EuroSec), vol. 2012 (2012)
Falliere, N., Murchu, L.O., Chien, E.: W32.Stuxnet dossier. In: White Paper, Symantec Corp., Security Response, 2011, online 5 June (2013)
Gu, G., Porras, P., Yegneswaran, V., Fong, M., Lee, W.: BotHunter: detecting malware infection through IDS-driven dialog correlation. In: Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, SS’07. USENIX Association, Berkeley, pp. 12:1–12:16 (2007). http://dl.acm.org/citation.cfm?id=1362903.1362915
Khattak, S., Ahmed, Z., Syed, A.A., Khayam, S.A.: Poster: BotFlex: a community-driven tool for botnet detection, online 17 May (2013)
Jensen, F.V., Nielsen, T.D.: Bayesian Networks and Decision Graphs. Springer, New York (2007)
Sommer, R., Paxson, V.: Outside the closed world: on using machine learning for network intrusion detection, In: 2010 IEEE Symposium on Security and Privacy (SP), pp. 305–316. doi:10.1109/SP.2010.25
Ramay, N.R., Khattak, S., Syed, A.A., Khayam, S.A.: Poster: Bottleneck: a generalized, flexible, and extensible framework for botnet defense, online 13 April (2013)
Cooper, G., Herskovits, E.: A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 9, 309–347 (1992). doi:10.1007/BF00994110
Cheng, J., Greiner, R.: Learning Bayesian belief network classifiers: algorithm and systems. In: Stroulia, E., Matwin, S. (eds.) Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol. 2056, pp. 141–151. Springer, Berlin, Heidelberg (2001)
Netica programmer’s library reference manual. http://www.norsys.com/netica-j/docs/NeticaJ_Man.pdf, online 20 May (2013)
Spiegelhalter, D.J., Dawid, A.P., Lauritzen, S.L., Cowell, R.G.: Bayesian analysis in expert systems. Stat. Sci. 219–247 (1993)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, New York (1988)
Conficker: https://mil.fireeye.com/edp.php?sname=Bot.Conficker, online June (2013)
Inside the storm: https://www.blackhat.com/presentations/bh-usa-08/Stewart/BH_US_08_Stewart_Protocols_of_the_Storm.pdf
Roesch, M., et al.: Snort: lightweight intrusion detection for networks. In: Proceedings of the 13th USENIX Conference on System Administration, Seattle, Washington, pp. 229–238 (1999)
Bro: http://www.bro.org/, online 10 April (2013)
Kruegel, C., Mutz, D., Robertson, W., Valeur, F.: Bayesian event classification for intrusion detection. In: 19th Annual Proceedings Computer Security Applications Conference, pp. 14–23. IEEE, New York (2003)
CERT Polska: http://www.cert.pl/PDF/Report_Virut_EN.pdf, online 25 February (2013)
Nayatel: http://www.nayatel.pk/index.php, online April (2013)
Team cymru: https://www.team-cymru.org/, online April (2013)
The ICSI networking and security group. http://www.icir.org/, online 13 May (2013)
Bottleneck: http://sysnet.org.pk/w/Code_and_Tools#Bottleneck, online 15 June (2013)
Stewart, J.: Inside the storm: protocols and encryption of the storm botnet. In: Black Hat Technical Security Conference, New York (2008)
Emerging threats malware rulesets: http://www.emergingthreats.net, online 5 August (2013)
Poole, D., Mackworth, A.: Artificial intelligence: foundations of computational agents, online November (2013)
Netica-j reference manual—Norsys Software Corp. http://www.norsys.com/downloads/NeticaJ_Man_418.pdf, online 16 September (2013)
Costa, E., Lorena, A., Carvalho, A., Freitas, A.: A review of performance evaluation measures for hierarchical classifiers. In: Evaluation Methods for Machine Learning II: Papers from the AAAI-2007 Workshop, pp. 1–6 (2007)
Lozano, J.A., Santafé, G., Inza, I.: Classifier performance evaluation and comparison. In: International Conference on Machine Learning and Applications (ICMLA 2010). http://www.icmla-conference.org/icmla10/CFP_Tutorial_files/jose.pdf
Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. http://www.amazon.de/Data-Mining-Concepts-Techniques-Management/dp/0123814790 (2012)
Invernizzi, L., Miskovic, S., Torres, R., Saha, S., Lee, S., Mellia, M., Kruegel, C., Vigna, G.: Nazca: detecting malware distribution in large-scale networks. In: Proceedings of the Network and Distributed System Security Symposium (NDSS) (2014)
Kapravelos, A., Shoshitaishvili, Y., Cova, M., Kruegel, C., Vigna, G.: Revolver: an automated approach to the detection of evasive web-based malware. In: USENIX Security, Citeseer, pp. 637–652 (2013)
Chinchani, R., Van Den Berg, E.: A fast static analysis approach to detect exploit code inside network flows. In: Recent Advances in Intrusion Detection. Springer, New York, pp. 284–308 (2006)
Baldoni, R., Di Luna, G.A., Querzoni, L.: Collaborative detection of coordinated port scans. In: Distributed Computing and Networking. Springer, New York, pp. 102–117 (2013)
Muelder, C., Ma, K.-L., Bartoletti, T.: Interactive visualization for network and port scan detection. In: Recent Advances in Intrusion Detection. Springer, New York, pp. 265–283 (2006)
Zargar, S.T., Joshi, J., Tipper, D.: A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks. IEEE Commun. Surv. Tutor. 15(4), 2046–2069 (2013), online 28 May (2013)
Feinstein, L., Schnackenberg, D., Balupari, R., Kindred, D.: Statistical approaches to DDoS attack detection and response. In: Proceedings of the DARPA Information Survivability Conference and Exposition, vol. 1. IEEE, New York, pp. 303–314 (2003)
Zhao, Y., Xie, Y., Yu, F., Ke, Q., Yu, Y., Chen, Y., Gillum, E.: BotGraph: large scale spamming botnet detection. In: NSDI, vol. 9, pp. 321–334 (2009)
Nelms, T., Perdisci, R., Ahamad, M.: ExecScent: mining for new C&C domains in live networks with adaptive control protocol templates. In: USENIX Security, pp. 589–604 (2013)
Perdisci, R., Ariu, D., Giacinto, G.: Scalable fine-grained behavioral clustering of HTTP-based malware. Comput. Netw. 57(2), 487–500 (2013)
Goebel, J., Holz, T.: Rishi: identify bot contaminated hosts by IRC nickname evaluation. In: Proceedings of the First Conference on First Workshop on Hot Topics in Understanding Botnets, Cambridge, p. 8 (2007)
Saad, S., Traore, I., Ghorbani, A., Sayed, B., Zhao, D., Lu, W., Felix, J., Hakimian, P.: Detecting P2P botnets through network behavior analysis and machine learning. In: 2011 Ninth Annual International Conference on Privacy, Security and Trust (PST), pp. 174–180. IEEE, New York (2011)
Hsu, C.-H., Huang, C.-Y., Chen, K.-T.: Fast-flux bot detection in real time. In: Recent Advances in Intrusion Detection, pp. 464–483. Springer, New York (2010)
Antonakakis, M., Perdisci, R., Nadji, Y., Vasiloglou, N., Abu-Nimeh, S., Lee, W., Dagon, D.: From throw-away traffic to bots: detecting the rise of DGA-based malware. In: Proceedings of the 21st USENIX Security Symposium (2012)
Khattak, S., Ramay, N., Khan, K., Syed, A., Khayam, S.: A taxonomy of botnet behavior, detection and defense. IEEE Commun. Surv. Tutor., online June (2014)
Rajab, M.A., Zarfoss, J., Monrose, F., Terzis, A.: A multifaceted approach to understanding the botnet phenomenon. In: Proceedings of the 2006 ACM SIGCOMM Internet Measurement Conference (IMC), vol. 2006 (2006)
Gu, G., Perdisci, R., Zhang, J., Lee, W., et al.: BotMiner: clustering analysis of network traffic for protocol- and structure-independent botnet detection. In: USENIX Security Symposium, pp. 139–154 (2008)
Silva, S.S., Silva, R.M., Pinto, R.C., Salles, R.M.: Botnets: a survey. Comput. Netw. 57(2), 378–403 (2013)
Hachem, N., Ben Mustapha, Y., Granadillo, G.G., Debar, H.: Botnets: lifecycle and taxonomy, In: 2011 Conference on Network and Information Systems Security (SAR-SSI), pp. 1–8. IEEE, New York (2011)
Lu, C., Brooks, R.: Botnet traffic detection using hidden Markov models. In: Proceedings of the Seventh Annual Workshop on Cyber Security and Information Intelligence Research, p. 31. ACM, New York (2011)
Kidmose, E.: Botnet detection using hidden Markov models. Master’s thesis, Aalborg University (2014)

Author information

Authors and Affiliations

National University of Sciences and Technology, Islamabad, Pakistan
Ayesha Binte Ashfaq
Department of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Zainab Abaid
SysNet, National University of Computer and Emerging Sciences, Islamabad, Pakistan
Maliha Ismail & Muhammad Umar Aslam
PLUMgrid Inc., Sunnyvale, CA, USA
Affan A. Syed & Syed Ali Khayam

Corresponding author

Correspondence to Ayesha Binte Ashfaq.

Electronic supplementary material

Supplementary material 1 (pdf 159 KB)

About this article

Cite this article

Ashfaq, A.B., Abaid, Z., Ismail, M. et al. Diagnosing bot infections using Bayesian inference. J Comput Virol Hack Tech 14, 21–38 (2018). https://doi.org/10.1007/s11416-016-0286-y

Received: 21 December 2015
Accepted: 31 August 2016
Published: 30 September 2016
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11416-016-0286-y
