Scalable Framework for Cyber Threat Situational Awareness Based on Domain Name Systems Data Analysis

  • R. Vinayakumar
  • Prabaharan Poornachandran
  • K. P. Soman
Chapter
Part of the Studies in Big Data book series (SBD, volume 44)

Abstract

A myriad of security solutions have been developed to tackle cyber security attacks and malicious activities in the digital world, including firewalls, intrusion detection and prevention systems, anti-virus systems, and honeypots. Despite these detection measures and protection mechanisms, the number of successful attacks and their level of sophistication keep increasing day by day. Also, with the advent of the Internet of Things, the number of devices connected to the Internet has risen dramatically. The inability to detect attacks on these devices is due to (1) the lack of computational power for detecting attacks, (2) the lack of interfaces that could indicate a compromise on these devices, and (3) the lack of the ability to interact with the system to execute diagnostic tools. This warrants newer approaches, such as a Tier-1 Internet Service Provider level view of attack patterns, to provide situational awareness of cyber security threats. We investigate and explore the event data generated by the Internet protocol Domain Name Systems (DNS) for the purpose of cyber threat situational awareness. Traditional methods such as static and binary analysis of malware are sometimes inadequate to address the proliferation of malware because of the time taken to obtain and process individual binaries in order to generate signatures. By the time an anti-malware signature is available, a significant amount of damage may already have been done. Traditional anti-malware systems may not identify such malicious activities, whereas they may be detected faster through the DNS protocol by analyzing the generated event data in a timely manner. As DNS was not designed with security in mind (or suffers from vulnerabilities), we explore how the vast amount of event data generated by these systems can be leveraged to create cyber threat situational awareness. The main contributions of the book chapter are two-fold: (1) a scalable framework that can perform web-scale analysis in near real-time to provide situational awareness, and (2) detection of early warning signals before large-scale attacks or malware propagation occur. We employ a deep learning approach to classify and correlate malicious events that are perceived from the protocol usage. To our knowledge, this is the first time that a framework able to analyze and correlate DNS usage information at continent scale or multiple Tier-1 Internet Service Provider scale has been studied and analyzed in real-time to provide situational awareness. Using merely a commodity hardware server, the developed framework is capable of analyzing more than 2 million events per second and can detect the malicious activities within them in near real-time. The developed framework can be scaled out to analyze even larger volumes of network event data by adding computing resources. The scalability and real-time detection of malicious activities from early warning signals make the developed framework stand out from any system of a similar kind.

Keywords

DNS log analysis; Big data analytics; Machine learning; Deep learning

Notes

Acknowledgements

This research was supported in part by Paramount Computer Systems and the Ministry of Electronics and Information Technology (MeitY), Government of India. We are also grateful to NVIDIA India for the GPU hardware support provided through a research grant, and to the Computational Engineering and Networking (CEN) department for encouraging the research.

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • R. Vinayakumar (1)
  • Prabaharan Poornachandran (2)
  • K. P. Soman (1)
  1. Amrita School of Engineering, Coimbatore, Centre for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore, India
  2. Amrita School of Engineering, Centre for Cyber Security Systems and Networks, Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore, India
