Big Data Analytics for Intrusion Detection System: Statistical Decision-Making Using Finite Dirichlet Mixture Models

  • Nour Moustafa
  • Gideon Creech
  • Jill Slay
Part of the Data Analytics book series (DAANA)


An intrusion detection system has become a vital mechanism for detecting a wide variety of malicious activities in the cyber domain. However, such systems still face an important limitation when detecting zero-day attacks: relatively high false alarm rates. It is therefore necessary to stop treating the monitoring and analysis of network data in isolation and instead to optimise their integration with decision-making methods for identifying anomalous events. This chapter presents a scalable framework for building an effective and lightweight anomaly detection system. The framework comprises three modules: capturing and logging, pre-processing, and a new statistical decision engine called the Dirichlet mixture model based anomaly detection technique. The first module sniffs and collects network data, while the second analyses and filters these data to improve the performance of the decision engine. Finally, the decision engine is built on the Dirichlet mixture model, with lower and upper interquartile range bounds as its decision rule. The performance of the framework is evaluated on two well-known datasets, NSL-KDD and UNSW-NB15. The empirical results show that statistical analysis of the network data helps in choosing the model that best fits the data. Additionally, the Dirichlet mixture model based anomaly detection technique achieves a higher detection rate and a lower false alarm rate than three other compelling techniques. Those techniques are built on correlation and distance measures that cannot detect modern attacks which mimic normal activities, whereas the proposed technique combines the Dirichlet mixture model with precise interquartile range boundaries to find small differences between legitimate and attack vectors, identifying such attacks efficiently.
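The abstract describes the decision engine only at a high level. As an illustration, the following sketch fits a finite Dirichlet mixture model to normal records mapped onto the probability simplex, then derives lower and upper fences from the interquartile range of the normal records' log-likelihoods; a record outside either fence is flagged as an attack. This is not the authors' implementation, and several pieces are assumptions: the min-max plus row-normalisation pre-processing (a Dirichlet density is only defined on the simplex), the weighted method-of-moments M-step (a cheap stand-in for the exact maximum-likelihood update, which has no closed form), the farthest-point initialisation, scoring by mixture log-likelihood, and Tukey's conventional multiplier k = 1.5. All helper names (`fit_minmax`, `to_simplex`, `fit_dmm`, `score`, `iqr_bounds`, `is_attack`) are hypothetical, and only the Python standard library is used.

```python
import math
import random
import statistics

# Illustrative sketch only: the pre-processing, M-step approximation, initialisation,
# scoring rule and k = 1.5 are assumptions, not the chapter's exact method.


def fit_minmax(rows):
    """Per-feature minimum and maximum of the (normal) training records."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def to_simplex(rows, lo, hi, eps=1e-3):
    """Min-max scale each feature (clipped to [0, 1]), add eps, then divide each row by
    its sum, so every record becomes a strictly positive vector summing to 1."""
    out = []
    for r in rows:
        s = [min(max((v - a) / (b - a), 0.0), 1.0) if b > a else 0.0
             for v, a, b in zip(r, lo, hi)]
        s = [v + eps for v in s]
        total = sum(s)
        out.append([v / total for v in s])
    return out


def dirichlet_logpdf(x, alpha):
    """log Dir(x | alpha) for a point x in the open simplex."""
    return (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x)))


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def moment_alpha(X, w):
    """Weighted method-of-moments Dirichlet estimate, using E[x_k] = alpha_k / alpha_0
    and Var[x_k] = m_k (1 - m_k) / (alpha_0 + 1), with alpha_0 averaged over k."""
    W = max(sum(w), 1e-12)
    d = len(X[0])
    mean = [sum(wi * x[k] for wi, x in zip(w, X)) / W for k in range(d)]
    var = [sum(wi * (x[k] - mean[k]) ** 2 for wi, x in zip(w, X)) / W for k in range(d)]
    a0 = statistics.fmean([mean[k] * (1 - mean[k]) / max(var[k], 1e-12) - 1 for k in range(d)])
    a0 = max(a0, 1e-2)
    return [max(m * a0, 1e-3) for m in mean]


def _sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fit_dmm(X, n_components=2, n_iter=50, seed=0):
    """EM-style fit of a finite Dirichlet mixture; returns (weights, alphas)."""
    rng = random.Random(seed)
    # Farthest-point initialisation, then hard nearest-centre responsibilities.
    centers = [X[rng.randrange(len(X))]]
    while len(centers) < n_components:
        centers.append(max(X, key=lambda x: min(_sqdist(x, c) for c in centers)))
    resp = []
    for x in X:
        j = min(range(n_components), key=lambda c: _sqdist(x, centers[c]))
        resp.append([1.0 if c == j else 0.0 for c in range(n_components)])
    for _ in range(n_iter):
        # M-step: mixing weights and per-component Dirichlet parameters.
        weights = [sum(r[j] for r in resp) / len(X) for j in range(n_components)]
        alphas = [moment_alpha(X, [r[j] for r in resp]) for j in range(n_components)]
        # E-step: posterior responsibilities, normalised in log space.
        resp = []
        for x in X:
            logp = [math.log(max(weights[j], 1e-300)) + dirichlet_logpdf(x, alphas[j])
                    for j in range(n_components)]
            lse = logsumexp(logp)
            resp.append([math.exp(v - lse) for v in logp])
    return weights, alphas


def score(x, weights, alphas):
    """Mixture log-likelihood log sum_j pi_j Dir(x | alpha_j)."""
    return logsumexp([math.log(max(w, 1e-300)) + dirichlet_logpdf(x, a)
                      for w, a in zip(weights, alphas)])


def iqr_bounds(scores, k=1.5):
    """Fences Q1 - k*IQR and Q3 + k*IQR over the scores of normal training records."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_attack(x, weights, alphas, lower, upper):
    """Flag a record whose score falls outside the [lower, upper] fences."""
    s = score(x, weights, alphas)
    return s < lower or s > upper
```

In use, the scaler, mixture and fences are all fitted on normal traffic only, and new records are passed through `to_simplex` with the same `lo`/`hi` before calling `is_attack`. Because no attack labels are needed for training, the sketch stays in the anomaly-detection setting; raising `k` widens the fences, which lowers the false alarm rate at the cost of detection rate.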


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. The Australian Centre for Cyber Security, University of New South Wales Canberra, Canberra, Australia
