Big Data Analytics for Intrusion Detection System: Statistical Decision-Making Using Finite Dirichlet Mixture Models

Moustafa, Nour; Creech, Gideon; Slay, Jill

doi:10.1007/978-3-319-59439-2_5

Nour Moustafa⁶,
Gideon Creech⁶ &
Jill Slay⁶

Part of the book series: Data Analytics ((DAANA))

3083 Accesses
81 Citations

Abstract

An intrusion detection system has become a vital mechanism to detect a wide variety of malicious activities in the cyber domain. However, this system still faces an important limitation when it comes to detecting zero-day attacks, concerning the reduction of relatively high false alarm rates. It is thus necessary to no longer consider the tasks of monitoring and analysing network data in isolation, but instead optimise their integration with decision-making methods for identifying anomalous events. This chapter presents a scalable framework for building an effective and lightweight anomaly detection system. This framework includes three modules of capturing and logging, pre-processing and a new statistical decision engine, called the Dirichlet mixture model based anomaly detection technique. The first module sniffs and collects network data while the second module analyses and filters these data to improve the performance of the decision engine. Finally, the decision engine is designed based on the Dirichlet mixture model with a lower-upper interquartile range as decision engine. The performance of this framework is evaluated on two well-known datasets, the NSL-KDD and UNSW-NB15. The empirical results showed that the statistical analysis of network data helps in choosing the best model which correctly fits the network data. Additionally, the Dirichlet mixture model based anomaly detection technique provides a higher detection rate and lower false alarm rate than other three compelling techniques. These techniques were built based on correlation and distance measures that cannot detect modern attacks which mimic normal activities, whereas the proposed technique was established using the Dirichlet mixture model and precise boundaries of interquartile range for finding small differences between legitimate and attack vectors, efficiently identifying these attacks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 109.00; Price excludes VAT (USA)

Softcover Book: USD 139.99; Price excludes VAT (USA)

Hardcover Book: USD 139.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
The NSLKDD dataset, https://web.archive.org/web/20150205070216/http://nsl.cs.unb.ca/NSL-KDD/, November 2016.
2.
The UNSW-NB15 dataset, https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-NB15-Datasets/, November 2016.
3.
The IXIA Tools, https://www.ixiacom.com/products/perfectstorm, November 2016.
4.
Pcap refers to packet capture, which contains an Application Programming Interface (API) for saving network data. The UNIX operating systems execute the pcap format using the libpcap library while the Windows operating systems utilise a port of libpcap, called WinPcap.
5.
The MySQL Cluster CGE technology, https://www.mysql.com/products/cluster/, November 2016.
6.
The Hadoop technologies, http://hadoop.apache.org/, November 2016.

References

Aburomman, A.A., Reaz, M.B.I.: A novel svm-knn-pso ensemble method for intrusion detection system. Applied Soft Computing 38, 360–372 (2016)
Article Google Scholar
Ahmed, M., Mahmood, A.N., Hu, J.: A survey of network anomaly detection techniques. Journal of Network and Computer Applications 60, 19–31 (2016)
Article Google Scholar
Alqahtani, S.M., Al Balushi, M., John, R.: An intelligent intrusion detection system for cloud computing (sidscc). In: Computational Science and Computational Intelligence (CSCI), 2014 International Conference on, vol. 2, pp. 135–141. IEEE (2014)
Google Scholar
Ambusaidi, M., He, X., Nanda, P., Tan, Z.: Building an intrusion detection system using a filter-based feature selection algorithm (2016)
MATH Google Scholar
traffic analysis, N.: Network traffic analysis (November 2016). URL https://www.ipswitch.com/solutions/network-traffic-analysis
Berthier, R., Sanders, W.H., Khurana, H.: Intrusion detection for advanced metering infrastructures: Requirements and architectural directions. In: Smart Grid Communications (SmartGridComm), 2010 First IEEE International Conference on, pp. 350–355. IEEE (2010)
Google Scholar
Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K.: Network anomaly detection: methods, systems and tools. IEEE Communications Surveys & Tutorials 16(1), 303–336 (2014)
Article Google Scholar
Bouguila, N., Ziou, D., Vaillancourt, J.: Unsupervised learning of a finite mixture model based on the dirichlet distribution and its application. IEEE Transactions on Image Processing 13(11), 1533–1543 (2004)
Article Google Scholar
Boutemedjet, S., Bouguila, N., Ziou, D.: A hybrid feature extraction selection approach for high-dimensional non-gaussian data clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(8), 1429–1443 (2009)
Article Google Scholar
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: A survey. ACM computing surveys (CSUR) 41(3), 15 (2009)
Google Scholar
Corona, I., Giacinto, G., Roli, F.: Adversarial attacks against intrusion detection systems: Taxonomy, solutions and open issues. Information Sciences 239, 201–225 (2013)
Article Google Scholar
Ding, Q., Kolaczyk, E.D.: A compressed pca subspace method for anomaly detection in high-dimensional data. IEEE Transactions on Information Theory 59(11), 7419–7433 (2013)
Article Google Scholar
Dua, S., Du, X.: Data mining and machine learning in cybersecurity. CRC press (2016)
Google Scholar
Dubey, S., Dubey, J.: Kbb: A hybrid method for intrusion detection. In: Computer, Communication and Control (IC4), 2015 International Conference on, pp. 1–6. IEEE (2015)
Google Scholar
Escobar, M.D., West, M.: Bayesian density estimation and inference using mixtures. Journal of the american statistical association 90(430), 577–588 (1995)
Article MathSciNet MATH Google Scholar
Fahad, A., Tari, Z., Almalawi, A., Goscinski, A., Khalil, I., Mahmood, A.: Ppfscada: Privacy preserving framework for scada data publishing. Future Generation Computer Systems 37, 496–511 (2014)
Article Google Scholar
Fan, W., Bouguila, N., Ziou, D.: Unsupervised anomaly intrusion detection via localized bayesian feature selection. In: 2011 IEEE 11th International Conference on Data Mining, pp. 1032–1037. IEEE (2011)
Google Scholar
Fan, W., Bouguila, N., Ziou, D.: Variational learning for finite dirichlet mixture models and applications. IEEE transactions on neural networks and learning systems 23(5), 762–774 (2012)
Article Google Scholar
Ghasemi, A., Zahediasl, S., et al.: Normality tests for statistical analysis: a guide for non-statisticians. International journal of endocrinology and metabolism 10(2), 486–489 (2012)
Article Google Scholar
Giannetsos, T., Dimitriou, T.: Spy-sense: spyware tool for executing stealthy exploits against sensor networks. In: Proceedings of the 2nd ACM workshop on Hot topics on wireless network security and privacy, pp. 7–12. ACM (2013)
Google Scholar
Greggio, N.: Learning anomalies in idss by means of multivariate finite mixture models. In: Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on, pp. 251–258. IEEE (2013)
Google Scholar
Harrou, F., Kadri, F., Chaabane, S., Tahon, C., Sun, Y.: Improved principal component analysis for anomaly detection: Application to an emergency department. Computers & Industrial Engineering 88, 63–77 (2015)
Article Google Scholar
Horng, S.J., Su, M.Y., Chen, Y.H., Kao, T.W., Chen, R.J., Lai, J.L., Perkasa, C.D.: A novel intrusion detection system based on hierarchical clustering and support vector machines. Expert systems with Applications 38(1), 306–313 (2011)
Article Google Scholar
Hung, S.S., Liu, D.S.M.: A user-oriented ontology-based approach for network intrusion detection. Computer Standards & Interfaces 30(1), 78–88 (2008)
Article Google Scholar
Jadhav, A., Jadhav, A., Jadhav, P., Kulkarni, P.: A novel approach for the design of network intrusion detection system (nids). In: Sensor Network Security Technology and Privacy Communication System (SNS & PCS), 2013 International Conference on, pp. 22–27. IEEE (2013)
Google Scholar
Lee, Y.J., Yeh, Y.R., Wang, Y.C.F.: Anomaly detection via online oversampling principal component analysis. IEEE Transactions on Knowledge and Data Engineering 25(7), 1460–1470 (2013)
Article Google Scholar
Li, W., Mahadevan, V., Vasconcelos, N.: Anomaly detection and localization in crowded scenes. IEEE transactions on pattern analysis and machine intelligence 36(1), 18–32 (2014)
Article Google Scholar
Milenkoski, A., Vieira, M., Kounev, S., Avritzer, A., Payne, B.D.: Evaluating computer intrusion detection systems: A survey of common practices. ACM Computing Surveys (CSUR) 48(1), 12 (2015)
Google Scholar
Minka, T.: Estimating a dirichlet distribution (2000)
Google Scholar
Modi, C., Patel, D., Borisaniya, B., Patel, H., Patel, A., Rajarajan, M.: A survey of intrusion detection techniques in cloud. Journal of Network and Computer Applications 36(1), 42–57 (2013)
Article Google Scholar
Moustafa, N., Slay, J.: A hybrid feature selection for network intrusion detection systems: Central points. In: the Proceedings of the 16th Australian Information Warfare Conference, Edith Cowan University, Joondalup Campus, Perth, Western Australia, pp. 5–13. Security Research Institute, Edith Cowan University (2015)
Google Scholar
Moustafa, N., Slay, J.: The significant features of the unsw-nb15 and the kdd99 data sets for network intrusion detection systems. In: Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), 2015 4th International Workshop on, pp. 25–31. IEEE (2015)
Google Scholar
Moustafa, N., Slay, J.: Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In: Military Communications and Information Systems Conference (MilCIS), 2015, pp. 1–6. IEEE (2015)
Google Scholar
Moustafa, N., Slay, J.: The evaluation of network anomaly detection systems: Statistical analysis of the unsw-nb15 data set and the comparison with the kdd99 data set. Information Security Journal: A Global Perspective (2016)
Google Scholar
Nadiammai, G., Hemalatha, M.: An evaluation of clustering technique over intrusion detection system. In: Proceedings of the International Conference on Advances in Computing, Communications and Informatics, pp. 1054–1060. ACM (2012)
Google Scholar
Naldurg, P., Sen, K., Thati, P.: A temporal logic based framework for intrusion detection. In: International Conference on Formal Techniques for Networked and Distributed Systems, pp. 359–376. Springer (2004)
Google Scholar
Perdisci, R., Gu, G., Lee, W.: Using an ensemble of one-class svm classifiers to harden payload-based anomaly detection systems. In: Sixth International Conference on Data Mining (ICDM’06), pp. 488–498. IEEE (2006)
Google Scholar
Pontarelli, S., Bianchi, G., Teofili, S.: Traffic-aware design of a high-speed fpga network intrusion detection system. IEEE Transactions on Computers 62(11), 2322–2334 (2013)
Article MathSciNet MATH Google Scholar
Ranshous, S., Shen, S., Koutra, D., Harenberg, S., Faloutsos, C., Samatova, N.F.: Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics 7(3), 223–247 (2015)
Article MathSciNet Google Scholar
Rousseeuw, P.J., Hubert, M.: Robust statistics for outlier detection. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1(1), 73–79 (2011)
Google Scholar
Saligrama, V., Chen, Z.: Video anomaly detection based on local statistical aggregates. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 2112–2119. IEEE (2012)
Google Scholar
Seeberg, V.E., Petrovic, S.: A new classification scheme for anonymization of real data used in ids benchmarking. In: Availability, Reliability and Security, 2007. ARES 2007. The Second International Conference on, pp. 385–390. IEEE (2007)
Google Scholar
Shameli-Sendi, A., Cheriet, M., Hamou-Lhadj, A.: Taxonomy of intrusion risk assessment and response system. Computers & Security 45, 1–16 (2014)
Article Google Scholar
Sheikhan, M., Jadidi, Z.: Flow-based anomaly detection in high-speed links using modified gsa-optimized neural network. Neural Computing and Applications 24(3–4), 599–611 (2014)
Article Google Scholar
Shifflet, J.: A technique independent fusion model for network intrusion detection. In: Proceedings of the Midstates Conference on Undergraduate Research in Computer Science and Mat hematics, vol. 3, pp. 1–3. Citeseer (2005)
Google Scholar
Tan, Z., Jamdagni, A., He, X., Nanda, P., Liu, R.P.: Denial-of-service attack detection based on multivariate correlation analysis. In: International Conference on Neural Information Processing, pp. 756–765. Springer (2011)
Google Scholar
Tan, Z., Jamdagni, A., He, X., Nanda, P., Liu, R.P.: A system for denial-of-service attack detection based on multivariate correlation analysis. IEEE transactions on parallel and distributed systems 25(2), 447–456 (2014)
Article Google Scholar
Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the kdd cup 99 data set. In: Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence Applications 2009 (2009)
Book Google Scholar
Tsai, C.F., Lin, C.Y.: A triangle area based nearest neighbors approach to intrusion detection. Pattern recognition 43(1), 222–229 (2010)
Article MATH Google Scholar
Wagle, B.: Multivariate beta distribution and a test for multivariate normality. Journal of the Royal Statistical Society. Series B (Methodological) pp. 511–516 (1968)
Google Scholar
Wu, S.X., Banzhaf, W.: The use of computational intelligence in intrusion detection systems: A review. Applied Soft Computing 10(1), 1–35 (2010)
Article Google Scholar
Zainaddin, D.A.A., Hanapi, Z.M.: Hybrid of fuzzy clustering neural network over nsl dataset for intrusion detection system. Journal of Computer Science 9(3), 391 (2013)
Article Google Scholar
Zuech, R., Khoshgoftaar, T.M., Wald, R.: Intrusion detection and big heterogeneous data: a survey. Journal of Big Data 2(1), 1 (2015)
Article Google Scholar

Download references

Author information

Authors and Affiliations

The Australian Centre for Cyber Security, University of New South Wales Canberra, Canberra, NSW, Australia
Nour Moustafa, Gideon Creech & Jill Slay

Authors

Nour Moustafa
View author publications
You can also search for this author in PubMed Google Scholar
Gideon Creech
View author publications
You can also search for this author in PubMed Google Scholar
Jill Slay
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nour Moustafa .

Editor information

Editors and Affiliations

University of Bristol, Bristol, United Kingdom
Iván Palomares Carrascosa
Centre for Secure Information Technologies, Queen’s University of Belfast, Belfast, United Kingdom
Harsha Kumara Kalutarage
Queen’s University Belfast, Belfast, United Kingdom
Yan Huang

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Moustafa, N., Creech, G., Slay, J. (2017). Big Data Analytics for Intrusion Detection System: Statistical Decision-Making Using Finite Dirichlet Mixture Models. In: Palomares Carrascosa, I., Kalutarage, H., Huang, Y. (eds) Data Analytics and Decision Support for Cybersecurity. Data Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-59439-2_5

Download citation

DOI: https://doi.org/10.1007/978-3-319-59439-2_5
Published: 02 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59438-5
Online ISBN: 978-3-319-59439-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics