The Journal of Supercomputing, Volume 72, Issue 9, pp 3489–3510

Real time intrusion detection system for ultra-high-speed big data environments

Article

Abstract

The number of people using the Internet and network services grows every day. Data is generated over the Internet at very high speed and in volumes ranging from petabytes to zettabytes. At the same time, security threats against networks, websites, and enterprise systems are increasing. Detecting intrusions in real time in such an ultra-high-speed environment is therefore a challenging task. Many intrusion detection systems (IDSs) based on machine learning have been proposed for various types of network attacks. Most of them cannot detect recent, unknown attacks, and the others do not provide a real-time solution to the above-mentioned challenges. To address these problems, we propose a real-time intrusion detection system for ultra-high-speed big data environments, implemented on Hadoop. The proposed system has a four-layer IDS architecture consisting of the capturing layer, the filtration and load balancing layer, the processing (Hadoop) layer, and the decision-making layer. Furthermore, a feature selection scheme is proposed that selects nine parameters for classification using FSR and BER, as well as analysis of the DARPA datasets. Six machine learning classifiers are used to evaluate the proposed system: J48, REPTree, random forest, conjunctive rule, support vector machine, and Naïve Bayes. Results show that REPTree and J48 are the best of these classifiers in terms of both accuracy and efficiency. The proposed architecture is evaluated for accuracy in terms of true positives (TP) and false positives (FP), for efficiency in terms of processing time, and by comparison with traditional techniques. It achieves more than 99 % TP and less than 0.001 % FP with REPTree and J48. The system has higher overall accuracy than existing IDSs and can work in real time in ultra-high-speed big data environments.
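
The abstract reports accuracy as TP and FP percentages. As a minimal sketch of how such rates are commonly derived from confusion-matrix counts (the function name and the example counts below are hypothetical and are not taken from the paper):

```python
def detection_rates(tp, fn, fp, tn):
    """Return (TP rate, FP rate) as percentages.

    tp: attacks correctly flagged      fn: attacks missed
    fp: normal traffic wrongly flagged tn: normal traffic correctly passed
    """
    # TP rate: share of actual attacks that were detected.
    tp_rate = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    # FP rate: share of normal traffic that was wrongly flagged.
    fp_rate = 100.0 * fp / (fp + tn) if (fp + tn) else 0.0
    return tp_rate, fp_rate


if __name__ == "__main__":
    # Illustrative counts only, chosen to show the scale of the figures.
    tp_rate, fp_rate = detection_rates(tp=995, fn=5, fp=1, tn=99_999)
    print(f"TP rate: {tp_rate:.3f} %, FP rate: {fp_rate:.3f} %")
```

Under this common definition, an FP rate below 0.001 % would correspond to fewer than one false alarm per 100,000 normal records. Whether the paper computes its percentages in exactly this way is an assumption.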

Keywords

Machine learning; Intrusion detection; Threats; Big data; Network

Notes

Acknowledgments

This study was supported by the Brain Korea 21 Plus project (SW Human Resource Development Program for Supporting Smart Life) funded by the Ministry of Education, School of Computer Science and Engineering, Kyungpook National University, Korea (21A20131600005). This work was also supported by an Institute for Information and Communication Technology Promotion (IITP) grant funded by the Korean government (MSIP) [No. 10041145, Self-Organized Software Platform (SoSp) for Welfare Devices].

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. School of Computer Science and Engineering, Kyungpook National University, Daegu, Korea
