Toward intrusion detection using belief decision trees for big data

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Big data refers to datasets that cannot be managed with standard tools and that contain valuable, previously hidden information. New data mining techniques are needed to cope with the increasing size of such data, their complex structure, and their veracity, which covers questions of data imperfection and uncertainty. Although big data veracity is often overlooked, it is both challenging and essential for accurate and reliable data mining and knowledge discovery. This paper proposes MapReduce-based belief decision trees for big data as classifiers of uncertain large-scale datasets. The proposed averaging and conjunctive classification approaches are evaluated for intrusion detection on the massive KDD’99 intrusion dataset. Several levels of attack granularity are considered: distinguishing every individual attack type, grouping attacks into categories, or simply separating normal from abnormal connections.
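To make the averaging approach and its MapReduce parallelization more concrete, here is a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: each training record carries its class label as a basic belief assignment (BBA) over a hypothetical binary frame {normal, attack} (the coarsest granularity level above); the attribute selection measure is the averaging-approach information gain computed from averaged pignistic probabilities (gain-ratio normalization omitted); and the map and reduce phases are simulated with plain Python functions rather than Hadoop. The names `betp`, `map_partition`, `merge`, `info` and `gain`, and the toy records, are illustrative only.

```python
from collections import defaultdict
from functools import reduce
from math import log2

# Hypothetical frame of discernment: the binary "normal vs. abnormal" granularity.
FRAME = ("normal", "attack")


def betp(bba):
    """Pignistic transformation of a BBA given as {frozenset(classes): mass}."""
    conflict = bba.get(frozenset(), 0.0)  # mass on the empty set (assumed < 1)
    probs = dict.fromkeys(FRAME, 0.0)
    for focal, mass in bba.items():
        if not focal:
            continue
        # Share each focal element's mass equally among its classes.
        for c in focal:
            probs[c] += mass / (len(focal) * (1.0 - conflict))
    return probs


def map_partition(partition, attribute):
    """Map phase: per attribute value, count records and sum their BetP vectors."""
    stats = defaultdict(lambda: [0, dict.fromkeys(FRAME, 0.0)])
    for features, bba in partition:
        entry = stats[features[attribute]]
        entry[0] += 1
        for c, p in betp(bba).items():
            entry[1][c] += p
    return dict(stats)


def merge(left, right):
    """Reduce phase: add counts and summed BetP vectors value by value."""
    out = {v: [n, dict(s)] for v, (n, s) in left.items()}
    for v, (n, s) in right.items():
        entry = out.setdefault(v, [0, dict.fromkeys(FRAME, 0.0)])
        entry[0] += n
        for c, p in s.items():
            entry[1][c] += p
    return out


def info(summed, n):
    """Entropy of the average pignistic probabilities of n records."""
    return -sum((p / n) * log2(p / n) for p in summed.values() if p > 0)


def gain(stats):
    """Averaging-approach gain: Info(T) minus the weighted Info of each subset."""
    total_n = sum(n for n, _ in stats.values())
    total = dict.fromkeys(FRAME, 0.0)
    for _, s in stats.values():
        for c, p in s.items():
            total[c] += p
    info_a = sum(n / total_n * info(s, n) for n, s in stats.values())
    return info(total, total_n) - info_a


# Two hypothetical partitions of (features, BBA label) records.
THETA = frozenset(FRAME)
partitions = [
    [({"protocol": "tcp"}, {frozenset({"normal"}): 0.8, THETA: 0.2}),
     ({"protocol": "udp"}, {frozenset({"attack"}): 1.0})],
    [({"protocol": "tcp"}, {frozenset({"normal"}): 0.6, THETA: 0.4}),
     ({"protocol": "udp"}, {frozenset({"attack"}): 0.7, THETA: 0.3})],
]
stats = reduce(merge, (map_partition(p, "protocol") for p in partitions))
print(round(gain(stats), 4))
```

Because each mapper emits only per-value counts and summed pignistic vectors, the reduced statistics do not depend on how the records are split across partitions, which is what lets this attribute selection step scale out. Only the averaging approach is sketched; the conjunctive approach named in the abstract uses a different attribute selection measure and is not shown.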

Author information

Correspondence to Imen Boukhris.

About this article

Cite this article

Boukhris, I., Elouedi, Z. & Ajabi, M. Toward intrusion detection using belief decision trees for big data. Knowl Inf Syst 53, 671–698 (2017). https://doi.org/10.1007/s10115-017-1034-4

  • DOI: https://doi.org/10.1007/s10115-017-1034-4