Knowledge and Information Systems, Volume 53, Issue 3, pp 671–698

Toward intrusion detection using belief decision trees for big data

Regular Paper


Big data refers to datasets that cannot be managed with standard tools and within which valuable, previously hidden information lies. New data mining techniques are needed to deal with the increasing size of such data, their complex structure, and their veracity, which covers questions of data imperfection and uncertainty. Even though big data veracity is often overlooked, it is both challenging and important for accurate and reliable mining and knowledge discovery. This paper proposes MapReduce-based belief decision trees for big data as classifiers of uncertain large-scale datasets. The proposed averaging and conjunctive classification approaches are evaluated for intrusion detection on the massive KDD’99 intrusion dataset. Several levels of attack granularity are considered, depending on whether the classifier deals with each individual kind of attack, groups attacks into categories, or only distinguishes normal from abnormal connections.
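To make the averaging approach more concrete, the following Python sketch shows how an attribute-selection score for a belief decision tree could be computed in a MapReduce style. It is a minimal illustration under stated assumptions, not the authors' implementation. We assume that each training instance carries a basic belief assignment (BBA) over the class frame instead of a crisp label, and that the averaging criterion averages the BBAs at a node, converts the average into a pignistic probability, and scores an attribute with a C4.5-style gain ratio. The map and reduce phases are simulated with plain Python functions (`mapper`, `reducer` and `gain_ratio` are hypothetical names) rather than run on Hadoop, and the two-class frame and toy records are hypothetical, not drawn from KDD’99. The conjunctive approach is not covered here.

```python
import math
from collections import defaultdict
from functools import reduce

# Hypothetical two-class frame of discernment (not the KDD'99 label set).
CLASSES = ("normal", "attack")
NORMAL, ATTACK = frozenset({"normal"}), frozenset({"attack"})
OMEGA = frozenset(CLASSES)  # total ignorance


def pignistic(bba):
    """BetP(c) = sum of m(A)/|A| over focal sets A containing c, over 1 - m(empty)."""
    conflict = bba.get(frozenset(), 0.0)
    betp = {c: 0.0 for c in CLASSES}
    for focal, mass in bba.items():
        for c in focal:
            betp[c] += mass / len(focal)
    return {c: p / (1.0 - conflict) for c, p in betp.items()}


def add_bba(a, b):
    """Return a new dict holding the focal-set-wise sum of two mass tables."""
    out = defaultdict(float, a)
    for focal, mass in b.items():
        out[focal] += mass
    return out


def info(sum_bba, count):
    """ASSUMPTION: entropy of BetP computed from the average BBA of a node."""
    avg = {focal: mass / count for focal, mass in sum_bba.items()}
    return -sum(p * math.log2(p) for p in pignistic(avg).values() if p > 0)


def mapper(chunk, attribute):
    """Map phase: per attribute value, emit (summed BBA, instance count) for one split."""
    partial = {}
    for features, bba in chunk:
        sums, n = partial.get(features[attribute], ({}, 0))
        partial[features[attribute]] = (add_bba(sums, bba), n + 1)
    return partial


def reducer(left, right):
    """Reduce phase: merge two partial results without mutating either input."""
    merged = dict(left)
    for value, (sums, n) in right.items():
        old_sums, old_n = merged.get(value, ({}, 0))
        merged[value] = (add_bba(old_sums, sums), old_n + n)
    return merged


def gain_ratio(chunks, attribute):
    """C4.5-style gain ratio built from the reduced per-value statistics."""
    stats = reduce(reducer, (mapper(chunk, attribute) for chunk in chunks))
    total_n = sum(n for _, n in stats.values())
    total_sum = reduce(add_bba, (sums for sums, _ in stats.values()))
    gain = info(total_sum, total_n) - sum(
        n / total_n * info(sums, n) for sums, n in stats.values()
    )
    split = -sum(n / total_n * math.log2(n / total_n) for _, n in stats.values())
    return gain / split if split > 0 else 0.0


# Hypothetical toy connections: symbolic features plus an uncertain label as a BBA.
records = [
    ({"protocol": "tcp", "flag": "SF"}, {NORMAL: 0.9, OMEGA: 0.1}),
    ({"protocol": "tcp", "flag": "S0"}, {ATTACK: 0.8, OMEGA: 0.2}),
    ({"protocol": "udp", "flag": "SF"}, {NORMAL: 1.0}),
    ({"protocol": "icmp", "flag": "S0"}, {ATTACK: 0.7, OMEGA: 0.3}),
]
chunks = [records[:2], records[2:]]  # two simulated input splits

for attr in ("protocol", "flag"):
    print(attr, round(gain_ratio(chunks, attr), 3))
```

Each mapper emits only per-value sums of masses and instance counts. The reducer can therefore merge partial results from any number of data splits, and the score computed over the two chunks matches the score computed over the whole toy set, up to floating-point rounding. On these toy records, `flag` scores higher (about 0.637) than `protocol` (about 0.227), so it would be chosen as the split attribute.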


Keywords: Big data · Veracity · Belief function theory · Classification under uncertainty · Intrusion detection



Copyright information

© Springer-Verlag London 2017

Authors and Affiliations

LARODEC, Institut Supérieur de Gestion, Université de Tunis, Tunis, Tunisia
