A Survey and Taxonomy on Data and Pre-processing Techniques of Intrusion Detection Systems

  • Tarfa Hamed
  • Jason B. Ernst
  • Stefan C. Kremer


This chapter presents a new review and taxonomy of the input data and pre-processing techniques used by intrusion detection systems (IDS). It surveys the literature of the last two decades on IDS data and offers a framework for understanding the components described in that literature, so that readers can study existing work systematically and envision future hybrid approaches. The chapter describes how to collect the data and how to prepare it for different types of processing. We organize the chapter component by component rather than paper by paper, since we believe this gives the reader a wider perspective on the process of constructing an intrusion detection system and on its evaluation mechanisms. The organization mirrors an idealized intrusion detection system that contains most of the components of an IDS, so existing approaches can be accommodated neatly within this framework. This allows the reader to construct and explore new systems by assembling the described components in novel arrangements. After each component, we compare the surveyed approaches, with supporting tables, to give the reader a better perspective on that component. In this sense, the chapter provides insights that a reader would not gain by simply reading the original source papers. The classifiers used with IDS are beyond the scope of this chapter.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Tarfa Hamed (1)
  • Jason B. Ernst (2)
  • Stefan C. Kremer (1)

  1. School of Computer Science, University of Guelph, Guelph, Canada
  2. Left Inc., Vancouver, Canada
