A Review on Scalable Learning Approaches on Intrusion Detection Dataset

  • Santosh Kumar Sahu
  • Durga Prasad Mohapatra
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 597)


There has been much excitement recently about Big Data and the pressing need for data scientists who can extract meaning from it. Data scientists, meanwhile, have been working with voluminous data for years without needing to brag about its size. Now, however, these large, complex datasets must be processed smartly, which improves productivity by reducing the computational effort. As a result, Big Data analytics plays a vital role in intrusion detection. It provides tools that support analytics on structured, unstructured, and semi-structured data, and it offers scalable machine learning algorithms for fast data processing. It also provides tools to visualize large amounts of data in a practical way, which motivates us to implement our model using a scalable machine learning approach. In this work, we describe a scalable machine learning algorithm for threat classification. The algorithm is designed to work even with a relatively small training set while classifying a large volume of test data. Different machine learning approaches are implemented and evaluated on an intrusion detection dataset. The data are normalized using the min–max normalization technique, and for SVM classification the data are transformed into a sparse representation to reduce computational time. The processed data are then stored in HDFS using Apache Hive. All the methods except the neural network are implemented using Apache Spark. Of all the approaches, the fine KNN approach outperforms the others in terms of accuracy in a reasonable computational time, whereas the Bagged Tree approach achieves slightly lower accuracy but takes less computational time to classify the data.
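The two preprocessing steps named in the abstract, min–max normalization and conversion to a sparse representation, can be illustrated with a minimal sketch in plain Python. This is not the authors' Spark implementation: the function names `min_max_normalize` and `to_sparse`, the toy records, and the choice of a 1-based index-to-value mapping (similar in spirit to the LIBSVM text format) are all assumptions made for illustration, since the abstract does not specify the sparse layout.

```python
from typing import Dict, List


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column of `rows` to [0, 1] using x' = (x - min) / (max - min)."""
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has max == min; map it to 0.0 to avoid dividing by zero.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def to_sparse(row: List[float]) -> Dict[int, float]:
    """Keep only the non-zero entries of a row, keyed by 1-based feature index (assumed layout)."""
    return {i: x for i, x in enumerate(row, start=1) if x != 0.0}


# Toy records (hypothetical values, not taken from the intrusion dataset).
records = [
    [0.0, 10.0, 5.0],
    [5.0, 10.0, 0.0],
    [10.0, 10.0, 10.0],
]
normalized = min_max_normalize(records)
sparse_rows = [to_sparse(r) for r in normalized]
```

Dropping zero entries is what can reduce the work for a linear classifier such as an SVM: after normalization, features that sit at their column minimum become exactly zero and no longer need to be stored or multiplied.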


Keywords: Intrusion detection, Apache Spark, Big Data, Machine learning, SVM, KNN, Neural network, Ensemble approach



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of CSE, NIT Rourkela, Rourkela, India
