Network Anomaly Detection Using Unsupervised Feature Selection and Density Peak Clustering

  • Xiejun Ni
  • Daojing He
  • Sammy Chan
  • Farooq Ahmad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9696)


Intrusion detection systems (IDSs) play a significant role in defending critical computer systems and networks against attackers on the Internet. Anomaly detection is an effective way to detect intrusions because it can discover patterns that do not conform to expected behavior. Mainstream anomaly detection systems (ADSs) use data mining techniques to automatically extract normal and abnormal patterns from large sets of network data and to distinguish between them. However, supervised and semi-supervised data mining approaches rely on labeled data, which is impractical when the network data is large-scale. In this paper, we propose a two-stage approach that combines unsupervised feature selection with density peak clustering to handle the lack of labels. First, we introduce density peak clustering for network anomaly detection; it considers both the distance and the density characteristics of the data. Second, to improve clustering performance, we use the maximal information coefficient and feature clustering to remove redundant and irrelevant features. Experimental results show that our method eliminates useless features from high-dimensional data while achieving high detection accuracy and efficiency.
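The two stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions. `approx_mic` is a crude stand-in for the maximal information coefficient: it scores only equal-frequency grids, whereas the real MIC of Reshef et al. (2011) also optimizes where the grid lines fall. `select_features` uses a hypothetical greedy rule with threshold `tau` as its form of feature clustering and treats constant columns as the only irrelevant ones. `density_peaks` follows the general scheme of Rodriguez and Laio (2014): it computes each point's local density ρ and the distance δ to its nearest denser point, ranks candidate centers by γ = ρδ, and assigns every other point to the cluster of its nearest denser neighbor. Here ρ uses a Gaussian kernel with width `dc` rather than a hard cutoff. `flag_anomalies`, which marks the lowest-density points, is a hypothetical decision rule, because the abstract does not say how points or clusters are labeled anomalous. All function names and parameters are illustrative.

```python
import numpy as np


def _quantile_bins(v, k):
    """Code each value of v into one of at most k equal-frequency bins."""
    edges = np.quantile(v, np.linspace(0.0, 1.0, k + 1)[1:-1])
    return np.searchsorted(edges, v, side="right")


def _mutual_info(a, b):
    """Plug-in mutual information (in nats) of two integer-coded samples."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))


def approx_mic(x, y):
    """Crude MIC stand-in (assumption): best MI / log(min(nx, ny)) over
    equal-frequency nx-by-ny grids with nx * ny < n**0.6. The real MIC
    also optimizes where the grid lines fall; this sketch does not."""
    limit = len(x) ** 0.6
    best = 0.0
    for nx in range(2, int(limit) + 1):
        for ny in range(2, int(limit) + 1):
            if nx * ny >= limit:
                break
            mi = _mutual_info(_quantile_bins(x, nx), _quantile_bins(y, ny))
            best = max(best, mi / np.log(min(nx, ny)))
    return best


def select_features(X, tau=0.8, eps=1e-12):
    """Stage 1, hypothetical rule: drop constant columns, then keep a column
    only if its approx_mic with every kept column is below tau, so each kept
    column stands for a cluster of mutually redundant features."""
    kept = []
    for j in range(X.shape[1]):
        if X[:, j].std() <= eps:
            continue  # constant column: treated as irrelevant
        if all(approx_mic(X[:, j], X[:, k]) < tau for k in kept):
            kept.append(j)
    return kept


def density_peaks(X, dc, n_centers):
    """Stage 2: density peak clustering after Rodriguez & Laio (2014)."""
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    rho = np.exp(-((d / dc) ** 2)).sum(axis=1) - 1.0  # Gaussian density, self excluded
    order = np.argsort(-rho, kind="stable")  # densest point first
    delta = np.empty(n)
    parent = np.full(n, -1)  # index of the nearest point with higher density
    delta[order[0]] = d[order[0]].max()
    for r in range(1, n):
        i, higher = order[r], order[:r]
        parent[i] = higher[np.argmin(d[i, higher])]
        delta[i] = d[i, parent[i]]
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:  # descending density, so parent[i] is already labeled
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta


def flag_anomalies(rho, q=0.01):
    """Hypothetical decision rule: flag the lowest-density fraction q of points."""
    return rho <= np.quantile(rho, q)


# Demo on synthetic data: two clusters of "normal" points, three isolated
# points, a near-copy of column 0, and a constant column.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal([0.0, 0.0], 0.5, (100, 2)),
                   rng.normal([5.0, 0.0], 0.5, (100, 2))])
base = np.vstack([blobs, [[2.5, 6.0], [-5.0, -5.0], [10.0, 6.0]]])
noise = 0.01 * rng.normal(size=len(base))
X = np.column_stack([base, 2 * base[:, 0] + noise, np.ones(len(base))])
kept = select_features(X)
labels, rho, delta = density_peaks(X[:, kept], dc=0.5, n_centers=2)
print("kept columns:", kept)
print("flagged rows:", np.flatnonzero(flag_anomalies(rho)))
```

In the demo, column 2 is a near-copy of column 0 and column 3 is constant, so stage 1 should keep only columns 0 and 1. The three points placed far from both synthetic clusters receive zero estimated density and are the ones flagged. On real traffic features, `dc`, `n_centers`, `tau`, and `q` would presumably need tuning.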


Keywords: Anomaly detection; Data mining; Feature selection; Maximal information coefficient; Density peak clustering



This research is supported by the Pearl River Nova Program of Guangzhou (No. 2014J2200051), the National Science Foundation of China (Grants: 51477056 and 61321064), the Shanghai Rising-Star Program (No. 15QA1401700), the CCF-Tencent Open Research Fund, the Shanghai Knowledge Service Platform for Trustworthy Internet of Things (No. ZF1213), and the Specialized Research Fund for the Doctoral Program of Higher Education. Daojing He is the corresponding author of this article.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Xiejun Ni (1)
  • Daojing He (1), corresponding author
  • Sammy Chan (2)
  • Farooq Ahmad (3)

  1. School of Computer Science and Software Engineering, East China Normal University, Shanghai, China
  2. Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
  3. Department of Computer Science, COMSATS Institute of Information Technology, Lahore, Pakistan
