A k-Nearest Neighbour Spectral Clustering-Based Outlier Detection Technique

New Developments in Unsupervised Outlier Detection

Abstract

The problem of outlier analysis is important because of its applicability to a variety of problem domains such as intrusion detection, fraud detection, and the discovery of criminal activities in electronic commerce. Many models have been developed for outlier detection, including probabilistic models, distance-based models, density-based models, and clustering models. These models extract various indicators (e.g., frequencies of certain values) from the available data, which are useful in understanding the behavior of outliers. Spectral clustering has received much attention in recent years as a competitive clustering algorithm. However, it does not scale well to modern large datasets. To partially circumvent this drawback, in this chapter we propose a new outlier detection method inspired by spectral clustering. Our algorithm combines the concepts of k-nearest neighbors and spectral clustering to identify abnormal data as outliers by statistically analyzing eigenvalue information in the feature space. We compare the performance of the proposed method with state-of-the-art outlier detection methods. Experimental results show the effectiveness of our algorithm in identifying outliers.
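
The abstract describes the method only at a high level, and the chapter's own eigenvalue-based statistical test is not given here, so the following is not the authors' algorithm. It is a minimal, generic NumPy sketch of the two ingredients the abstract names: a k-nearest-neighbour affinity graph and the spectrum of its normalized graph Laplacian, the standard construction surveyed by von Luxburg [21]. The Gaussian edge weights, the median-distance default for `sigma`, and the function names `knn_affinity`, `normalized_laplacian_spectrum`, and `spectral_embedding` are illustrative assumptions.

```python
import numpy as np


def knn_affinity(X, k, sigma=None):
    """Symmetric k-nearest-neighbour affinity matrix with Gaussian edge weights.

    An edge i--j is kept if j is among the k nearest neighbours of i or vice
    versa; all other entries (including the diagonal) are zero.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as n points in one dimension
    n = X.shape[0]
    # Dense pairwise Euclidean distances: O(n^2) memory, fine for a sketch.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assuming no duplicate points, column 0 of the sorted order is the point
    # itself (distance 0), so it is skipped.
    nbrs = np.argsort(dist, axis=1)[:, 1:k + 1]
    if sigma is None:
        # Assumed default: median distance from each point to its k neighbours.
        sigma = float(np.median(np.take_along_axis(dist, nbrs, axis=1)))
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    mask |= mask.T  # symmetrize the directed kNN relation
    return np.where(mask, np.exp(-dist ** 2 / (2.0 * sigma ** 2)), 0.0)


def normalized_laplacian_spectrum(W):
    """Eigenvalues (ascending) and eigenvectors of L_sym = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    # Isolated vertices (zero degree) get a zero scaling factor instead of 1/0.
    inv_sqrt_d = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    L = np.eye(len(d)) - inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]
    return np.linalg.eigh(L)


def spectral_embedding(W, c):
    """Rows of the c bottom eigenvectors of L_sym, each scaled to unit length."""
    _, vecs = normalized_laplacian_spectrum(W)
    U = vecs[:, :c]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.where(norms > 0, norms, 1.0)


# Usage: two well-separated groups of 1-D points.
X = np.concatenate([np.arange(10.0), np.arange(100.0, 110.0)])
W = knn_affinity(X, k=2)
vals, _ = normalized_laplacian_spectrum(W)
print(int(np.sum(vals < 1e-8)))  # 2: one near-zero eigenvalue per connected component
```

Two standard properties make this spectrum useful for structure analysis: the eigenvalues of the normalized Laplacian lie in [0, 2], and the number of (near-)zero eigenvalues equals the number of connected components of the kNN graph. The dense O(n²) distance matrix is also a concrete instance of the scalability problem the abstract mentions; a practical version would use a nearest-neighbour index and a sparse eigensolver.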

References

  1. Hodge, V. J., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85–126.

  2. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 15.1–15.58.

  3. Malik, J., Belongie, S., Leung, T., et al. (2001). Contour and texture analysis for image segmentation. International Journal of Computer Vision, 43(1), 7–27.

  4. Bach, F. R., & Jordan, M. I. (2004). Blind one-microphone speech separation: A spectral learning approach. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS’04), Vancouver, BC, Canada, pp. 65–72.

  5. Ding, C., He, X., Zha, H., et al. (2001). A min-max cut algorithm for graph partitioning and data clustering. In Proceedings of the IEEE International Conference on Data Mining (ICDM’01), California, USA, pp. 107–114.

  6. Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.

  7. Stoer, M., & Wagner, F. (1997). A simple min-cut algorithm. Journal of the ACM, 44(4), 585–591.

  8. Hagen, L., & Kahng, A. (1992). New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design, 11(9), 1074–1085.

  9. Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS’04), Vancouver, BC, Canada, pp. 1601–1608.

  10. Bojchevski, A., Matkovic, Y., & Günnemann, S. (2017). Robust spectral clustering for noisy data. In Proceedings of 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’17), Halifax, NS, Canada, pp. 737–746.

  11. Wu, L., Chen, P.-Y., Yen, I.E.-H., Xu, F., Xia, Y., & Aggarwal, C. (2018). Scalable spectral clustering using random binning features. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18), London, United Kingdom, pp. 2506–2515.

  12. Tan, M., Zhang, S., & Wu, L. (2018). Mutual kNN based spectral clustering. Neural Computing and Applications. https://doi.org/10.1007/s00521-018-3836-z

  13. Tong, T., Zhu, X., & Du, T. (2019). Connected graph decomposition for spectral clustering. Multimedia Tools and Applications, 78(23), 33247–33259.

  14. Yang, X., Yu, W., Wang, R., Zhang, G., & Nie, F. (2020). Fast spectral clustering learning with hierarchical bipartite graph for large-scale data. Pattern Recognition Letters, 130, 345–352.

  15. Pang, Y., Xie, J., Nie, F., & Li, X. (2020). Spectral clustering by joint spectral embedding and spectral rotation. IEEE Transactions on Cybernetics, 50(1), 247–258.

  16. Jiang, M. F., Tseng, S. S., & Su, C. M. (2001). Two-phase clustering process for outlier detection. Pattern Recognition Letters, 22(6–7), 691–700.

  17. Yu, D., Sheikholeslami, G., & Zhang, A. (2002). FindOut: Finding outliers in very large datasets. Knowledge and Information Systems, 4, 387–412.

  18. Wang, C. H. (2008). Recognition of semiconductor defect patterns using spatial filtering and spectral clustering. Expert Systems with Applications, 34(3), 1914–1923.

  19. Xiang, T., & Gong, S. (2008). Spectral clustering with eigenvector selection. Pattern Recognition, 41(3), 1012–1029.

  20. Filippone, M., Camastra, F., & Masulli, F. (2008). A survey of kernel and spectral methods for clustering. Pattern Recognition, 41(1), 176–190.

  21. Luxburg, U. V. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.

  22. Yang, P., & Huang, B. (2008). An outlier detection algorithm based on spectral clustering. In Proceedings of 2008 Pacific-Asia Workshop on Computational Intelligence and Industrial Application (PACIIA 2008), Wuhan, China, pp. 507–510.

  23. He, Z. Y., Xu, X. F., & Deng, S. C. (2003). Discovering cluster based local outliers. Pattern Recognition Letters, 24(9–10), 1641–1650.

  24. Yang, P., & Huang, B. (2008). A spectral clustering algorithm for outlier detection. In Proceedings of the 2008 International Seminar on Future Information Technology and Management Engineering (FITME’08), Leicestershire, United Kingdom, pp. 33–36.

  25. Lin, H., & Zhu, Q. (2012). A spectral clustering-based dataset structure analysis and outlier detection progress. Journal of Computational Information Systems, 8(1), 115–124.

  26. Tyuryukanov, I., van der Meijden, M. A. M. M., Terzija, V., & Popov, M. (2018). Spectral MST-based graph outlier detection with application to clustering of power networks. In Proceedings of the 20th Power Systems Computation Conference (PSCC’18), Dublin, Ireland.

  27. Aggarwal, C. C. (2013). Outlier Analysis. Springer.

  28. Sun, H., Huang, J., Han, J., Deng, H., Zhao, P., & Feng, B. (2010). gSkeletonClu: Density-based network clustering via structure-connected tree division or agglomeration. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM’10), pp. 481–490.

  29. Xiong, L., Chen, X., & Schneider, J. (2011). Direct robust matrix factorization for anomaly detection. In Proceedings of the 11th IEEE International Conference on Data Mining (ICDM’11), Vancouver, BC, Canada, pp. 844–853.

  30. Kriegel, H.-P., Schubert, M., & Zimek, A. (2008). Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), Las Vegas, Nevada, USA, pp. 444–452.

  31. Knorr, E. M., Ng, R. T., & Tucakov, V. (2000). Distance-based outliers: Algorithms and applications. The VLDB Journal, 8(3–4), 237–253.

  32. Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM International Conference on Management of Data (SIGMOD’00), Dallas, pp. 427–438.

  33. Breunig, M. M., Kriegel, H.-P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00), Dallas, TX, United States, pp. 93–104.

  34. Angiulli, F., & Pizzuti, C. (2002). Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD’02), Helsinki, pp. 15–26.

  35. Wang, Y., Wang, X., & Wang, X. L. (2016). A spectral clustering based outlier detection technique. In Proceedings of 12th International Conference on Machine Learning and Data Mining, New York, USA, pp. 15–27.

  36. The UCI KDD Archive, University of California, Irvine, CA. https://kdd.ics.uci.edu/.

  37. Aggarwal, C., & Yu, P. (2001). Outlier detection for high-dimensional data. In Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD’01), Santa Barbara, CA, USA, pp. 37–46.

  38. Janssens, J., Huszar, F., Postma, E., & van den Herik, H. (2012). Stochastic outlier selection.

  39. Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M., & He, X. (2019). Generative adversarial active learning for unsupervised outlier detection. IEEE Transactions on Knowledge and Data Engineering, https://arxiv.org/abs/1809.10816.

  40. Ru, X., Liu, Z., Huang, Z., et al. (2016). Normalized residual-based constant false-alarm rate outlier detection. Pattern Recognition Letters, 69, 1–7.

  41. Tang, B., & He, H. (2017). A local density-based approach for outlier detection. Neurocomputing, 241, 171–180.

  42. Erfani, S. M., Rajasegarar, S., Karunasekera, S., & Leckie, C. (2016). High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognition, 58, 121–134.

Acknowledgements

This chapter is adapted from a paper published by our group in “Machine Learning and Data Mining in Pattern Recognition” [35]. The related content is reused with permission.

Author information

Correspondence to Xiaochun Wang.

Copyright information

© 2021 Xi'an Jiaotong University Press

About this chapter

Cite this chapter

Wang, X., Wang, X., Wilkes, M. (2021). A k-Nearest Neighbour Spectral Clustering-Based Outlier Detection Technique. In: New Developments in Unsupervised Outlier Detection. Springer, Singapore. https://doi.org/10.1007/978-981-15-9519-6_6