A k-Nearest Neighbour Spectral Clustering-Based Outlier Detection Technique

Wang, Xiaochun; Wang, Xiali; Wilkes, Mitch

doi:10.1007/978-981-15-9519-6_6


Abstract

Outlier analysis is important because it applies to a variety of problem domains, such as intrusion detection, fraud detection, and the discovery of criminal activities in electronic commerce. Many models have been developed for outlier detection, including probabilistic models, distance-based models, density-based models, and clustering models. These models extract various indicators (e.g., frequencies of certain values) from the available data, which are useful in understanding the behavior of the outliers. Spectral clustering has received much attention in recent years as a competitive clustering algorithm. However, it does not scale well to modern large datasets. To partially circumvent this drawback, in this chapter we propose a new outlier detection method inspired by spectral clustering. Our algorithm combines the concept of k-nearest neighbors with spectral clustering techniques and identifies abnormal data as outliers through a statistical use of eigenvalue information in the feature space. We compare the performance of the proposed method with state-of-the-art outlier detection methods. Experimental results show the effectiveness of our algorithm for identifying outliers.
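
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of pairing a k-nearest-neighbor graph with a spectral decomposition. It is not the chapter's method. Everything in it is an assumption made for illustration: the function names `knn_affinity` and `spectral_outlier_scores`, the Gaussian edge weights, the median-based bandwidth, and the choice to score points by the leading eigenvector of the affinity matrix (points that are only weakly tied to the rest of the graph get small entries in that eigenvector).

```python
import numpy as np


def knn_affinity(X, k):
    """Symmetric Gaussian affinity restricted to k-nearest-neighbor edges (illustrative)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)            # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest neighbors
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    edge_d2 = d2[rows, cols]
    # Assumed bandwidth: median squared length of the kNN edges
    # (this breaks down if most neighbors are exact duplicates).
    sigma2 = np.median(edge_d2)
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-edge_d2 / (2.0 * sigma2))
    return np.maximum(W, W.T)               # keep an edge if either endpoint chose it


def spectral_outlier_scores(X, k=4):
    """Scores in [0, 1]; larger means more weakly connected in the kNN graph."""
    W = knn_affinity(X, k)
    _, eigvecs = np.linalg.eigh(W)          # eigenvalues come back in ascending order
    v = np.abs(eigvecs[:, -1])              # leading eigenvector, sign removed
    return 1.0 - v / v.max()


# Toy data: a 5 x 5 unit grid plus one isolated point far from it.
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([grid, [[10.0, 10.0]]])
scores = spectral_outlier_scores(X, k=4)
print(int(np.argmax(scores)), np.round(scores.max(), 3))
```

On this toy data the isolated point (index 25) is the one that should receive the highest score, because its edges to the grid carry almost no weight. Real data would need a thresholding rule and a scalable neighbor search, neither of which is shown here.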

References

Hodge, V. J., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85–126.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 15.1–15.58.
Malik, J., Belongie, S., Leung, T., et al. (2001). Contour and texture analysis for image segmentation. International Journal of Computer Vision, 43(1), 7–27.
Bach, F. R., & Jordan, M. I. (2004). Blind one-microphone speech separation: A spectral learning approach. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS’04), Vancouver, BC, Canada, pp. 65–72.
Ding, C., He, X., Zha, H., et al. (2001). A min-max cut algorithm for graph partitioning and data clustering. In Proceedings of the IEEE International Conference on Data Mining (ICDM’01), California, USA, pp. 107–114.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Stoer, M., & Wagner, F. (1997). A simple min-cut algorithm. Journal of the ACM, 44(4), 585–591.
Hagen, L., & Kahng, A. (1992). New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design, 11(9), 1074–1085.
Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS’04), Vancouver, BC, Canada, pp. 1601–1608.
Bojchevski, A., Matkovic, Y., & Günnemann, S. (2017). Robust spectral clustering for noisy data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’17), Halifax, NS, Canada, pp. 737–746.
Wu, L., Chen, P.-Y., Yen, I.E.-H., Xu, F., Xia, Y., & Aggarwal, C. (2018). Scalable spectral clustering using random binning features. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18), London, United Kingdom, pp. 2506–2515.
Tan, M., Zhang, S., & Wu, L. (2018). Mutual kNN based spectral clustering. Neural Computing and Applications. https://doi.org/10.1007/s00521-018-3836-z
Tong, T., Zhu, X., & Du, T. (2019). Connected graph decomposition for spectral clustering. Multimedia Tools and Applications, 78(23), 33247–33259.
Yang, X., Yu, W., Wang, R., Zhang, G., & Nie, F. (2020). Fast spectral clustering learning with hierarchical bipartite graph for large-scale data. Pattern Recognition Letters, 130, 345–352.
Pang, Y., Xie, J., Nie, F., & Li, X. (2020). Spectral clustering by joint spectral embedding and spectral rotation. IEEE Transactions on Cybernetics, 50(1), 247–258.
Jiang, M. F., Tseng, S. S., & Su, C. M. (2001). Two-phase clustering process for outlier detection. Pattern Recognition Letters, 22(6–7), 691–700.
Yu, D., Sheikholeslami, G., & Zang, . (2002). FindOut: Finding outliers in very large datasets. Knowledge and Information Systems, 4, 387–412.
Wang, C. H. (2008). Recognition of semiconductor defect patterns using spatial filtering and spectral clustering. Expert Systems with Applications, 34(3), 1914–1923.
Xiang, T., & Gong, S. (2008). Spectral clustering with eigenvector selection. Pattern Recognition, 41(3), 1012–1029.
Filippone, M., Camastra, F., & Masulli, F. (2008). A survey of kernel and spectral methods for clustering. Pattern Recognition, 41(1), 176–190.
Luxburg, U. V. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.
Yang, P., & Huang, B. (2008). An outlier detection algorithm based on spectral clustering. In Proceedings of the 2008 Pacific-Asia Workshop on Computational Intelligence and Industrial Application (PACIIA 2008), Wuhan, China, pp. 507–510.
He, Z. Y., Xu, X. F., & Deng, S. C. (2003). Discovering cluster based local outliers. Pattern Recognition Letters, 24(9–10), 1641–1650.
Yang, P., & Huang, B. (2008). A spectral clustering algorithm for outlier detection. In Proceedings of the 2008 International Seminar on Future Information Technology and Management Engineering (FITME’08), Leicestershire, United Kingdom, pp. 33–36.
Lin, H., & Zhu, Q. (2012). A spectral clustering-based dataset structure analysis and outlier detection progress. Journal of Computational Information Systems, 8(1), 115–124.
Tyuryukanov, I., van der Meijden, M. A. M. M., Terzija, V., & Popov, M. (2018). Spectral MST-based graph outlier detection with application to clustering of power networks. In Proceedings of the 20th Power Systems Computation Conference (PSCC’18), Dublin, Ireland.
Aggarwal, C. C. (2013). Outlier Analysis. Springer.
Sun, H., Huang, J., Han, J., Deng, H., Zhao, P., & Feng, B. (2010). gSkeletonClu: Density-based network clustering via structure-connected tree division or agglomeration. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM’10), pp. 481–490.
Xiong, L., Chen, X., & Schneider, J. (2011). Direct robust matrix factorization for anomaly detection. In Proceedings of the 11th IEEE International Conference on Data Mining (ICDM’11), Vancouver, BC, Canada, pp. 844–853.
Kriegel, H.-P., Schubert, M., & Zimek, A. (2008). Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), Las Vegas, Nevada, USA, pp. 444–452.
Knorr, E. M., Ng, R. T., & Tucakov, V. (2000). Distance-based outliers: Algorithms and applications. The VLDB Journal, 8(3–4), 237–253.
Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM International Conference on Management of Data (SIGMOD’00), Dallas, pp. 427–438.
Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00), Dallas, TX, United States, pp. 93–104.
Angiulli, F., & Pizzuti, C. (2002). Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD’02), Helsinki, pp. 15–26.
Wang, Y., Wang, X., & Wang, X. L. (2016). A spectral clustering based outlier detection technique. In Proceedings of the 12th International Conference on Machine Learning and Data Mining, New York, USA, pp. 15–27.
UCI: The UCIKDD Archive, University of California, Irvine, CA. https://kdd.ics.uci.edu/.
Aggarwal, C., & Yu, P. (2001). Outlier detection for high-dimensional data. In Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD’01), Santa Barbara, CA, USA, pp. 37–46.
Janssens, J., Huszar, F., Postma, E., & van den Herik, H. (2012). Stochastic outlier selection.
Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M., & He, X. (2019). Generative adversarial active learning for unsupervised outlier detection. IEEE Transactions on Knowledge and Data Engineering. https://arxiv.org/abs/1809.10816
Ru, X., Liu, Z., Huang, Z., et al. (2016). Normalized residual-based constant false-alarm rate outlier detection. Pattern Recognition Letters, 69, 1–7.
Tang, B., & He, H. (2017). A local density-based approach for outlier detection. Neurocomputing, 241, 171–180.
Erfani, S. M., Rajasegarar, S., Karunasekera, S., & Leckie, C. (2016). High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognition, 58, 121–134.

Acknowledgements

This chapter was modified from the paper published by our group in “Machine Learning and Data Mining in Pattern Recognition” [35]. The related contents are reused with permission.

Author information

Authors and Affiliations

School of Software Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
Xiaochun Wang
School of Information Engineering, Chang’an University, Xi’an, Shaanxi, China
Xiali Wang
Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
Mitch Wilkes

Corresponding author

Correspondence to Xiaochun Wang.

About this chapter

Cite this chapter

Wang, X., Wang, X., Wilkes, M. (2021). A k-Nearest Neighbour Spectral Clustering-Based Outlier Detection Technique. In: New Developments in Unsupervised Outlier Detection. Springer, Singapore. https://doi.org/10.1007/978-981-15-9519-6_6

DOI: https://doi.org/10.1007/978-981-15-9519-6_6
Published: 25 November 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9518-9
Online ISBN: 978-981-15-9519-6
eBook Packages: Intelligent Technologies and Robotics (R0)
