Abstract
Outlier analysis is important because it applies to a variety of problem domains, such as intrusion detection, fraud detection, and the discovery of criminal activities in electronic commerce. Many models have been developed for outlier detection, including probabilistic, distance-based, density-based, and clustering models. These models extract indicators (e.g., frequencies of certain values) from the available data that help in understanding the behavior of outliers. Spectral clustering has received much attention in recent years as a competitive clustering algorithm. However, it does not scale well to modern large datasets. To partially circumvent this drawback, in this chapter we propose a new outlier detection method inspired by spectral clustering. Our algorithm combines k-nearest neighbors with spectral clustering techniques and identifies abnormal data as outliers through a statistical use of eigenvalue information in the feature space. We compare the performance of the proposed method with state-of-the-art outlier detection methods. Experimental results show that our algorithm is effective at identifying outliers.
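The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of combining a k-nearest-neighbor graph with a spectral decomposition to score outliers. Everything in it is an assumption made for illustration rather than the chapter's method: the function name `knn_fiedler_outlier_scores`, the locally scaled Gaussian affinity, the use of the unnormalised graph Laplacian, and the scoring rule based on the Fiedler vector (the eigenvector of the second-smallest eigenvalue). It also assumes the input points are distinct.

```python
import numpy as np


def knn_fiedler_outlier_scores(X, k=4):
    """Score each point by how far its Fiedler-vector entry lies from the median.

    Hypothetical helper for illustration only; not the chapter's algorithm.
    Assumes all rows of X are distinct points.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances. The diagonal is set to infinity
    # so that a point is never selected as its own neighbour.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)

    # Indices of the k nearest neighbours of every point.
    nn = np.argsort(d2, axis=1)[:, :k]

    # Local scale (assumption): distance to the k-th nearest neighbour.
    sigma = np.sqrt(d2[np.arange(n), nn[:, -1]])

    # Gaussian affinities on kNN edges only, then symmetrised.
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / (sigma[rows] * sigma[cols]))
    W = np.maximum(W, W.T)

    # Unnormalised graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # eigh returns eigenvalues in ascending order, so column 1 holds the
    # Fiedler vector. A weakly attached point dominates that vector.
    eigvals, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]
    scores = np.abs(fiedler - np.median(fiedler))
    return scores, eigvals


# Toy data: a 5 x 5 grid of inliers plus one distant point (index 25).
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])
scores, eigvals = knn_fiedler_outlier_scores(X, k=4)
print("highest-scoring index:", int(np.argmax(scores)))
```

In this toy setting the distant point is attached to the grid only through very small affinities, so it receives the largest score. The sketch handles a single weakly attached point; scoring several outliers at once would need more eigenvectors, and the dense eigendecomposition used here does not address the scalability concern raised in the abstract.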
References
Hodge, V. J., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85–126.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 15.1–15.58.
Malik, J., Belongie, S., Leung, T., et al. (2001). Contour and texture analysis for image segmentation. International Journal of Computer Vision, 43(1), 7–27.
Bach, F. R., & Jordan, M. I. (2004). Blind one-microphone speech separation: A spectral learning approach. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS’04), Vancouver, BC, Canada, pp. 65–72.
Ding, C., He, X., Zha, H., et al. (2001). A min-max cut algorithm for graph partitioning and data clustering. In Proceedings of the IEEE International Conference on Data Mining (ICDM’01), California, USA, pp. 107–114.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Stoer, M., & Wagner, F. (1997). A simple min-cut algorithm. Journal of the ACM, 44(4), 585–591.
Hagen, L., & Kahng, A. (1992). New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design, 11(9), 1074–1085.
Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS’04), Vancouver, BC, Canada, pp. 1601–1608.
Bojchevski, A., Matkovic, Y., & Günnemann, S. (2017). Robust spectral clustering for noisy data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’17), Halifax, NS, Canada, pp. 737–746.
Wu, L., Chen, P.-Y., Yen, I.E.-H., Xu, F., Xia, Y., & Aggarwal, C. (2018). Scalable spectral clustering using random binning features. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18), London, United Kingdom, pp. 2506–2515.
Tan, M., Zhang, S., & Wu, L. (2018). Mutual kNN based spectral clustering. Neural Computing and Applications. https://doi.org/10.1007/s00521-018-3836-z
Tong, T., Zhu, X., & Du, T. (2019). Connected graph decomposition for spectral clustering. Multimedia Tools and Applications, 78(23), 33247–33259.
Yang, X., Yu, W., Wang, R., Zhang, G., & Nie, F. (2020). Fast spectral clustering learning with hierarchical bipartite graph for large-scale data. Pattern Recognition Letters, 130, 345–352.
Pang, Y., Xie, J., Nie, F., & Li, X. (2020). Spectral clustering by joint spectral embedding and spectral rotation. IEEE Transactions on Cybernetics, 50(1), 247–258.
Jiang, M. F., Tseng, S. S., & Su, C. M. (2001). Two-phase clustering process for outlier detection. Pattern Recognition Letters, 22(6–7), 691–700.
Yu, D., Sheikholeslami, G., & Zang, . (2002). FindOut: Finding outliers in very large datasets. Knowledge and Information Systems, 4, 387–412.
Wang, C. H. (2008). Recognition of semiconductor defect patterns using spatial filtering and spectral clustering. Expert Systems with Applications, 34(3), 1914–1923.
Xiang, T., & Gong, S. (2008). Spectral clustering with eigenvector selection. Pattern Recognition, 41(3), 1012–1029.
Filippone, M., Camastra, F., & Masulli, F. (2008). A survey of kernel and spectral methods for clustering. Pattern Recognition, 41(1), 176–190.
von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.
Yang, P., & Huang, B. (2008). An outlier detection algorithm based on spectral clustering. In Proceedings of the 2008 Pacific-Asia Workshop on Computational Intelligence and Industrial Application (PACIIA 2008), Wuhan, China, pp. 507–510.
He, Z. Y., Xu, X. F., & Deng, S. C. (2003). Discovering cluster based local outliers. Pattern Recognition Letters, 24(9–10), 1641–1650.
Yang, P., & Huang, B. (2008). A spectral clustering algorithm for outlier detection. In Proceedings of the 2008 International Seminar on Future Information Technology and Management Engineering (FITME’08), Leicestershire, United Kingdom, pp. 33–36.
Lin, H., & Zhu, Q. (2012). A spectral clustering-based dataset structure analysis and outlier detection progress. Journal of Computational Information Systems, 8(1), 115–124.
Tyuryukanov, I., van der Meijden, M. A. M. M., Terzija, V., & Popov, M. (2018). Spectral MST-based graph outlier detection with application to clustering of power networks. In Proceedings of the 20th Power Systems Computation Conference (PSCC’18), Dublin, Ireland.
Aggarwal, C. C. (2013). Outlier Analysis. Springer.
Sun, H., Huang, J., Han, J., Deng, H., Zhao, P., & Feng, B. (2010). gSkeletonClu: Density-based network clustering via structure-connected tree division or agglomeration. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM’10), pp. 481–490.
Xiong, L., Chen, X., & Schneider, J. (2011). Direct robust matrix factorization for anomaly detection. In Proceedings of the 11th IEEE International Conference on Data Mining (ICDM’11), Vancouver, BC, Canada, pp. 844–853.
Kriegel, H.-P., Schubert, M., & Zimek, A. (2008). Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), Las Vegas, Nevada, USA, pp. 444–452.
Knorr, E. M., Ng, R. T., & Tucakov, V. (2000). Distance-based outliers: Algorithms and applications. The VLDB Journal, 8(3–4), 237–253.
Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM International Conference on Management of Data (SIGMOD’00), Dallas, pp. 427–438.
Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00), Dallas, TX, United States, pp. 93–104.
Angiulli, F., & Pizzuti, C. (2002). Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD’02), Helsinki, pp. 15–26.
Wang, Y., Wang, X., & Wang, X. L. (2016). A spectral clustering based outlier detection technique. In Proceedings of the 12th International Conference on Machine Learning and Data Mining, New York, USA, pp. 15–27.
UCI: The UCIKDD Archive, University of California, Irvine, CA. https://kdd.ics.uci.edu/.
Aggarwal, C., & Yu, P. (2001). Outlier detection for high-dimensional data. In Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD’01), Santa Barbara, CA, USA, pp. 37–46.
Janssens, J., Huszar, F., Postma, E., & van den Herik, H. (2012). Stochastic outlier selection.
Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M., & He, X. (2019). Generative adversarial active learning for unsupervised outlier detection. IEEE Transactions on Knowledge and Data Engineering. https://arxiv.org/abs/1809.10816
Ru, X., Liu, Z., Huang, Z., et al. (2016). Normalized residual-based constant false-alarm rate outlier detection. Pattern Recognition Letters, 69, 1–7.
Tang, B., & He, H. (2017). A local density-based approach for outlier detection. Neurocomputing, 241, 171–180.
Erfani, S. M., Rajasegarar, S., Karunasekera, S., & Leckie, C. (2016). High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognition, 58, 121–134.
Acknowledgements
This chapter was modified from the paper published by our group in “Machine Learning and Data Mining in Pattern Recognition” [35]. The related content is reused with permission.
Copyright information
© 2021 Xi'an Jiaotong University Press
About this chapter
Cite this chapter
Wang, X., Wang, X., Wilkes, M. (2021). A k-Nearest Neighbour Spectral Clustering-Based Outlier Detection Technique. In: New Developments in Unsupervised Outlier Detection. Springer, Singapore. https://doi.org/10.1007/978-981-15-9519-6_6
DOI: https://doi.org/10.1007/978-981-15-9519-6_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9518-9
Online ISBN: 978-981-15-9519-6
eBook Packages: Intelligent Technologies and Robotics (R0)