Abstract
Outlier detection has important applications in data mining, and many techniques have been developed for it. The results of k-nearest neighbor-based techniques, such as distance-based and density-based algorithms, are relatively sensitive to parameter settings. In contrast, clustering-based outlier detection algorithms regard data items in small groups as outliers and often obtain them as a by-product of clustering. Unlike K-means clustering algorithms, minimum spanning tree-based clustering algorithms can find clusters of arbitrary shapes, different sizes, and different densities. However, minimum spanning tree clustering-based outlier detection approaches may incur high computational costs. To partially circumvent these problems, this chapter proposes an efficient outlier detection technique inspired by minimum spanning tree-based clustering. Extensive performance evaluations on synthetic and real datasets show that the proposed approach identifies both global and local outliers well in comparison with state-of-the-art outlier detection methods.
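To make the general idea concrete, the following is a minimal sketch of generic minimum spanning tree clustering-based outlier detection in the spirit of Zahn's edge-cutting approach. It is not the technique proposed in this chapter. The function names and the parameters `cut_factor` and `max_cluster_size` are hypothetical, chosen only for illustration. The sketch builds the tree over the complete Euclidean graph with Prim's algorithm, removes edges much longer than the mean edge length, and reports the members of the resulting small groups as outliers.

```python
import math
from collections import Counter


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, u, v) tuples, one per tree edge.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_outliers(points, cut_factor=3.0, max_cluster_size=2):
    """Return indices of points that fall in small groups after edge cutting.

    cut_factor and max_cluster_size are illustrative assumptions,
    not values taken from the chapter.
    """
    edges = mst_edges(points)
    if not edges:
        return []
    mean_w = sum(w for w, _, _ in edges) / len(edges)

    # Union-find over the edges that survive the cut.
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for w, u, v in edges:
        if w <= cut_factor * mean_w:
            root[find(u)] = find(v)

    sizes = Counter(find(i) for i in range(len(points)))
    return [i for i in range(len(points)) if sizes[find(i)] <= max_cluster_size]


# A 3x3 grid of points plus one distant point (index 9).
pts = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(20.0, 20.0)]
flagged = mst_outliers(pts)
```

In this example the only edge longer than three times the mean is the one that attaches the distant point, so `flagged` contains just index 9. The quadratic cost of building the tree over all point pairs illustrates why approaches of this kind can become expensive on large datasets.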
References
Hawkins, D. M. (1980). Identification of outliers. London: Chapman and Hall.
Eskin, E., Arnold, A., Prerau, M., Portnoy, L., & Stolfo, S. (2002). A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. In Barbará, D., & Jajodia, S. (Eds.), Applications of data mining in computer security. Advances in Information Security (Vol. 6, pp. 77–101).
Lane, T., & Brodley, C. E. (1998). Temporal sequence learning and data reduction for anomaly detection. In Proceedings of the 1998 5th ACM Conference on Computer and Communications Security (CCS-5), San Francisco, CA, USA (pp. 150–158).
Bolton, R. J., & Hand, D. J. (2002). Unsupervised profiling methods for fraud detection. Statistical Science, 17(3), 235–255.
Wong, W., Moore, A., Cooper, G., & Wagner, M. (2002). Rule-based anomaly pattern detection for detecting disease outbreaks. In Proceedings of the 18th National Conference on Artificial Intelligence, Edmonton, Alta., Canada (pp. 217–223).
Sheng, B., Li, Q., Mao, W., & Jin, W. (2007). Outlier detection in sensor networks. In Proceedings of ACM International Symposium on Mobile Ad Hoc Networking and Computing (pp. 219–228).
Hodge, V. J., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85–126.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 15.1–15.58.
Knorr, E. M., & Ng, R. T. (1998). Algorithms for mining distance-based outliers in large datasets. In Proceedings of the International Conference on Very Large Data Bases (VLDB’98), New York (pp. 392–403).
Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM International Conference on Management of Data (SIGMOD’00), Dallas (pp. 427–438).
Angiulli, F., & Pizzuti, C. (2002). Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD’02), Helsinki (pp. 15–26).
Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00), Dallas, TX, United States (pp. 93–104).
Bay, S. D., & Schwabacher, M. (2003). Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03), Washington, DC, United States (pp. 29–38).
Aggarwal, C., & Yu, P. (2005). An effective and efficient algorithm for high-dimensional outlier detection. The VLDB Journal, 14(2), 211–221.
Kriegel, H.-P., Schubert, M., & Zimek, A. (2008). Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), Las Vegas, Nevada, USA (pp. 444–452).
Huang, H., Mehrotra, K., & Mohan, C. K. (2013). Rank-based outlier detection. Journal of Statistical Computation and Simulation, 83(3), 518–531.
Zahn, C. T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, C-20(1), 64–82.
Wang, X., Wang, X. L., & Wilkes, D. M. (2009). A divide-and-conquer approach for minimum spanning tree-based clustering. IEEE Transactions on Knowledge and Data Engineering, 21(7), 945–958.
Zhong, C., Miao, D., & Wang, R. (2010). A graph-theoretical clustering method based on two rounds of minimum spanning trees. Pattern Recognition, 43(3), 752–766.
Luo, T., & Zhong, C. (2010). A neighborhood density estimation clustering algorithm based on minimum spanning tree. In Proceedings of the 5th International Conference on Rough Set and Knowledge Technology (RSKT’10), Beijing, China (pp. 557–565).
Luo, T., Zhong, C., Li, H., & Sun, X. (2010). A multi-prototype clustering algorithm based on minimum spanning tree. In Proceedings of 2010 7th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD’10) (pp. 1602–1607).
Zhong, C., Miao, D., & Franti, P. (2011). Minimum spanning tree based split-and-merge: A hierarchical clustering method. Information Sciences, 181(16), 3397–3410.
Lin, J., Ye, D., Chen, C., & Gao, M. (2008). Minimum spanning tree based spatial outlier mining and its applications. In Proceedings of the 3rd International Conference on Rough Sets and Knowledge Technology (RSKT’08), Chengdu, China, LNAI 5009 (pp. 508–515).
John Peter, S., & Victor, S. P. (2011). An integrated approach for local outlier detection using dynamic minimum spanning tree. Journal of Discrete Mathematical Sciences and Cryptography, 14(1), 89–106.
John Peter, S. (2011). Minimum spanning tree based clustering for outlier detection. Journal of Discrete Mathematical Sciences and Cryptography, 14(2), 149–166.
Daneshgar, A., Javadi, R., & Shariat Razavi, S. B. (2013). Clustering and outlier detection using isoperimetric number of trees. Pattern Recognition, 46(12), 3371–3382.
Wang, X., Wang, X. L., & Wilkes, D. M. A spanning tree-inspired clustering based outlier detection technique. In Proceedings of the 12th Industry Conference on Data Mining, Berlin, Germany (pp. 209–223).
Zhu, Q., Fan, X., & Feng, J. (2014). Outlier detection based on K-Neighborhood MST. In Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI’14), San Francisco, CA, United States (pp. 718–724).
Cipolla, E., & Vella, F. (2014). Identification of spatio-temporal outliers through Minimum Spanning Tree. In Proceedings of the 10th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS’14), Marrakech, Morocco (pp. 248–255).
Abghari, S., Boeva, V., Lavesson, N., Grahn, H., Ickin, S., & Gustafsson, J. (2018). A minimum spanning tree clustering approach for outlier detection in event sequences. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA’18), Orlando, FL, United States (pp. 1123–1130).
Wang, X., Wang, X. L., & Wilkes, D. M. (2012). Modifying iDistance for a fast CHAMELEON with application to patch based image segmentation. In Proceedings of the 9th IASTED International Conference on Signal Processing, Pattern Recognition and Applications (SPPRA 2012), Crete, Greece (pp. 107–114).
UCI: The UCI KDD Archive. [http://kdd.ics.uci.edu/]. Irvine, CA: University of California.
Tang, J., Chen, Z., Fu, A. W. C., & Cheung, D. W. (2002). Enhancing effectiveness of outlier detections for low density patterns. In Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’02), Taipei, Taiwan (pp. 535–548).
Jin, W., Tung, A. K. H., Han, J., & Wang, W. (2006). Ranking outliers using symmetric neighborhood relationship. In Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’06), Singapore (pp. 577–593).
Zhang, K., Hutter, M., & Jin, H. (2009). A new local distance-based outlier detection approach for scattered real-world data. In Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD’09), Bangkok, Thailand (pp. 813–822).
Aggarwal, C., & Yu, P. (2001). Outlier detection for high-dimensional data. In Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD’01), Santa Barbara, CA, USA (pp. 37–46).
Meng, X., & Chen, Z. (2004). On user-oriented measurements of effectiveness of web information retrieval systems. In Proceedings of the International Conference on Internet Computing (ICIC’04), Las Vegas, Nevada, USA (Vol. 1, pp. 527–533).
Wang, X., Wang, X. L., Ma, Y., & Wilkes, D. M. (2015). A fast MST-inspired kNN-based outlier detection method. Information Systems, 48, 89–112.
Acknowledgements
This chapter was modified from the paper published by our group in Information Systems [38]. The related contents are reused with permission.
Copyright information
© 2021 Xi'an Jiaotong University Press
About this chapter
Cite this chapter
Wang, X., Wang, X., Wilkes, M. (2021). A Minimum Spanning Tree Clustering-Inspired Outlier Detection Technique. In: New Developments in Unsupervised Outlier Detection. Springer, Singapore. https://doi.org/10.1007/978-981-15-9519-6_5
DOI: https://doi.org/10.1007/978-981-15-9519-6_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9518-9
Online ISBN: 978-981-15-9519-6
eBook Packages: Intelligent Technologies and Robotics (R0)