A Minimum Spanning Tree Clustering-Inspired Outlier Detection Technique

Wang, Xiaochun; Wang, Xiali; Wilkes, Mitch

doi:10.1007/978-981-15-9519-6_5



Abstract

Because of its important applications in data mining, many techniques have been developed for outlier detection. k-nearest neighbor-based outlier detection techniques, such as distance-based and density-based algorithms, produce results that are relatively sensitive to parameter settings. Clustering-based outlier detection algorithms, in contrast, regard data items in small groups as outliers and often obtain them as a by-product of clustering. Unlike K-means clustering algorithms, minimum spanning tree-based clustering algorithms can find clusters of arbitrary shapes, different sizes, and different densities. However, minimum spanning tree clustering-based outlier detection approaches may incur high computational costs. To partially circumvent these problems, this chapter proposes an efficient outlier detection technique inspired by minimum spanning tree-based clustering. Extensive performance evaluations on synthetic and real datasets show that, compared with state-of-the-art outlier detection methods, the proposed approach works well for identifying both global and local outliers.
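The abstract refers to the general clustering-based pipeline that the chapter builds on: build a minimum spanning tree (MST), cut its long edges so the tree falls apart into clusters, and treat points that end up in small groups as outliers. The sketch below illustrates that generic Zahn-style pipeline in plain Python. It is not the authors' proposed technique. The edge-cutting rule (drop MST edges longer than the mean plus `c` standard deviations of all MST edge weights), the parameters `c` and `min_cluster_size`, and the function names `prim_mst` and `mst_outliers` are illustrative assumptions. The O(n^2) Prim step also shows where the computational cost mentioned in the abstract comes from.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def prim_mst(points: List[Point]) -> List[Tuple[int, int, float]]:
    """Return MST edges (u, v, weight) of the complete Euclidean graph
    using Prim's algorithm. Runs in O(n^2) time because every pairwise
    distance is examined."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight into the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Update the cheapest connection of every remaining vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_outliers(points: List[Point], c: float = 2.0,
                 min_cluster_size: int = 3) -> List[int]:
    """Hypothetical MST clustering-based detector: cut MST edges longer
    than mean + c * std (a simplified global 'inconsistent edge' rule),
    then flag points whose component has fewer than min_cluster_size
    members. Returns outlier indices in ascending order."""
    n = len(points)
    edges = prim_mst(points)
    if not edges:
        return []  # fewer than two points: nothing to compare against
    weights = [w for _, _, w in edges]
    mean = sum(weights) / len(weights)
    std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
    threshold = mean + c * std

    # Union-find over the MST edges that survive the cut.
    root = list(range(n))

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, v, w in edges:
        if w <= threshold:
            root[find(u)] = find(v)

    sizes: Dict[int, int] = {}
    for i in range(n):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return [i for i in range(n) if sizes[find(i)] < min_cluster_size]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
            (30, 0)]
    print(mst_outliers(data))  # [10]
```

On this toy data, the default `c=2.0` keeps the 12.7-unit bridge between the two squares, so they merge into one cluster of ten points, while the isolated point at index 10 is still flagged. With `c=1.0`, both long edges are cut and the squares become two separate five-point clusters. The single global threshold is the simplest choice; Zahn's original criterion compares each edge with nearby edges instead, which copes better with clusters of different densities.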


References

1. Hawkins, D. M. (1980). Identification of outliers. London: Chapman and Hall.
2. Eskin, E., Arnold, A., Prerau, M., Portnoy, L., & Stolfo, S. (2002). A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. In D. Barbará & S. Jajodia (Eds.), Applications of data mining in computer security. Advances in Information Security (Vol. 6, pp. 77–101).
3. Lane, T., & Brodley, C. E. (1998). Temporal sequence learning and data reduction for anomaly detection. In Proceedings of the 1998 5th ACM Conference on Computer and Communications Security (CCS-5), San Francisco, CA, USA (pp. 150–158).
4. Bolton, R. J., & Hand, D. J. (2002). Unsupervised profiling methods for fraud detection. Statistical Science, 17(3), 235–255.
5. Wong, W., Moore, A., Cooper, G., & Wagner, M. (2002). Rule-based anomaly pattern detection for detecting disease outbreaks. In Proceedings of the 18th National Conference on Artificial Intelligence, Edmonton, Alta., Canada (pp. 217–223).
6. Sheng, B., Li, Q., Mao, W., & Jin, W. (2007). Outlier detection in sensor networks. In Proceedings of ACM International Symposium on Mobile Ad Hoc Networking and Computing (pp. 219–228).
7. Hodge, V. J., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85–126.
8. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), Article 15, 1–58.
9. Knorr, E. M., & Ng, R. T. (1998). Algorithms for mining distance-based outliers in large datasets. In Proceedings of the International Conference on Very Large Data Bases (VLDB’98), New York (pp. 392–403).
10. Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM International Conference on Management of Data (SIGMOD’00), Dallas (pp. 427–438).
11. Angiulli, F., & Pizzuti, C. (2002). Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD’02), Helsinki (pp. 15–26).
12. Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00), Dallas, TX, United States (pp. 93–104).
13. Bay, S. D., & Schwabacher, M. (2003). Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03), Washington, DC, United States (pp. 29–38).
14. Aggarwal, C., & Yu, P. (2005). An effective and efficient algorithm for high-dimensional outlier detection. The VLDB Journal, 14(2), 211–221.
15. Kriegel, H.-P., Schubert, M., & Zimek, A. (2008). Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), Las Vegas, Nevada, USA (pp. 444–452).
16. Huang, H., Mehrotra, K., & Mohan, C. K. (2013). Rank-based outlier detection. Journal of Statistical Computation and Simulation, 83(3), 518–531.
17. Zahn, C. T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, C-20(1), 64–82.
18. Wang, X., Wang, X. L., & Wilkes, D. M. (2009). A divide-and-conquer approach for minimum spanning tree-based clustering. IEEE Transactions on Knowledge and Data Engineering, 21(7), 945–958.
19. Zhong, C., Miao, D., & Wang, R. (2010). A graph-theoretical clustering method based on two rounds of minimum spanning trees. Pattern Recognition, 43(3), 752–766.
20. Luo, T., & Zhong, C. (2010). A neighborhood density estimation clustering algorithm based on minimum spanning tree. In Proceedings of the 5th International Conference on Rough Set and Knowledge Technology (RSKT’10), Beijing, China (pp. 557–565).
21. Luo, T., Zhong, C., Li, H., & Sun, X. (2010). A multi-prototype clustering algorithm based on minimum spanning tree. In Proceedings of 2010 7th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD’10) (pp. 1602–1607).
22. Zhong, C., Miao, D., & Fränti, P. (2011). Minimum spanning tree based split-and-merge: A hierarchical clustering method. Information Sciences, 181(16), 3397–3410.
23. Lin, J., Ye, D., Chen, C., & Gao, M. (2008). Minimum spanning tree based spatial outlier mining and its applications. In Proceedings of the 3rd International Conference on Rough Sets and Knowledge Technology (RSKT’08), Chengdu, China, LNAI 5009 (pp. 508–515).
24. John Peter, S., & Victor, S. P. (2011). An integrated approach for local outlier detection using dynamic minimum spanning tree. Journal of Discrete Mathematical Sciences and Cryptography, 14(1), 89–106.
25. John Peter, S. (2011). Minimum spanning tree based clustering for outlier detection. Journal of Discrete Mathematical Sciences and Cryptography, 14(2), 149–166.
26. Daneshgar, A., Javadi, R., & Shariat Razavi, S. B. (2013). Clustering and outlier detection using isoperimetric number of trees. Pattern Recognition, 46(12), 3371–3382.
27. Wang, X., Wang, X. L., & Wilkes, D. M. A spanning tree-inspired clustering based outlier detection technique. In Proceedings of the 12th Industrial Conference on Data Mining, Berlin, Germany (pp. 209–223).
28. Zhu, Q., Fan, X., & Feng, J. (2014). Outlier detection based on K-Neighborhood MST. In Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI’14), San Francisco, CA, United States (pp. 718–724).
29. Cipolla, E., & Vella, F. (2014). Identification of spatio-temporal outliers through Minimum Spanning Tree. In Proceedings of the 10th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS’14), Marrakech, Morocco (pp. 248–255).
30. Abghari, S., Boeva, V., Lavesson, N., Grahn, H., Ickin, S., & Gustafsson, J. (2018). A minimum spanning tree clustering approach for outlier detection in event sequences. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA’18), Orlando, FL, United States (pp. 1123–1130).
31. Wang, X., Wang, X. L., & Wilkes, D. M. (2012). Modifying iDistance for a fast CHAMELEON with application to patch based image segmentation. In Proceedings of the 9th IASTED International Conference on Signal Processing, Pattern Recognition and Applications (SPPRA 2012), Crete, Greece (pp. 107–114).
32. The UCI KDD Archive. Irvine, CA: University of California. http://kdd.ics.uci.edu/
33. Tang, J., Chen, Z., Fu, A. W. C., & Cheung, D. W. (2002). Enhancing effectiveness of outlier detections for low density patterns. In Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’02), Taipei, Taiwan (pp. 535–548).
34. Jin, W., Tung, A. K. H., Han, J., & Wang, W. (2006). Ranking outliers using symmetric neighborhood relationship. In Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’06), Singapore (pp. 577–593).
35. Zhang, K., Hutter, M., & Jin, H. (2009). A new local distance-based outlier detection approach for scattered real-world data. In Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD’09), Bangkok, Thailand (pp. 813–822).
36. Aggarwal, C., & Yu, P. (2001). Outlier detection for high-dimensional data. In Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD’01), Santa Barbara, CA, USA (pp. 37–46).
37. Meng, X., & Chen, Z. (2004). On user-oriented measurements of effectiveness of web information retrieval systems. In Proceedings of the International Conference on Internet Computing (ICIC’04), Las Vegas, Nevada, USA (Vol. 1, pp. 527–533).
38. Wang, X., Wang, X. L., Ma, Y., & Wilkes, D. M. (2015). A fast MST-inspired kNN-based outlier detection method. Information Systems, 48, 89–112.


Acknowledgements

This chapter was modified from the paper published by our group in Information Systems [38]. The related contents are reused with permission.

Author information

Authors and Affiliations

School of Software Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
Xiaochun Wang
School of Information Engineering, Chang’an University, Xi’an, Shaanxi, China
Xiali Wang
Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
Mitch Wilkes


Corresponding author

Correspondence to Xiaochun Wang.


About this chapter

Cite this chapter

Wang, X., Wang, X., Wilkes, M. (2021). A Minimum Spanning Tree Clustering-Inspired Outlier Detection Technique. In: New Developments in Unsupervised Outlier Detection. Springer, Singapore. https://doi.org/10.1007/978-981-15-9519-6_5


DOI: https://doi.org/10.1007/978-981-15-9519-6_5
Published: 25 November 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9518-9
Online ISBN: 978-981-15-9519-6
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
