Enhancing Outlier Detection by Filtering Out Core Points and Border Points

Wang, Xiaochun; Wang, Xiali; Wilkes, Mitch

doi:10.1007/978-981-15-9519-6_7

Abstract

Outlier detection is an important task in data mining, with high practical value in applications such as astronomical observation, text detection, and fraud detection. Many popular outlier detection algorithms are available, including distribution-based, distance-based, density-based, and clustering-based approaches. However, traditional algorithms face several challenges. First, most distance-based and density-based methods rely on k-nearest neighbors. Consequently, even though outliers make up only a small fraction of the dataset, existing approaches must compute the local outlier factor for every data point, which greatly reduces their efficiency. Second, some methods detect only global outliers and miss local ones. Last but not least, most algorithms do not accurately distinguish boundary points from outliers. To partially address these problems, we observe that outlier detection is complementary to clustering. In density-based clustering, there are three kinds of data points: core points, border points, and outliers. If indicators can be extracted from the data that give outliers much larger deviation values than the other two kinds of points, the outlier detection problem can be solved. Therefore, in this chapter, we propose to augment classical outlier detection algorithms with boundary indicators. Experiments on both synthetic and real datasets demonstrate the efficacy of the enhanced algorithms.
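
To make the three kinds of points concrete, the following is a minimal sketch that labels points with the standard DBSCAN-style definitions from density-based clustering [22]. It is not the boundary-indicator method proposed in the chapter, which the abstract does not specify. The function name `classify_points`, the parameters `eps` and `min_pts`, and the toy data are all assumptions made for illustration.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'outlier' (DBSCAN-style definitions)."""
    n = len(points)
    # eps-neighborhood of every point; a point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points within distance eps.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            # Neither core nor reachable from a core point.
            labels.append("outlier")
    return labels


# Hypothetical toy data: a tight cluster, one point on its edge, one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.3, 0.1), (3.0, 3.0)]
labels = classify_points(points, eps=0.25, min_pts=4)

# Filtering out core and border points leaves only the outlier candidates.
candidates = [p for p, label in zip(points, labels) if label == "outlier"]
print(labels)
print(candidates)
```

With these toy values, the four clustered points are labeled core, the edge point is labeled border, and only the distant point remains as an outlier candidate. The pairwise distance computation is quadratic in the number of points and is meant only to show the definitions, not to be efficient.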

References

[1] Knorr, E. M., & Ng, R. T. (1997). A unified notion of outliers: Properties and computation. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD’97) (pp. 219–222). Newport Beach, CA, USA.
[2] Knorr, E. M., Ng, R. T., & Tucakov, V. (2000). Distance-based outliers: Algorithms and applications. The VLDB Journal, 8(3), 237–253.
[3] Mehnaz, S., & Bertino, E. (2017). Ghostbuster: A fine-grained approach for anomaly detection in file system accesses. In Proceedings of the ACM Conference on Data and Application Security and Privacy (CODASPY’17) (pp. 3–14). Scottsdale, AZ, United States.
[4] Iturbe, M., Garitano, I., Zurutuza, U., & Uribeetxeberria, R. (2017). Towards large-scale, heterogeneous anomaly detection systems in industrial networks: A survey of current trends. Security and Communication Networks, 2017(6), Art. no. 9150965.
[5] Wang, Y., Wu, Z., Zhu, Y., & Zhang, P. (2018). Research on anomaly detection algorithm based on generalization latency of telecommunication network. Future Generation Computer Systems, 85, 9–18.
[6] Gogoi, P., Bhattacharyya, D. K., Borah, B., & Kalita, J. K. (2011). A survey of outlier detection methods in network anomaly identification. Computer Journal, 54(4), 570–588.
[7] Bhuyan, M. H., Bhattacharyya, D. K., & Kalita, J. K. (2012). Survey on incremental approaches for network anomaly detection. International Journal of Communication Networks and Information Security, 3(3), 226–239.
[8] Agarwal, D. (2005). An empirical Bayes approach to detect anomalies in dynamic multidimensional arrays. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM’05) (pp. 26–33). Houston, TX, United States.
[9] Pimentel, M. A. F., Clifton, D. A., Clifton, L., & Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99, 215–249.
[10] Avdiienko, V., Kuznetsov, K., Rommelfanger, I., Rau, A., Gorla, A., & Zeller, A. (2017). Detecting behavior anomalies in graphical user interfaces. In Proceedings of the International Conference on Software Engineering Companion (ICSE-C’17) (pp. 201–203). Buenos Aires, Argentina.
[11] Keogh, E., Lin, J., Lee, S.-H., & van Herle, H. (2010). Finding the most unusual time series subsequence: Algorithms and applications. Knowledge and Information Systems, 11(1), 1–27.
[12] Cai, L., Thornhill, N., Kuenzel, S., & Pal, B. C. (2017). Real-time detection of power system disturbances based on k-nearest neighbor analysis. IEEE Access, 5, 5631–5639.
[13] Mccarren, A., Mccarthy, S., Sullivan, C. O., & Roantree, M. (2017). Anomaly detection in agri warehouse construction. In Proceedings of 2017 Australasian Computer Science Week Multiconference (ACSW’17) (pp. 1–10). Geelong, VIC, Australia.
[14] Stojanovic, N., Dinic, M., & Stojanovic, L. (2018). A data-driven approach for multivariate contextualized anomaly detection: Industry use case. In Proceedings of the 5th IEEE International Conference on Big Data (Big Data’17) (pp. 1560–1569). Boston, MA, United States.
[15] Vidmar, G., & Blagus, R. (2014). Outlier detection for healthcare quality monitoring: A comparison of four approaches to over-dispersed proportions. Quality and Reliability Engineering International, 30(3), 347–362.
[16] Yan, K., You, X., Ji, X., Yin, G., & Yang, F. (2016). A hybrid outlier detection method for health care big data. In Proceedings of the 6th IEEE International Conference on Big Data and Cloud Computing (BDCloud’16) (pp. 157–162). Atlanta, GA, United States.
[17] Gu, F., Niu, J., Das, S. K., He, Z., & Jin, X. (2017). Detecting breathing frequency and maintaining a proper running rhythm. Pervasive and Mobile Computing, 42, 498–512.
[18] Barnett, V., & Lewis, T. (1994). Outliers in statistical data. New York: Wiley.
[19] Knorr, E. M., & Ng, R. T. (1998). Algorithms for mining distance-based outliers in large datasets. In Proceedings of the International Conference on Very Large Data Bases (VLDB’98) (pp. 392–403). New York.
[20] Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00) (pp. 93–104). Dallas, TX, United States.
[21] Jiang, M. F., Tseng, S. S., & Su, C. M. (2001). Two-phase clustering process for outliers detection. Pattern Recognition Letters, 22(6–7), 691–700.
[22] Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’96) (pp. 226–231). Portland, Oregon, USA.
[23] Ankerst, M., Breunig, M. M., Kriegel, H. P., et al. (1999). OPTICS: Ordering points to identify the clustering structure. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (SIGMOD’99), 28(2), 49–60.
[24] Hinneburg, A., & Keim, D. A. (1998). An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD’98) (pp. 58–65). New York, NY, USA.
[25] Duan, L., Xu, L., Liu, Y., & Lee, J. Cluster-based outlier detection. Annals of Operations Research, 168(1), 151–168.
[26] Chen, X., Liu, W., Qiu, H., & Lai, J. (2011). APSCAN: A parameter free algorithm for clustering. Pattern Recognition Letters, 32(7), 973–986.
[27] Chen, Y. Q., Wang, X., Xu, R., Bai, X., & Meng, X. (2010). An adaptive affinity propagation document clustering. In Proceedings of the 2010 7th International Conference on Informatics and Systems (INFOS’10) (pp. 1–7). Cairo, Egypt.
[28] Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492–1496.
[29] Hou, J., Gao, H., & Li, X. (2016). DSets-DBSCAN: A parameter-free clustering algorithm. IEEE Transactions on Image Processing, 25(7), 3182–3193.
[30] Qi, X., & Wang, P. (2016). A density-based clustering algorithm for high-dimensional data with feature selection. In Proceedings of the 2016 International Conference on Industrial Informatics-Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII’16) (pp. 114–118). Wuhan, Hubei, China.
[31] Zhu, Y., Ting, K. M., & Carman, M. J. (2016). Density-ratio based clustering for discovering clusters with varying densities. Pattern Recognition, 60, 983–997.
[32] Messaoud, T. A., Smiti, A., & Louati, A. (2019). A novel density-based clustering approach for outlier detection in high-dimensional data. In Proceedings of the 14th International Conference on Hybrid Artificial Intelligence Systems (HAIS’19) (pp. 322–331). León, Spain.
[33] Roffo, G., Melzi, S., & Cristani, M. (2015). Infinite feature selection. In Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV’15) (pp. 4202–4210). Santiago, Chile.
[34] Rahman, M. A., Ang, K. L.-M., & Seng, K. P. (2018). Unique neighborhood set parameter independent density-based clustering with outlier detection. IEEE Access, 6, 44707–44717.
[35] Su, S., Xiao, L., Ruan, L., Gu, F., Li, S., Wang, Z., et al. (2019). An efficient density-based local outlier detection approach for scattered data. IEEE Access, 7, 1006–1020.
[36] Wang, Y. F., Yu, J., Su, G. P., & Qian, Y. R. (2019). A new outlier detection method based on OPTICS. Sustainable Cities and Society, 45, 197–212.
[37] Nagamani, C., & Chittineni, S. (2019). Efficient neighborhood density based outlier detection inside a sub network with high dimensional data. Ingenierie des Systemes d’Information, 24(1), 107–111.
[38] Angiulli, F., & Pizzuti, C. (2002). Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD’02) (pp. 15–26). Helsinki.
[39] Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the ACM International Conference on Management of Data (SIGMOD’00) (pp. 427–438). Dallas.
[40] Jin, W., Tung, A. K. H., Han, J., & Wang, W. (2006). Ranking outliers using symmetric neighborhood relationship. In Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’06) (pp. 577–593). Singapore.
[41] Huang, H., Mehrotra, K., & Mohan, C. K. (2013). Rank-based outlier detection. Journal of Statistical Computation and Simulation, 83(3), 518–531.
[42] UCI: The UCI KDD Archive, University of California, Irvine, CA. http://kdd.ics.uci.edu/.
[43] Aggarwal, C., & Yu, P. (2001). Outlier detection for high-dimensional data. In Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD’01) (pp. 37–46). Santa Barbara, CA, USA.
[44] Li, X., Wang, X., & Wang, X. L. (2018). Enhancing outlier detection by an outlier indicator. In Proceedings of the 14th International Conference on Machine Learning and Data Mining (pp. 393–405). New York, USA.

Acknowledgements

This chapter is adapted from a paper published by our group in Machine Learning and Data Mining in Pattern Recognition [44]. The related content is reused with permission.

Author information

Authors and Affiliations

School of Software Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
Xiaochun Wang
School of Information Engineering, Chang’an University, Xi’an, Shaanxi, China
Xiali Wang
Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
Mitch Wilkes

Corresponding author

Correspondence to Xiaochun Wang.

About this chapter

Cite this chapter

Wang, X., Wang, X., Wilkes, M. (2021). Enhancing Outlier Detection by Filtering Out Core Points and Border Points. In: New Developments in Unsupervised Outlier Detection. Springer, Singapore. https://doi.org/10.1007/978-981-15-9519-6_7

DOI: https://doi.org/10.1007/978-981-15-9519-6_7
Published: 25 November 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9518-9
Online ISBN: 978-981-15-9519-6
eBook Packages: Intelligent Technologies and Robotics (R0)
