Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data

Fan, Hongqin; Zaïane, Osmar R.; Foss, Andrew; Wu, Junfeng

doi:10.1007/s10115-008-0145-3

Regular Paper
Published: 13 August 2008

Knowledge and Information Systems, volume 19, pages 31–51 (2009)

Abstract

Outlier detection, which aims to identify inconsistent records in large amounts of data, is a common task in engineering applications. Although outlier detection schemes from the data mining discipline are acknowledged as a more viable way to identify anomalies efficiently in these data repositories, current outlier mining algorithms require domain parameters as input. These parameters are often unknown, difficult to determine, and vary across datasets with different cluster features. This paper presents a novel resolution-based notion of an outlier and a nonparametric outlier-mining algorithm that can efficiently identify and rank the top-listed outliers in a wide variety of datasets. The algorithm generates reasonable outlier results by taking both local and global features of a dataset into account. Experiments are conducted using both synthetic datasets and a real-life construction equipment dataset from a large road building contractor. Comparison with current outlier mining algorithms indicates that the proposed algorithm is more effective and can be integrated into a decision support system to serve as a universal detector of potentially inconsistent records.

Author information

Authors and Affiliations

Department of Engineering Technology, Missouri Western State University, St. Joseph, MO, USA
Hongqin Fan
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
Osmar R. Zaïane, Andrew Foss & Junfeng Wu

Corresponding author

Correspondence to Osmar R. Zaïane.

About this article

Cite this article

Fan, H., Zaïane, O.R., Foss, A. et al. Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowl Inf Syst 19, 31–51 (2009). https://doi.org/10.1007/s10115-008-0145-3

Received: 22 June 2007
Revised: 03 December 2007
Accepted: 16 February 2008
Published: 13 August 2008
Issue Date: April 2009
DOI: https://doi.org/10.1007/s10115-008-0145-3
