Abstract
Outlier detection, which aims to identify inconsistent records in large amounts of data, is a common task in engineering applications. Although outlier detection schemes from the data mining discipline are acknowledged as a more viable solution for efficiently identifying anomalies in such data repositories, current outlier mining algorithms require domain parameters as input. These parameters are often unknown, difficult to determine, and vary across datasets with different cluster features. This paper presents a novel resolution-based outlier notion and a nonparametric outlier mining algorithm that can efficiently identify and rank the top-listed outliers in a wide variety of datasets. The algorithm generates reasonable outlier results by taking both local and global features of a dataset into account. Experiments are conducted using both synthetic datasets and a real-life construction equipment dataset from a large road building contractor. Comparison with current outlier mining algorithms indicates that the proposed algorithm is more effective and can be integrated into a decision support system to serve as a universal detector of potentially inconsistent records.
Cite this article
Fan, H., Zaïane, O.R., Foss, A. et al. Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowl Inf Syst 19, 31–51 (2009). https://doi.org/10.1007/s10115-008-0145-3
DOI: https://doi.org/10.1007/s10115-008-0145-3