Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data

  • Regular Paper
Knowledge and Information Systems

Abstract

Outlier detection, which aims to identify inconsistent records in large amounts of data, is a common task in engineering applications. Although outlier detection schemes from the data mining discipline are acknowledged as a more viable solution for efficiently identifying anomalies in these data repositories, current outlier mining algorithms require domain parameters as input. These parameters are often unknown, difficult to determine, and vary across datasets with different cluster features. This paper presents a novel resolution-based outlier notion and a nonparametric outlier mining algorithm that can efficiently identify and rank the top-n outliers in a wide variety of datasets. The algorithm generates reasonable outlier results by taking both local and global features of a dataset into account. Experiments are conducted using both synthetic datasets and a real-life construction equipment dataset from a large road building contractor. Comparison with current outlier mining algorithms indicates that the proposed algorithm is more effective and can be integrated into a decision support system to serve as a universal detector of potentially inconsistent records.

Author information

Corresponding author

Correspondence to Osmar R. Zaïane.

About this article

Cite this article

Fan, H., Zaïane, O.R., Foss, A. et al. Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowl Inf Syst 19, 31–51 (2009). https://doi.org/10.1007/s10115-008-0145-3

  • DOI: https://doi.org/10.1007/s10115-008-0145-3
