A Nonparametric Outlier Detection for Effectively Discovering Top-N Outliers from Engineering Data

  • Hongqin Fan
  • Osmar R. Zaïane
  • Andrew Foss
  • Junfeng Wu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3918)

Abstract

We present a novel resolution-based notion of outliers and a nonparametric outlier-mining algorithm that efficiently identifies the top-N outliers in a wide variety of datasets. The algorithm generates reasonable outlier results by taking both local and global features of a dataset into consideration. Experiments are conducted using both synthetic datasets and a real-life construction equipment dataset from a large building contractor. Comparison with existing outlier-mining algorithms indicates that the proposed algorithm is more effective.

Keywords

Outlier Detection, Mining Algorithm, Close Neighbour, Synthetic Dataset, Local Outlier
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hongqin Fan (1)
  • Osmar R. Zaïane (2)
  • Andrew Foss (2)
  • Junfeng Wu (2)

  1. Department of Civil Engineering, University of Alberta, Canada
  2. Department of Computing Science, University of Alberta, Canada
