Knowledge and Information Systems, Volume 4, Issue 4, pp 387–412

FindOut: Finding Outliers in Very Large Datasets

  • Dantong Yu
  • Gholamhosein Sheikholeslami
  • Aidong Zhang
Original Paper

Abstract.

Finding rare instances, or outliers, is important in many KDD (knowledge discovery and data mining) applications, such as detecting credit card fraud or finding irregularities in gene expression data. Signal-processing techniques have been used to transform images for enhancement, filtering, restoration, analysis, and reconstruction. In this paper, we apply signal-processing techniques to important problems in data mining. In particular, we introduce a novel deviation (or outlier) detection approach, termed FindOut, based on the wavelet transform. The main idea in FindOut is to remove the clusters from the original data and then identify the outliers. Although previous research showed that such techniques may not be effective because of the nature of clustering, FindOut can successfully identify outliers in large datasets. Experimental results on very large datasets demonstrate the efficiency and effectiveness of the proposed approach.
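The abstract only outlines the "remove the clusters, then report what is left" idea, so the following is a minimal sketch of that idea, not the authors' FindOut procedure. It assumes 2-D data normalized to the unit square, a single level of 2-D Haar low-pass averaging as the wavelet step, and a fixed density threshold for deciding which coarse cells belong to clusters. The function names `quantize`, `haar_approximation`, and `find_outliers` and the parameters `grid_size` and `density_threshold` are hypothetical choices made for illustration.

```python
def quantize(points, grid_size):
    """Map 2-D points in [0, 1) x [0, 1) onto a grid_size x grid_size count grid.

    Returns the count grid and the (row, col) cell of every input point.
    """
    grid = [[0] * grid_size for _ in range(grid_size)]
    cells = []
    for x, y in points:
        # Clamp so that a coordinate of exactly 1.0 still lands in the last cell.
        i = min(int(x * grid_size), grid_size - 1)
        j = min(int(y * grid_size), grid_size - 1)
        grid[i][j] += 1
        cells.append((i, j))
    return grid, cells


def haar_approximation(grid):
    """One level of a 2-D Haar low-pass (LL) step.

    Uses plain 2x2 block averaging instead of the orthonormal 1/2 scaling,
    so the result reads directly as a smoothed point density per cell.
    """
    n = len(grid) // 2
    return [
        [
            (grid[2 * i][2 * j] + grid[2 * i + 1][2 * j]
             + grid[2 * i][2 * j + 1] + grid[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def find_outliers(points, grid_size=8, density_threshold=1.0):
    """Hypothetical sketch: drop points in dense (cluster) cells, return the rest."""
    if grid_size % 2:
        raise ValueError("grid_size must be even for one Haar level")
    grid, cells = quantize(points, grid_size)
    approx = haar_approximation(grid)
    # Coarse cells whose smoothed density reaches the threshold count as clusters.
    cluster_cells = {
        (i, j)
        for i, row in enumerate(approx)
        for j, value in enumerate(row)
        if value >= density_threshold
    }
    # Remove clustered points; whatever remains is a candidate outlier.
    return [p for p, (i, j) in zip(points, cells)
            if (i // 2, j // 2) not in cluster_cells]
```

For example, eight points packed near (0.1, 0.1) plus the isolated points (0.9, 0.9) and (0.9, 0.1) leave only the two isolated points after the clustered cell is removed. A fixed threshold is the simplest possible cluster test; it is sensitive to grid resolution and data size, so in practice it would need tuning or a data-dependent rule.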

Keywords: Clustering; Data mining; Outliers; Wavelet 

Copyright information

© Springer-Verlag London Limited 2002

Authors and Affiliations

  • Dantong Yu (1)
  • Gholamhosein Sheikholeslami (1)
  • Aidong Zhang (1)
  1. Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, New York, USA

Personalised recommendations