An Effective Pattern Based Outlier Detection Approach for Mixed Attribute Data

Abstract

Detecting outliers in mixed attribute datasets is one of the major challenges in real-world applications. Existing outlier detection methods lack effectiveness on mixed attribute datasets, mainly because they cannot account for interactions among different types of attributes, e.g., numerical and categorical. To address this issue, we propose a novel Pattern based Outlier Detection approach (POD). A pattern in this paper is defined to describe the majority of the data as well as to capture interactions among different types of attributes. In POD, the more an object deviates from these patterns, the higher its outlier factor. We use logistic regression to learn patterns and then formulate the outlier factor for mixed attribute datasets. A series of experimental results illustrates that POD performs statistically significantly better than several classic outlier detection methods.
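
The abstract does not give the exact formulation of the patterns or of the outlier factor, so the following is only a minimal sketch of the general idea, not the POD algorithm itself. It assumes one numerical and one categorical attribute, treats a "pattern" as a one-vs-rest logistic regression model that predicts each categorical value from the numerical attribute, and takes the outlier factor of an object to be one minus the probability its own category receives. The function names (`fit_logistic`, `outlier_factors`), the toy data, and the scoring rule are all illustrative assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(column):
    # Rescale one numerical attribute to zero mean and unit variance.
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return [(v - mean) / std for v in column]

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    # Binary logistic regression on a single feature, fitted by
    # batch gradient descent on the mean log-loss.
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def outlier_factors(numeric, categorical):
    # ASSUMPTION (not from the paper): a pattern is a one-vs-rest logistic
    # model per categorical value, and the outlier factor of an object is
    # 1 - P(observed category | numerical attribute).
    xs = standardize(numeric)
    models = {}
    for value in set(categorical):
        ys = [1.0 if c == value else 0.0 for c in categorical]
        models[value] = fit_logistic(xs, ys)
    scores = []
    for x, c in zip(xs, categorical):
        w, b = models[c]
        scores.append(1.0 - sigmoid(w * x + b))
    return scores

# Toy data: category "a" goes with small values, "b" with large values.
# The last object pairs a large value with "a", breaking that interaction.
numeric = [0.8, 0.9, 1.0, 1.1, 1.2, 4.9, 5.0, 5.1, 4.9, 5.0, 5.1, 5.0]
categorical = ["a"] * 5 + ["b"] * 6 + ["a"]

scores = outlier_factors(numeric, categorical)
for x, c, s in zip(numeric, categorical, scores):
    print(f"{x:4.1f}  {c}  outlier factor = {s:.3f}")
```

In this toy setting the last object is unremarkable on either attribute alone: the value 5.0 is common and so is the category "a". Only the combination is unusual, which is the kind of cross-type interaction the abstract says single-type methods miss.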