On normalization and algorithm selection for unsupervised outlier detection


This paper demonstrates that the performance of outlier detection methods is sensitive to both the characteristics of the dataset and the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest neighbor structure and the density of the dataset, and hence which observations could be considered outliers. We then perform an instance space analysis of combinations of normalization and detection methods. This analysis enables visualization of the strengths and weaknesses of these combinations, and gives insight into which combination is likely to perform best on a given dataset.
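The claim that normalization changes the nearest neighbor structure can be illustrated on a toy example. The sketch below is not taken from the paper: the six observations, the helper names (`min_max`, `z_score`, `normalize`, `nearest_neighbor`) and the choice of Euclidean distance are all assumptions made for illustration. It rescales the same data in two ways and reports the nearest neighbor of the first observation under each.

```python
import math
import statistics

# Hypothetical toy data: six 2-D observations; the last one has an
# extreme value in the first attribute.
DATA = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (1.0, 4.0), (2.0, 4.0), (10.0, 4.0)]


def min_max(column):
    """Rescale one attribute to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column):
    """Rescale one attribute to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return [(v - mu) / sigma for v in column]


def normalize(points, scaler):
    """Apply `scaler` to each attribute (column) independently."""
    columns = [scaler(list(col)) for col in zip(*points)]
    return list(zip(*columns))


def nearest_neighbor(points, i):
    """Index of the Euclidean nearest neighbor of points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return min(others, key=lambda j: math.dist(points[i], points[j]))


nn_min_max = nearest_neighbor(normalize(DATA, min_max), 0)
nn_z_score = nearest_neighbor(normalize(DATA, z_score), 0)
print(nn_min_max, nn_z_score)  # prints "1 2"
```

In this toy data, the extreme value stretches the range of the first attribute more than its standard deviation, so the two schemes weight the attributes differently and the first observation ends up with a different nearest neighbor under each. A detector built on nearest neighbor distances could therefore score the same observation differently depending on the scheme.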

  1. Generally, normalization refers to scaling each attribute to \([0,1]\), while standardization refers to scaling each attribute to \({\mathcal {N}}(0,1)\). For simplicity, and without loss of generality, we use the term normalization to refer to both rescalings in this paper.
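For concreteness, the two rescalings named in this note are usually written as follows. The notation is an assumption introduced here, not the paper's: \(x_{ij}\) is the value of attribute \(j\) for observation \(i\), and \(\mu_j\) and \(\sigma_j\) are the mean and standard deviation of attribute \(j\).

\[
x^{\text{min-max}}_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}},
\qquad
x^{\text{z-score}}_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}.
\]

The first maps each attribute onto \([0,1]\); the second gives each attribute zero mean and unit variance. Both are per-attribute affine maps and differ only in the location and scale constants used.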


Funding was provided by the Australian Research Council through the Australian Laureate Fellowship FL140100012, and Linkage Project LP160101885. This research was supported in part by the Monash eResearch Centre and eSolutions-Research Support Services through the MonARCH HPC Cluster.

Kandanaarachchi, S., Muñoz, M.A., Hyndman, R.J. et al. On normalization and algorithm selection for unsupervised outlier detection. Data Min Knowl Disc 34, 309–354 (2020).

  • Unsupervised outlier detection
  • Effect of normalization on outlier detection
  • Algorithm selection problem for outlier detection
  • Instance space analysis
  • Instance space analysis for outlier detection