Abstract
Outlier detection is an inevitable step in most statistical data analyses. However, the mere detection of an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, will typically flag the entire case as outlying, or assign a single case weight to the entire case. In practice, particularly in high-dimensional data, the outlier will most likely not be outlying along all of its variables, but only along a subset of them. If so, the scientific question why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect the variables that contribute most to an outlier's outlyingness. It thereby helps the analyst understand in which way an outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that this direction can be obtained as the normed solution of a classical least squares regression problem. Identifying the subset of variables that contribute most to outlyingness can thus be achieved by estimating the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably via the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The proposed method is demonstrated to perform well on both simulated data and real-life examples.
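To make the idea concrete, the following minimal sketch (not the authors' implementation; plain sample estimates stand in for the weighted or robust estimates, and no sparse fit is performed) computes the normed direction of maximal outlyingness for a single case and ranks the variables by the size of their loadings along that direction.

```python
# Illustrative sketch only: dense direction of maximal outlyingness for one case,
# using plain sample estimates in place of the weighted/robust estimates and the
# sparse (SPLS/SNIPLS) fit discussed in the paper.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 6))                 # placeholder "clean" data set
x = np.zeros(6); x[1], x[4] = 6.0, -5.0       # a case that is outlying in variables 1 and 4

mu = X.mean(axis=0)                           # location estimate (robust/weighted in practice)
Sigma = np.cov(X, rowvar=False)               # scatter estimate (robust/weighted in practice)

a = np.linalg.solve(Sigma, x - mu)            # direction of maximal outlyingness
a /= np.linalg.norm(a)                        # normed direction

# Variables with the largest absolute loadings contribute most to the outlyingness.
print(np.argsort(-np.abs(a)))                 # indices 1 and 4 should come first
```

In the paper, the associated least squares problem is instead estimated sparsely via SPLS, preferably with the SNIPLS algorithm, so that non-contributing variables receive exactly zero loadings.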
This work was supported by the BNP Paribas Fortis Chair in Fraud Analytics and Internal Funds KU Leuven under Grant C16/15/068.
A Appendix: Proofs
A.1 Proof of Proposition 1
Proof
Note that our weighted covariance matrix \(\hat{\varvec{\varSigma }}_w\), like all covariance matrices, is a positive-semidefinite matrix. Since we also assume it is not singular and \(\hat{\varvec{\varSigma }}_w^{-1}\) exists, we know that \(\hat{\varvec{\varSigma }}_w\) is positive-definite. We now apply the Cauchy-Bunyakovskiy-Schwarz inequality to \(\varvec{x}= \hat{\varvec{\varSigma }}_w^{-1/2}\varvec{x}_1\) and \(\varvec{y}= \hat{\varvec{\varSigma }}_w^{1/2}\varvec{y}_1\), for arbitrary \(\varvec{x}_1,\varvec{y}_1 \in \mathbb {R}^p\). This results in the following inequality
\[ (\varvec{x}_1^T\varvec{y}_1)^2 = (\varvec{x}^T\varvec{y})^2 \le (\varvec{x}^T\varvec{x})\,(\varvec{y}^T\varvec{y}) = (\varvec{x}_1^T\hat{\varvec{\varSigma }}_w^{-1}\varvec{x}_1)\,(\varvec{y}_1^T\hat{\varvec{\varSigma }}_w\varvec{y}_1). \]
We have equality if \(\varvec{y}= c\varvec{x}\) with \(c\in \mathbb {R}\), which means \(\hat{\varvec{\varSigma }}_w^{1/2}\varvec{y}_1 = c \hat{\varvec{\varSigma }}_w^{-1/2}\varvec{x}_1\) or \(\varvec{y}_1 = c \hat{\varvec{\varSigma }}_w^{-1}\varvec{x}_1\). In summary, for any \(\varvec{x},\varvec{y}\in \mathbb {R}^p\) we have the inequality
\[ (\varvec{x}^T\varvec{y})^2 \le (\varvec{x}^T\hat{\varvec{\varSigma }}_w^{-1}\varvec{x})\,(\varvec{y}^T\hat{\varvec{\varSigma }}_w\varvec{y}), \]
where there is equality if and only if \(\varvec{y}= c \hat{\varvec{\varSigma }}_w^{-1}\varvec{x}\).
We now look at
\[ \frac{\bigl (\varvec{a}^T(\varvec{x}- \hat{\varvec{\mu }}_w)\bigr )^2}{\varvec{a}^T\hat{\varvec{\varSigma }}_w\varvec{a}} \]
and apply this inequality with \(\varvec{x}- \hat{\varvec{\mu }}_w\) and \(\varvec{a}\) in the roles of \(\varvec{x}\) and \(\varvec{y}\):
\[ \bigl (\varvec{a}^T(\varvec{x}- \hat{\varvec{\mu }}_w)\bigr )^2 \le \bigl ((\varvec{x}- \hat{\varvec{\mu }}_w)^T\hat{\varvec{\varSigma }}_w^{-1}(\varvec{x}- \hat{\varvec{\mu }}_w)\bigr )\,\bigl (\varvec{a}^T\hat{\varvec{\varSigma }}_w\varvec{a}\bigr ). \]
We have equality in the above inequality if
\(\varvec{a}= c \hat{\varvec{\varSigma }}_w^{-1}(\varvec{x}- \hat{\varvec{\mu }}_w)\). So
\[ \varvec{a}= \hat{\varvec{\varSigma }}_w^{-1}(\varvec{x}- \hat{\varvec{\mu }}_w) \]
(up to a scalar multiple) is the direction \(\varvec{a}\) that maximizes
\[ \frac{\bigl (\varvec{a}^T(\varvec{x}- \hat{\varvec{\mu }}_w)\bigr )^2}{\varvec{a}^T\hat{\varvec{\varSigma }}_w\varvec{a}}, \]
and for this \(\varvec{a}\) we have
\[ \frac{\bigl (\varvec{a}^T(\varvec{x}- \hat{\varvec{\mu }}_w)\bigr )^2}{\varvec{a}^T\hat{\varvec{\varSigma }}_w\varvec{a}} = (\varvec{x}- \hat{\varvec{\mu }}_w)^T\hat{\varvec{\varSigma }}_w^{-1}(\varvec{x}- \hat{\varvec{\mu }}_w). \]
\(\square \)
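As a quick numerical illustration of Proposition 1 (a sketch only, with an arbitrary positive-definite matrix standing in for \(\hat{\varvec{\varSigma }}_w\)), the snippet below verifies that no randomly drawn direction exceeds the squared Mahalanobis distance, and that the bound is attained at \(\varvec{a}= \hat{\varvec{\varSigma }}_w^{-1}(\varvec{x}- \hat{\varvec{\mu }}_w)\).

```python
# Illustrative check of Proposition 1: no direction beats a = Sigma_w^{-1}(x - mu_w),
# and at that direction the ratio equals the squared Mahalanobis distance.
import numpy as np

rng = np.random.default_rng(1)
p = 4
A = rng.normal(size=(p, p))
Sigma_w = A @ A.T + p * np.eye(p)     # positive-definite stand-in for the weighted covariance
mu_w = rng.normal(size=p)
x = mu_w + 3.0 * rng.normal(size=p)   # the case under scrutiny

def ratio(a):
    d = a @ (x - mu_w)
    return d * d / (a @ Sigma_w @ a)

a_star = np.linalg.solve(Sigma_w, x - mu_w)   # direction from Proposition 1
md2 = (x - mu_w) @ a_star                     # squared Mahalanobis distance

best_random = max(ratio(rng.normal(size=p)) for _ in range(10_000))
print(best_random <= md2 + 1e-12, np.isclose(ratio(a_star), md2))  # True True
```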
A.2 Proof of Theorem 1
Proof
We know that, by the theory of ordinary least squares regression, the least squares solution of regressing \(\varvec{y}^{n+1}_{w,\varepsilon }\) on \(\varvec{X}_{w,\varepsilon }\) is
\[ \hat{\varvec{\beta }}_{w,\varepsilon } = \bigl (\varvec{X}_{w,\varepsilon }^T\varvec{X}_{w,\varepsilon }\bigr )^{-1}\varvec{X}_{w,\varepsilon }^T\varvec{y}^{n+1}_{w,\varepsilon }, \]
and by the definition of our weighted covariance matrix, \(\hat{\varvec{\varSigma }}_{w,\varepsilon } = \frac{1}{n_{w,\varepsilon }-1} \varvec{X}_{w,\varepsilon }^T\varvec{X}_{w,\varepsilon }\), we can write
\[ \hat{\varvec{\beta }}_{w,\varepsilon } = \bigl ((n_{w,\varepsilon }-1)\hat{\varvec{\varSigma }}_{w,\varepsilon }\bigr )^{-1}\varvec{X}_{w,\varepsilon }^T\varvec{y}^{n+1}_{w,\varepsilon }. \]
We know that \(((n_{w,\varepsilon }-1)\hat{\varvec{\varSigma }}_{w,\varepsilon })^{-1} = \frac{1}{n_{w,\varepsilon }-1} \hat{\varvec{\varSigma }}_{w,\varepsilon }^{-1}\) and it is easy to see that \(\varvec{X}^T_{w,\varepsilon }\varvec{y}^{n+1}_{w,\varepsilon } = \sqrt{\varepsilon }(\varvec{x}- \hat{\varvec{\mu }}_{w,\varepsilon })\), if we look at the definitions of \(\varvec{X}_{w,\varepsilon }\) and \(\varvec{y}^{n+1}_{w,\varepsilon }\). Thus we have that
\[ \hat{\varvec{\beta }}_{w,\varepsilon } = \frac{\sqrt{\varepsilon }}{n_{w,\varepsilon }-1}\, \hat{\varvec{\varSigma }}_{w,\varepsilon }^{-1}(\varvec{x}- \hat{\varvec{\mu }}_{w,\varepsilon }). \]
Since \(\varepsilon \) is strictly larger than zero, the scalar factor \(\sqrt{\varepsilon }/(n_{w,\varepsilon }-1)\) is strictly positive and we have that
\[ \frac{\hat{\varvec{\beta }}_{w,\varepsilon }}{\Vert \hat{\varvec{\beta }}_{w,\varepsilon }\Vert } = \frac{\hat{\varvec{\varSigma }}_{w,\varepsilon }^{-1}(\varvec{x}- \hat{\varvec{\mu }}_{w,\varepsilon })}{\Vert \hat{\varvec{\varSigma }}_{w,\varepsilon }^{-1}(\varvec{x}- \hat{\varvec{\mu }}_{w,\varepsilon })\Vert }. \]
Then we get that
\[ \lim _{\varepsilon \rightarrow 0} \frac{\hat{\varvec{\beta }}_{w,\varepsilon }}{\Vert \hat{\varvec{\beta }}_{w,\varepsilon }\Vert } = \frac{\hat{\varvec{\varSigma }}_w^{-1}(\varvec{x}- \hat{\varvec{\mu }}_w)}{\Vert \hat{\varvec{\varSigma }}_w^{-1}(\varvec{x}- \hat{\varvec{\mu }}_w)\Vert }, \]
since \(\lim _{\varepsilon \rightarrow 0}n_{w,\varepsilon }=n_w\), \(\lim _{\varepsilon \rightarrow 0}\hat{\varvec{\mu }}_{w,\varepsilon }=\hat{\varvec{\mu }}_w\) and
\(\lim _{\varepsilon \rightarrow 0}\hat{\varvec{\varSigma }}_{w,\varepsilon }^{-1}=\hat{\varvec{\varSigma }}_w^{-1}\). \(\square \)
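The following numerical sketch illustrates Theorem 1 under an assumed construction that is consistent with the relations used in the proof but is not taken verbatim from the paper: the rows of \(\varvec{X}_{w,\varepsilon }\) are taken to be the centered cases scaled by the square roots of their weights, with the case under scrutiny receiving weight \(\varepsilon \), and \(\varvec{y}^{n+1}_{w,\varepsilon }\) is taken to be the indicator of that case. For small \(\varepsilon \), the normed least squares solution should then coincide with the normed direction \(\hat{\varvec{\varSigma }}_w^{-1}(\varvec{x}- \hat{\varvec{\mu }}_w)\).

```python
# Numerical sketch of Theorem 1 under the assumed construction described above;
# not the authors' code. For small eps the normed least squares solution should
# coincide with the normed direction Sigma_w^{-1}(x - mu_w).
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))                       # clean cases, each with weight 1
x_out = np.array([4.0, -3.0, 0.1, 0.2, -0.1])     # the case under scrutiny
eps = 1e-8                                        # small weight for that case

W = np.append(np.ones(n), eps)                    # case weights w_1, ..., w_n, eps
Z = np.vstack([X, x_out])
mu_we = W @ Z / W.sum()                           # weighted mean including the case
Xwe = np.sqrt(W)[:, None] * (Z - mu_we)           # rows sqrt(w_i) (x_i - mu_{w,eps})
y = np.zeros(n + 1); y[-1] = 1.0                  # indicator of the case under scrutiny

beta, *_ = np.linalg.lstsq(Xwe, y, rcond=None)    # least squares solution
a_ls = beta / np.linalg.norm(beta)

mu_w = X.mean(axis=0)                             # estimates from the clean cases only
Sigma_w = np.cov(X, rowvar=False)
a_dir = np.linalg.solve(Sigma_w, x_out - mu_w)
a_dir /= np.linalg.norm(a_dir)

print(np.allclose(a_ls, a_dir, atol=1e-5))        # should print True
```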
Cite this article
Debruyne, M., Höppner, S., Serneels, S. et al. Outlyingness: Which variables contribute most? Stat Comput 29, 707–723 (2019). https://doi.org/10.1007/s11222-018-9831-5
Keywords
- Partial least squares
- Robust statistics
- Sparsity
- Variable selection