Outlier detection and robust covariance estimation using mathematical programming
The outlier detection problem and the robust covariance estimation problem are often interchangeable. Without outliers, the classical method of maximum likelihood estimation (MLE) can be used to estimate the parameters of a known distribution from observational data. When outliers are present, they dominate the log-likelihood function, causing the maximum likelihood estimates to be pulled toward them. Many robust statistical methods have been developed to detect outliers and to produce estimators that are robust against deviations from model assumptions. However, existing methods either become computationally expensive as the problem size increases or give up desirable properties such as affine equivariance. An alternative approach is to design a special mathematical programming model that finds the optimal weights for all the observations, such that at the optimal solution, outliers are given smaller weights and can be detected. This method produces a covariance estimator with the following properties: First, it is affine equivariant. Second, it is computationally efficient even for large problem sizes. Third, it is easy to incorporate prior beliefs into the estimator by using semi-definite programming. The accuracy of this method is tested for different contamination models, including recently proposed ones. The method is not only faster than the Fast-MCD method for high-dimensional data but also has reasonable accuracy for the tested cases.
Keywords: Covariance matrix estimation; Robust statistics; Outlier detection; Optimization; Semi-definite programming; Newton–Raphson method
Mathematics Subject Classification (2000): 62-07 (Statistics-Data analysis), 90-08 (Operations Research-Computational methods)
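To make the weighting idea in the abstract concrete, here is a minimal Python sketch. It is not the paper's mathematical programming model, which the abstract does not specify. It swaps in a simple iterative reweighting heuristic in which each observation's weight depends only on its squared Mahalanobis distance. The function names (`weighted_mean_cov`, `reweighted_estimate`), the weight rule, and the synthetic contamination are all assumptions made for illustration. Because Mahalanobis distances are unchanged by affine transformations, the resulting estimate is affine equivariant, which is the property the abstract emphasizes.

```python
import numpy as np


def weighted_mean_cov(X, w):
    """Weighted mean and covariance; w is non-negative and not all zero."""
    w = w / w.sum()
    mu = w @ X
    R = X - mu
    cov = (R * w[:, None]).T @ R
    return mu, cov


def reweighted_estimate(X, n_iter=50):
    """Illustrative reweighting heuristic (an assumption, not the paper's model).

    Weights are min(1, median(d2) / d2), where d2 is the squared Mahalanobis
    distance under the current estimate. A fixed iteration count is used so
    that the procedure depends on the data only through d2.
    """
    n, _ = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        mu, cov = weighted_mean_cov(X, w)
        R = X - mu
        d2 = np.einsum("ij,ij->i", R @ np.linalg.inv(cov), R)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        w = np.minimum(1.0, np.median(d2) / d2)
    mu, cov = weighted_mean_cov(X, w)
    return mu, cov, w


# Synthetic data (assumed for illustration): 180 standard-normal inliers
# plus 20 clustered outliers centered at (8, 8).
rng = np.random.default_rng(0)
n_in, n_out = 180, 20
X = np.vstack([
    rng.standard_normal((n_in, 2)),
    0.5 * rng.standard_normal((n_out, 2)) + 8.0,
])

mu, cov, w = reweighted_estimate(X)
classical_cov = np.cov(X, rowvar=False)

# Affine equivariance check: transform the data by y = A x + b and re-estimate.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([1.0, -2.0])
mu_y, cov_y, w_y = reweighted_estimate(X @ A.T + b)

print("mean weight, inliers :", w[:n_in].mean())
print("mean weight, outliers:", w[n_in:].mean())
print("largest eigenvalue, classical :", np.linalg.eigvalsh(classical_cov)[-1])
print("largest eigenvalue, reweighted:", np.linalg.eigvalsh(cov)[-1])
print("equivariant:", np.allclose(cov_y, A @ cov @ A.T))
```

In this sketch, observations with small final weights are the outlier candidates, and the classical covariance is visibly inflated along the direction of the contamination. The scale of the reweighted covariance is not calibrated for consistency at the normal model; a practical estimator would apply a correction factor.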