The power of monitoring: how to make the most of a contaminated multivariate sample
Diagnostic tools must rely on robust high-breakdown methodologies to avoid distortion in the presence of contamination by outliers. However, a disadvantage of having a single, even if robust, summary of the data is that important choices concerning the parameters of the robust method, such as the breakdown point, have to be made prior to the analysis. The effect of such choices may be difficult to evaluate. We argue that an effective solution is to look at several pictures, and possibly at a whole movie, of the available data. This can be achieved by monitoring, over a range of parameter values, the results computed through the robust methodology of choice. We show the information gain that monitoring provides in the study of complex data structures through the analysis of multivariate datasets using different high-breakdown techniques. Our findings support the claim that the principle of monitoring is very flexible and that it can lead to robust estimators that are as efficient as possible. We also address through simulation some of the tricky inferential issues that arise from monitoring.
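As a rough illustration of the monitoring principle described above, the following minimal sketch recomputes one high-breakdown estimator over a grid of breakdown points and reports how the fitted location and scale change. It is a toy univariate stand-in, not the multivariate procedures analysed in the paper: it uses the univariate minimum covariance determinant, which reduces to the contiguous window of the sorted data with the smallest variance. The function names `univariate_mcd` and `monitor`, the toy sample and the grid of breakdown points are all assumptions made for illustration.

```python
import math
import statistics


def univariate_mcd(values, bdp):
    """Raw univariate MCD: mean and standard deviation of the subset of
    h observations with the smallest variance (hypothetical helper)."""
    xs = sorted(values)
    n = len(xs)
    # Subset size: keep roughly a fraction (1 - bdp) of the observations.
    h = max(2, n - round(bdp * n))
    best = None
    # In one dimension the optimal subset is contiguous in the sorted data,
    # so scanning all windows of length h gives the exact solution.
    for start in range(n - h + 1):
        window = xs[start:start + h]
        var = statistics.pvariance(window)
        if best is None or var < best[0]:
            best = (var, statistics.fmean(window))
    var, loc = best
    return loc, math.sqrt(var)


def monitor(values, bdps):
    """Recompute the estimator for each breakdown point in the grid."""
    return [(bdp, *univariate_mcd(values, bdp)) for bdp in bdps]


# Toy sample (an assumption): 16 values around 0 and 4 values around 10,
# that is, 20% contamination.
sample = [-0.9, -0.7, -0.6, -0.4, -0.3, -0.2, -0.1, 0.0, 0.0, 0.1,
          0.2, 0.3, 0.4, 0.6, 0.7, 0.9, 9.8, 10.0, 10.1, 10.3]
grid = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0]

for bdp, loc, scale in monitor(sample, grid):
    print(f"bdp={bdp:.1f}  location={loc:6.3f}  scale={scale:6.3f}")
```

In this toy sample the estimated location stays close to zero while the breakdown point is at least 0.2 and is pulled toward the contaminating group once it drops below the contamination rate. A change of this kind along the grid is what monitoring is meant to reveal. The raw estimates are not rescaled for consistency, so the scale values are comparable only qualitatively across the grid.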
Keywords: Data movie, Forward search, Outlier detection, MM-estimation, S-estimation, Trimming, Reweighting
We are very grateful to the Editor, Tommaso Proietti, for inviting this paper and for organizing its discussion. We also thank Alessio Farcomeni, Luca Greco, Domenico Perrotta and two anonymous reviewers for helpful comments on a previous draft. MR and ACA gratefully acknowledge support from the CRoNoS project, reference CRoNoS COST Action IC1408.