Outlier detection methods for generalized lattices: a case study on the transition from ANOVA to REML

Bernal-Vasquez, Angela-Maria; Utz, H.-Friedrich; Piepho, Hans-Peter

doi:10.1007/s00122-016-2666-6

Original Article
Published: 16 February 2016

Volume 129, pages 787–804, (2016)

Theoretical and Applied Genetics

Angela-Maria Bernal-Vasquez ORCID: orcid.org/0000-0003-0415-8318¹,
H.-Friedrich Utz² &
Hans-Peter Piepho¹

Abstract

Key message

We review and propose several methods for identifying possible outliers and evaluate their properties. The methods are applied to a genomic prediction program in hybrid rye.

Abstract

Many plant breeders use ANOVA-based software for routine analysis of field trials. These programs may offer specific in-built options for residual analysis that are lacking in current REML software. With the advance of molecular technologies, there is a need to switch to REML-based approaches, but without losing the good features of outlier detection methods that have proven useful in the past. Our aims were to compare the variance component estimates between ANOVA and REML approaches, to scrutinize the outlier detection method of the ANOVA-based package PlabStat and to propose and evaluate alternative procedures for outlier detection. We compared the outputs produced using ANOVA and REML approaches of four published datasets of generalized lattice designs. Five outlier detection methods are explained step by step. Their performance was evaluated by measuring the true positive rate and the false positive rate in a dataset with artificial outliers simulated in several scenarios. An implementation of genomic prediction using an empirical rye multi-environment trial was used to assess the outlier detection methods with respect to the predictive abilities of a mixed model for each method. We provide a detailed explanation of how the PlabStat outlier detection methodology can be translated to REML-based software together with the evaluation of alternative methods to identify outliers. The method combining the Bonferroni–Holm test to judge each residual and the residual standardization strategy of PlabStat exhibited good ability to detect outliers in small and large datasets and under a genomic prediction application. We recommend the use of outlier detection methods as a decision support in the routine data analyses of plant breeding experiments.
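The abstract names two ingredients that a small sketch can make concrete: judging each standardized residual with the Bonferroni–Holm test, and scoring a detection method by its true positive rate and false positive rate on data with known artificial outliers. The Python sketch below is illustrative only and is not the authors' implementation. It assumes that residuals are standardized by a rescaled median absolute deviation (the constant 1.4826 and the centering on the median are assumptions made here, since the abstract does not describe PlabStat's standardization), and it assumes a standard normal reference distribution for the p-values. The function names `holm_outlier_flags` and `tpr_fpr` are hypothetical.

```python
import math
from statistics import median


def holm_outlier_flags(residuals, alpha=0.05):
    """Flag residuals as possible outliers with Holm's step-down procedure.

    Assumptions (not taken from the article): robust standardization by the
    rescaled median absolute deviation, and two-sided p-values from the
    standard normal distribution.
    """
    n = len(residuals)
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    if mad == 0:
        raise ValueError("MAD is zero; residuals cannot be standardized")
    scale = 1.4826 * mad  # makes the MAD consistent for normal data
    z = [(r - med) / scale for r in residuals]

    # Two-sided normal p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = [math.erfc(abs(v) / math.sqrt(2.0)) for v in z]

    # Holm: compare the k-th smallest p-value (k = 0, 1, ...) with
    # alpha / (n - k) and stop at the first non-rejection.
    flags = [False] * n
    for k, i in enumerate(sorted(range(n), key=lambda j: p[j])):
        if p[i] <= alpha / (n - k):
            flags[i] = True
        else:
            break
    return flags


def tpr_fpr(flags, truth):
    """Return (true positive rate, false positive rate) for flagged points."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    fp = sum(1 for f, t in zip(flags, truth) if f and not t)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    return tp / positives, fp / negatives


if __name__ == "__main__":
    # Toy residuals with one artificial outlier in the last position.
    residuals = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3, 0.0, 0.15, -0.15, 5.0]
    truth = [False] * 9 + [True]
    flags = holm_outlier_flags(residuals)
    print(flags)
    print(tpr_fpr(flags, truth))
```

Holm's procedure controls the family-wise error rate across all residuals while being less conservative than a plain Bonferroni correction, which is one plausible reason to pair it with a robust scale estimate that is itself not inflated by the outliers being tested.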

Acknowledgments

This research was funded by KWS-LOCHOW GMBH and the German Federal Ministry of Education and Research (Bonn, Germany) within the AgroClusterEr “Rye-Select: Genome-based precision breeding strategies for rye” (Grant ID: 0315946A). We thank Vanda Lourenço for commenting on the manuscript and Steffen Hadasch for helping with the R codes. We are grateful to KWS-LOCHOW for providing the datasets used in this study and the technical support to run the analyses. We thank the Synbreed project members for their helpful and constructive comments during the discussion sessions and also the anonymous reviewers for suggestions and comments that led to improvements in the clarity of the manuscript.

Author information

Authors and Affiliations

Biostatistics Unit, Institute of Crop Sciences, University of Hohenheim, Fruwirthstrasse 23, 70599, Stuttgart, Germany
Angela-Maria Bernal-Vasquez & Hans-Peter Piepho
Plant Breeding Institute, University of Hohenheim, Fruwirthstrasse 21, 70599, Stuttgart, Germany
H.-Friedrich Utz

Corresponding author

Correspondence to Angela-Maria Bernal-Vasquez.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical standards

The authors declare that ethical standards are met, and all the experiments comply with the current laws of the country in which they were performed.

Additional information

Communicated by M. J. Sillanpaa.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 91 kb)

Supplementary material 2 (PDF 115 kb)

Supplementary material 3 (PDF 1107 kb)

About this article

Cite this article

Bernal-Vasquez, AM., Utz, HF. & Piepho, HP. Outlier detection methods for generalized lattices: a case study on the transition from ANOVA to REML. Theor Appl Genet 129, 787–804 (2016). https://doi.org/10.1007/s00122-016-2666-6

Received: 12 June 2015
Accepted: 09 January 2016
Published: 16 February 2016
Issue Date: April 2016
DOI: https://doi.org/10.1007/s00122-016-2666-6
