Outlier detection methods for generalized lattices: a case study on the transition from ANOVA to REML

  • Original Article
Theoretical and Applied Genetics

Abstract

Key message

We review and propose several methods for identifying possible outliers and evaluate their properties. The methods are applied to a genomic prediction program in hybrid rye.

Abstract

Many plant breeders use ANOVA-based software for routine analysis of field trials. These programs may offer specific in-built options for residual analysis that are lacking in current REML software. With the advance of molecular technologies, there is a need to switch to REML-based approaches, but without losing the good features of outlier detection methods that have proven useful in the past. Our aims were to compare the variance component estimates between ANOVA and REML approaches, to scrutinize the outlier detection method of the ANOVA-based package PlabStat and to propose and evaluate alternative procedures for outlier detection. We compared the outputs produced using ANOVA and REML approaches of four published datasets of generalized lattice designs. Five outlier detection methods are explained step by step. Their performance was evaluated by measuring the true positive rate and the false positive rate in a dataset with artificial outliers simulated in several scenarios. An implementation of genomic prediction using an empirical rye multi-environment trial was used to assess the outlier detection methods with respect to the predictive abilities of a mixed model for each method. We provide a detailed explanation of how the PlabStat outlier detection methodology can be translated to REML-based software together with the evaluation of alternative methods to identify outliers. The method combining the Bonferroni–Holm test to judge each residual and the residual standardization strategy of PlabStat exhibited good ability to detect outliers in small and large datasets and under a genomic prediction application. We recommend the use of outlier detection methods as a decision support in the routine data analyses of plant breeding experiments.
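The two evaluation ingredients named in the abstract, a Bonferroni–Holm test applied to each residual and the true/false positive rates measured against known artificial outliers, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the residuals have already been standardized by some method (the PlabStat standardization itself is not reproduced here) and that a standard normal reference distribution is adequate for the p-values. The function names `holm_outliers` and `tpr_fpr` and the example data are hypothetical.

```python
import math


def holm_outliers(std_resid, alpha=0.05):
    """Flag residuals with a Holm step-down test.

    Assumption: std_resid holds already-standardized residuals, and a
    standard normal reference distribution is used for the p-values.
    """
    m = len(std_resid)
    # Two-sided normal p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    pvals = [math.erfc(abs(z) / math.sqrt(2.0)) for z in std_resid]
    # Visit residuals from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    flagged = []
    for rank, i in enumerate(order):
        # Holm threshold: alpha / (m - rank), with rank starting at 0.
        if pvals[i] <= alpha / (m - rank):
            flagged.append(i)
        else:
            break  # step-down: stop at the first non-rejection
    return sorted(flagged)


def tpr_fpr(flagged, true_outliers, n):
    """True and false positive rates given the known outlier positions."""
    flagged, truth = set(flagged), set(true_outliers)
    n_pos = len(truth)
    n_neg = n - n_pos
    tp = len(flagged & truth)
    fp = len(flagged - truth)
    tpr = tp / n_pos if n_pos else float("nan")
    fpr = fp / n_neg if n_neg else float("nan")
    return tpr, fpr


# Hypothetical standardized residuals; positions 2 and 6 are the
# artificially contaminated observations.
resid = [0.3, -1.1, 4.8, 0.7, -0.2, 1.9, -5.2, 0.1]
flagged = holm_outliers(resid, alpha=0.05)
rates = tpr_fpr(flagged, true_outliers=[2, 6], n=len(resid))
print(flagged, rates)
```

In a simulation study of the kind the abstract describes, these two rates would be averaged over many simulated datasets per scenario; the sketch shows a single dataset only.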

Acknowledgments

This research was funded by KWS-LOCHOW GMBH and the German Federal Ministry of Education and Research (Bonn, Germany) within the AgroClusterEr “Rye-Select: Genome-based precision breeding strategies for rye” (Grant ID: 0315946A). We thank Vanda Lourenço for commenting on the manuscript and Steffen Hadasch for helping with the R code. We are grateful to KWS-LOCHOW for providing the datasets used in this study and the technical support to run the analyses. We thank the Synbreed project members for their helpful and constructive comments during the discussion sessions and also the anonymous reviewers for suggestions and comments that led to improvements in the clarity of the manuscript.

Author information

Corresponding author

Correspondence to Angela-Maria Bernal-Vasquez.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical standards

The authors declare that ethical standards are met, and all the experiments comply with the current laws of the country in which they were performed.

Additional information

Communicated by M. J. Sillanpaa.

About this article

Cite this article

Bernal-Vasquez, AM., Utz, HF. & Piepho, HP. Outlier detection methods for generalized lattices: a case study on the transition from ANOVA to REML. Theor Appl Genet 129, 787–804 (2016). https://doi.org/10.1007/s00122-016-2666-6

  • DOI: https://doi.org/10.1007/s00122-016-2666-6
