Statistical Issues and Assumptions of Phylogenetic Generalized Least Squares

  • Roger Mundry


Using phylogenetic generalized least squares (PGLS) means fitting a linear regression to investigate the impact of one or several predictor variables on a single response variable while controlling for potential phylogenetic signal in the response (and, hence, non-independence of the residuals). The key difference between PGLS and standard (multiple) regression is that PGLS allows us to control for residuals that are potentially non-independent due to the phylogenetic history of the taxa investigated. While the assumptions of PGLS regarding the underlying processes of evolution and the correlation of the predictor and response variables with the phylogeny have received considerable attention, much less attention has been paid to the checks of model reliability and stability commonly used for standard general linear models. However, several of these checks can be applied in the context of PGLS as well. Here, I describe how such checks of model stability and reliability could be applied to a PGLS and what could be done in case they reveal potential problems. Besides treating general questions regarding the conceptual and technical validity of the model, I consider issues regarding sample size, collinearity among the predictors, the distribution of the predictors and the residuals, model stability, and drawing inference based on P-values. Finally, I emphasize the need to report checks of assumptions (and their results) in publications.
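To make the core idea concrete, here is a minimal sketch (standard-library Python only) of the estimation step behind PGLS: generalized least squares with a residual covariance matrix V derived from the phylogeny. It assumes pure Brownian motion with V held fixed (i.e., Pagel's λ fixed at 1 rather than estimated), a single covariate, and a made-up four-taxon tree and trait values; all function and variable names are hypothetical. The Cholesky "whitening" step illustrates why GLS amounts to ordinary least squares on transformed data, and with V set to the identity matrix the sketch reproduces a standard regression. For real analyses, established R implementations, such as `pgls()` in the caper package or `gls()` in nlme combined with a phylogenetic correlation structure like `corBrownian()` from ape, are the more sensible choice.

```python
import math


def cholesky(V):
    """Return lower-triangular L with V = L L^T (V must be symmetric positive definite)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(V[i][i] - s)
            else:
                L[i][j] = (V[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z by forward substitution (L lower triangular)."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def gls_fit(x, y, V):
    """GLS intercept and slope for y = b0 + b1 * x + e, with Cov(e) proportional to V."""
    L = cholesky(V)
    # Premultiplying by L^-1 "whitens" the model: the transformed residuals are independent.
    one_w = forward_solve(L, [1.0] * len(x))
    x_w = forward_solve(L, x)
    y_w = forward_solve(L, y)
    # Ordinary least squares on the whitened data via the 2 x 2 normal equations.
    a11 = sum(u * u for u in one_w)
    a12 = sum(u * v for u, v in zip(one_w, x_w))
    a22 = sum(v * v for v in x_w)
    c1 = sum(u * w for u, w in zip(one_w, y_w))
    c2 = sum(v * w for v, w in zip(x_w, y_w))
    det = a11 * a22 - a12 ** 2
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


# Hypothetical ultrametric tree ((A:1,B:1):1,(C:1,D:1):1): under Brownian motion,
# V[i][j] is the branch length shared by taxa i and j on their paths from the root.
V_tree = [[2.0, 1.0, 0.0, 0.0],
          [1.0, 2.0, 0.0, 0.0],
          [0.0, 0.0, 2.0, 1.0],
          [0.0, 0.0, 1.0, 2.0]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x_trait = [1.0, 2.0, 3.0, 4.0]   # hypothetical predictor, one value per taxon
y_trait = [2.1, 3.9, 6.2, 7.8]   # hypothetical response
print("OLS :", gls_fit(x_trait, y_trait, identity))
print("PGLS:", gls_fit(x_trait, y_trait, V_tree))
```

Because the estimates depend on V only up to a constant factor, rescaling all branch lengths by the same amount leaves the coefficients unchanged; what matters is the relative amount of shared history among taxa.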



First of all, I would like to thank László Zsolt Garamszegi for inviting me to write this chapter. I also thank László Zsolt Garamszegi and two anonymous reviewers for very helpful comments on an earlier draft of this chapter. I equally owe thanks to Charles L. Nunn for first drawing my attention to the need for, and rationale of, phylogenetically corrected statistical analyses. During the three AnthroTree workshops held in Amherst, MA, U.S.A., in 2010–2012 and supported by the NSF (BCS-0923791) and the National Evolutionary Synthesis Center (NSF grant EF-0905606), I learnt a great deal about the philosophy and practical implementation of phylogenetic approaches to statistical analyses, and I am very grateful to have had the opportunity to attend them. This article was mainly written during a stay on the wonderful island of Læsø, Denmark, and I owe warm thanks to the staff of the hotel Havnebakken for their hospitality, which made my stay both very enjoyable and productive.



Glossary

Case

Set of entries in the data referring to the same taxon; represented by one row in the data set and corresponding to one tip in the phylogeny.


Covariate

Quantitative predictor variable.

Dummy coding

Way of representing a factor in a linear model by turning it into a set of ‘quantitative’ variables. One level of the factor is defined as the ‘reference’ level (or reference category), and for each of the other levels a variable is created that is one if the respective case in the data set is of that level and zero otherwise. The estimate derived for a dummy-coded variable indicates by how much the response in the coded level differs from that in the reference level.
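A minimal sketch of dummy coding in plain Python, using a hypothetical three-level factor ‘diet’ with made-up response values; the helper names are illustrative only. Fitting an intercept plus the two dummy variables by least squares shows the interpretation described above: the intercept equals the mean of the reference level, and each dummy's estimate equals the difference between the mean of its level and that of the reference level.

```python
def dummy_code(values, reference):
    """Turn a factor into 0/1 columns, one per non-reference level (levels sorted)."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} does not occur in the data")
    return {level: [1 if v == level else 0 for v in values]
            for level in levels if level != reference}


def least_squares(columns, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(columns)
    # Augmented matrix [X'X | X'y].
    A = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(a * b for a, b in zip(columns[i], y))] for i in range(p)]
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(A[r][i]))  # partial pivoting
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data: factor 'diet' and a response measured on six taxa.
diet = ["folivore", "frugivore", "insectivore", "folivore", "frugivore", "insectivore"]
brain = [4.0, 6.0, 5.0, 5.0, 7.0, 6.0]
dummies = dummy_code(diet, reference="folivore")
X = [[1.0] * len(diet)] + list(dummies.values())  # intercept column plus dummies
estimates = least_squares(X, brain)
print(dict(zip(["intercept"] + list(dummies), estimates)))
```

Solving the normal equations directly is fine for a toy example like this, but statistical software typically uses numerically more stable decompositions (e.g., QR) to fit linear models.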


Factor

Qualitative (or categorical) predictor variable.

General linear model

Unified approach to testing the effect(s) of one or several quantitative or categorical predictors on a single quantitative response; assumes normally distributed residuals with homogeneous variance; multiple regression, ANOVA, ANCOVA, and t-tests are all special cases of the general linear model.
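As a small illustration of the last point, the sketch below (plain Python, made-up data) assumes a two-level factor ‘sex’ with ‘female’ as the reference level and shows that Student's two-sample t-test with pooled variance is the same test as a simple regression of the response on the dummy-coded factor: the slope equals the difference between the group means, and the t statistics coincide. The function names and data are hypothetical, and the example is an ordinary (non-phylogenetic) general linear model.

```python
import math
from statistics import mean, variance


def regression_slope_t(x, y):
    """Slope of y on x and its t statistic from a simple least-squares regression."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se_slope


def pooled_t(reference, other):
    """Two-sample t statistic assuming equal variances (Student's t-test)."""
    n1, n2 = len(reference), len(other)
    pooled_var = ((n1 - 1) * variance(reference) + (n2 - 1) * variance(other)) / (n1 + n2 - 2)
    return (mean(other) - mean(reference)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical response values for the two levels of 'sex'.
female = [10.0, 12.0, 11.0]
male = [14.0, 15.0, 13.0, 16.0]
x = [0] * len(female) + [1] * len(male)  # dummy coding with 'female' as reference
y = female + male
slope, t_regression = regression_slope_t(x, y)
print(slope, t_regression, pooled_t(female, male))
```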


Level

Particular value of a factor (for instance, the factor ‘sex’ has the levels ‘female’ and ‘male’).

Predictor (variable)

Variable whose influence on the response variable is to be investigated or controlled for; can be a factor or a covariate.

Response (variable)

Variable that is the focus of the study; the aim is to investigate how one or several predictors influence it.

Right (left) skewed distribution

Distribution with many small and few large values (a left skewed distribution shows the opposite pattern).
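A minimal sketch of how skew can be quantified, assuming the moment coefficient of skewness g1 (one of several common estimators) and made-up, strongly right-skewed values of a hypothetical predictor ‘body_mass’. It also shows a log transformation, one common (but not the only) remedy for right-skewed covariates: in this contrived example the logged values are evenly spaced, so their skewness is essentially zero.

```python
import math


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments).

    Positive values indicate a right-skewed sample, negative values a left-skewed one.
    """
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed predictor: many small and few large values.
body_mass = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
print(skewness(body_mass))                         # positive: right skewed
print(skewness([math.log(v) for v in body_mass]))  # about zero after log transformation
print(skewness([-v for v in body_mass]))           # negative: mirrored, i.e. left skewed
```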



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
