Abstract
Understanding adaptive patterns is especially difficult in the case of “evolutionary singularities,” i.e., traits that evolved in only one lineage in the clade of interest. New methods are needed to integrate our understanding of general phenotypic correlations and convergence within a clade when examining a single lineage in that clade. Here, we develop and apply a new method to investigate change along a single branch of an evolutionary tree; this method can be applied to any branch on a phylogeny, typically focusing on an a priori hypothesis for “exceptional evolution” along particular branches, for example in humans relative to other primates. Specifically, we use phylogenetic methods to predict trait values for a tip on the phylogeny based on a statistical (regression) model, phylogenetic signal (λ), and evolutionary relationships among species in the clade. We can then evaluate whether the observed value departs from the predicted value. We provide two worked examples in human evolution using original R scripts that implement this concept in a Bayesian framework. We also provide simulations that investigate the statistical validity of the approach. While multiple approaches can and should be used to investigate singularities in an evolutionary context—including studies of the rate of phenotypic change along a branch—our Bayesian approach provides a way to place confidence on the predicted values in light of uncertainty about the underlying evolutionary and statistical parameters.
Keywords
- Phylogenetic Prediction
- Predicted Trait Values
- Exceptional Evolution
- Phylogenetic Generalized Least Squares (PGLS)
- BayesTraits
The original version of this chapter was revised: Online Practical Material website has been updated. The erratum to this chapter is available at https://doi.org/10.1007/978-3-662-43550-2_23
References
Allman JM, Martin B (2000) Evolving brains. Scientific American Library, New York
Arnold C, Matthews LJ, Nunn CL (2010) The 10kTrees website: a new online resource for primate phylogeny. Evol Anthropol 19:114–118
Barrett R, Kuzawa CW, McDade T, Armelagos GJ (1998) Emerging and re-emerging infectious diseases: the third epidemiologic transition. Annu Rev Anthropol 27:247–271
Barton RA (1996) Neocortex size and behavioural ecology in primates. Proc R Soc Lond (Biol) 263:173–177
Barton RA, Venditti C (2013) Human frontal lobes are not relatively large. PNAS 110:9001–9006
Cooper N, Kamilar JM, Nunn CL (2012) Longevity and parasite species richness in mammals. PLoS One
Deaner RO, Isler K, Burkart J, van Schaik C (2007) Overall brain size, and not encephalization quotient, best predicts cognitive ability across non-human primates. Brain Behav Evol 70:115–124
Deaner RO, Nunn CL, van Schaik CP (2000) Comparative tests of primate cognition: different scaling methods produce different results. Brain Behav Evol 55:44–52
Diniz-Filho JAF, De Sant’ana CER, Bini LM (1998) An eigenvector method for estimating phylogenetic inertia. Evolution 52:1247–1262
Diniz-Filho JAF, Bini LM (2005) Modelling geographical patterns in species richness using eigenvector-based spatial filters. Global Ecol Biogeogr 14:177–185
Dunbar RIM (1993) Coevolution of neocortical size, group size and language in humans. Behav Brain Sci 16:681–735
Fagan WF, Pearson YE, Larsen EA, Lynch HJ, Turner JB, Staver H, Noble AE, Bewick S, Goldberg EE (2013) Phylogenetic prediction of the maximum per capita rate of population growth. Proc R Soc Lond (Biol) 280:20130523
Felsenstein J (1985) Phylogenies and the comparative method. Am Nat 125:1–15
Freckleton RP, Harvey PH, Pagel M (2002) Phylogenetic analysis and comparative data: a test and review of evidence. Am Nat 160:712–726
Garland T, Bennett AF, Rezende EL (2005) Phylogenetic approaches in comparative physiology. J Exp Biol 208:3015–3035
Garland T, Dickerman AW, Janis CM, Jones JA (1993) Phylogenetic analysis of covariance by computer simulation. Syst Biol 42:265–292
Garland T, Ives AR (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. Am Nat 155:346–364
Garland T, Midford PE, Ives AR (1999) An introduction to phylogenetically based statistical methods, with a new method for confidence intervals on ancestral values. Am Zool 39:374–388
Gelman A (2004) Bayesian Data Analysis. Chapman & Hall/CRC, London/Boca Raton
Grafen A (1989) The phylogenetic regression. Philos Trans R Soc Lond (Biol) 326:119–157
Harvey PH, Pagel MD (1991) The comparative method in evolutionary biology. Oxford Series in Ecology and Evolution. Oxford University Press, Oxford
Hastings WK (1970) Monte Carlo sampling methods using Markov Chains and their applications. Biometrika 57(1):97–109
Hughes AL, Hughes MK (1995) Small genomes for better flyers. Nature 377:391. doi:10.1038/377391a0
Jungers WL (1978) Functional significance of skeletal allometry in megaladapis in comparison to living prosimians. Am J Phys Anthropol 49:303–314
Kappeler PM, Silk JB (eds) (2009) Mind the gap: tracing the origins of human universals. Springer, Berlin
Lieberman D (2011) The evolution of the human head. Belknap Press, Cambridge
Liu J (2003) Monte Carlo strategies in scientific computing. Springer, Berlin
Maddison WP, Midford PE, Otto SP (2007) Estimating a binary character’s effect on speciation and extinction. Syst Biol 56:701–710
Martin R (2002) Primatology as an essential basis for biological anthropology. Evol Anthropol 11:3–6
Martin RD (1990) Primate origins and evolution. Chapman and Hall, London
Martins EP (1994) Estimating the rate of phenotypic evolution from comparative data. Am Nat 144:193–209
Martins EP, Hansen TF (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. Am Nat 149:646–667
McPeek MA (1995) Testing hypotheses about evolutionary change on single branches of a phylogeny using evolutionary contrasts. Am Nat 145:686–703
Mundry R, Nunn CL (2009) Stepwise model fitting and statistical inference: turning noise into signal pollution. Am Nat 173:119–123
Napier JR (1970) The roots of mankind. Smithsonian Institution Press, Washington
Napier JR, Walker AC (1967) Vertical clinging and leaping—a newly recognized category of locomotor behaviour of primates. Folia Primatol 6:204–219
Nee S (2006) Birth-death models in macroevolution. Ann Rev Ecol Evol S 37:1–17
Nunn CL (2002) A comparative study of leukocyte counts and disease risk in primates. Evolution 56:177–190
Nunn CL (2011) The comparative approach in evolutionary anthropology and biology. University of Chicago Press, Chicago
Nunn CL, Gittleman JL, Antonovics J (2000) Promiscuity and the primate immune system. Science 290:1168–1170
Nunn CL, Lindenfors P, Pursall ER, Rolff J (2009) On sexual dimorphism in immune function. Philos Trans Roy Soc B Biol Sci 364:61–69. doi:10.1098/Rstb.2008.0148
Nunn CL, van Schaik CP (2002) Reconstructing the behavioral ecology of extinct primates. In: Plavcan JM, Kay RF, Jungers WL, van Schaik CP (eds) Reconstructing behavior in the fossil record. Kluwer Academic/Plenum, New York, pp 159–216
O’Hara RB, Sillanpää MJ (2009) A review of Bayesian variable selection methods: what, how and which. Bayesian Anal 4(1):85–118
O’Meara BC, Ane C, Sanderson MJ, Wainwright PC (2006) Testing for different rates of continuous trait evolution using likelihood. Evolution 60:922–933
Organ CL, Nunn CL, Machanda Z, Wrangham RW (2011) Phylogenetic rate shifts in feeding time during the evolution of Homo. Proc Natl Acad Sci USA 108:14555–14559
Organ CL, Shedlock AM (2009) Palaeogenomics of pterosaurs and the evolution of small genome size in flying vertebrates. Biol Lett 5:47–50
Organ CL, Shedlock AM, Meade A, Pagel M, Edwards SV (2007) Origin of avian genome size and structure in non-avian dinosaurs. Nature 446:180–184
Orme D, Freckleton R, Thomas G, Petzoldt T, Fritz S, Isaac N (2011) Caper: comparative analyses of phylogenetics and evolution in R. http://R-Forge.R-project.org/projects/caper/
Pagel M (1997) Inferring evolutionary processes from phylogenies. Zool Scr 26:331–348
Pagel M (1999) Inferring the historical patterns of biological evolution. Nature 401:877–884
Pagel M (2002) Modelling the evolution of continuously varying characters on phylogenetic trees: the case of hominid cranial capacity. In: MacLeod N, Forey PL (eds) Morphology, shape and phylogeny. Taylor and Francis, London, pp 269–286
Pagel M, Lutzoni F (2002) Accounting for phylogenetic uncertainty in comparative studies of evolution and adaptation. In: Lässig M, Valleriani A (eds) Biological evolution and statistical physics. Springer, Berlin, pp 148–161
Pagel M, Meade A (2007) BayesTraits, version 1.0 (http://www.evolution.rdg.ac.uk). Reading, UK
Pagel MD (1994) The adaptationist wager. In: Eggleton P, Vane-Wright RI (eds) Phylogenetics and Ecology. Academic, London, pp 29–51
Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290
Reader SM, Laland KN (2002) Social intelligence, innovation, and enhanced brain size in primates. PNAS 99:4436–4441
Revell L (2010) Phylogenetic signal and linear regression on species data. Methods Ecol Evol 1:319–329
Revell LJ (2008) On the analysis of evolutionary change along single branches in a phylogeny. Am Nat 172:140–147
Revell LJ (2011) Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol
Rodseth L, Wrangham RW, Harrigan AM, Smuts BB, Dare R, Fox R, King B, Lee P, Foley R, Muller J, Otterbein K, Strier K, Turke P, Wolpoff M (1991) The human community as a primate society. Curr Anthropol 32:221–254
Rohlf FJ (2001) Comparative methods for the analysis of continuous variables: geometric interpretations. Evolution 55:2143–2160
Safi K, Pettorelli N (2010) Phylogenetic, spatial and environmental components of extinction risk in carnivores. Global Ecol Biogeogr 19:352–362
Sherwood CC, Bauernfeind AL, Bianchi S, Raghanti MA, Hof PR (2012) Human brain evolution writ large and small. In: Hofman M, Falk D (eds) Evolution of the primate brain: from neuron to behavior, vol 195. Elsevier, Amsterdam, pp 237–254
Sherwood CC, Subiaul F, Zawidzki TW (2008) A natural history of the human mind: tracing evolutionary changes in brain and cognition. J Anat 212:426–454
Tennie C, Call J, Tomasello M (2009) Ratcheting up the ratchet: on the evolution of cumulative culture. Philos Trans R Soc B Biol Sci 364:2405–2415
Tooby J, DeVore I (1987) The reconstruction of hominid behavioral evolution through strategic modeling. In: Kinzey WG (ed) The evolution of human behavior: primate models. State University of New York Press, Albany, pp 183–237
van Schaik CP, van Noordwijk MA, Nunn CL (1999) Sex and social evolution in primates. In: Lee PC (ed) Comparative primate socioecology. Cambridge University Press, Cambridge, pp 204–240
Wrangham RW (2009) Catching fire: how cooking made us human. Basic Books, New York
Acknowledgments
We thank Luke Matthews, Tirthankar Dasgupta, László Zsolt Garamszegi, and two anonymous referees for helpful discussion and feedback. Joel Bray helped format the manuscript. This research was supported by the NSF (BCS-0923791 and BCS-1355902).
Appendix: Phylogenetic Prediction for Extant and Extinct Species
1. Mathematical Description of Method
Consider the following regression model for \( n \) different species:
\( y_{i} = \varvec{x}_{i}\varvec{\theta} + \epsilon_{i}, \quad i = 1, \ldots, n \quad (21.1) \)
In the above model, \( y_{i} \) is the response variable for the \( i{\text{th}} \) species and \( \varvec{x}_{i} = (x_{i1}, \ldots, x_{im}) \) are covariates associated with the \( i{\text{th}} \) species. The error terms for all species, \( \varvec{\epsilon} = (\epsilon_{1}, \epsilon_{2}, \ldots, \epsilon_{n}) \), follow a multivariate normal distribution:
\( \varvec{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^{2}\mathbf{V}) \)
In this equation, \( {\mathbf{V}} \) is the covariance matrix structure and \( \sigma^{2} \) is the variance scale. Ordinary linear regression assumes the errors are independent and identically normally distributed, such that the covariance matrix has the same value along the diagonal of V, with all off-diagonal elements set to zero. For biological data, however, different species exhibit similarity because of common ancestry, which leads to positive values on the off-diagonals. Moreover, the diagonal of V may show heterogeneity if root-to-tip distances vary, as might be the case if fossils are included or when the branch lengths are based on molecular change rather than absolute dates. As noted above, it is possible to select scaling parameters that transform the branch lengths to better model the evolution of traits on a given tree topology. The parameter \( \lambda \) scales internal branches (off-diagonal elements of V) between 0 and 1; \( \lambda = 0 \) corresponds to no phylogenetic structure, i.e., a star phylogeny. The parameter \( \kappa \) raises all branch lengths to the power \( \kappa \); thus, \( \kappa = 0 \) corresponds to a phylogeny with equal branch lengths, as expected when speciational change takes place.
Hence, the covariance structure \( {\mathbf{V}} \) can be crucial to comparative analyses of species values, and scaling parameters provide important insights into the evolutionary process and degree of phylogenetic signal in the data.
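The chapter's implementation is in R (BayesModelS); purely as an illustration of the λ transform described above, here is a minimal Python sketch using a hypothetical ultrametric three-species covariance matrix:

```python
import numpy as np

def lambda_transform(V, lam):
    """Pagel's lambda: multiply the off-diagonal (shared-history) elements
    of the phylogenetic covariance matrix by lambda; the diagonal
    (root-to-tip variances) is left unchanged."""
    Vl = lam * V
    np.fill_diagonal(Vl, np.diag(V))
    return Vl

# Hypothetical ultrametric 3-species example: tip depth 1.0,
# species A and B share half of their history, C is an outgroup.
V = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

star = lambda_transform(V, 0.0)  # lambda = 0: star phylogeny (identity here)
```

With λ = 0 all shared covariance vanishes (a star phylogeny), while λ = 1 leaves the Brownian-motion expectation untouched.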
The objective is to select the optimal model with respect to different covariates and variance structures. Two variance structures, based on the scaling parameters \( \lambda \) and \( \kappa \), are considered. We aim to select the covariates as well as the variance structure that best characterizes trait evolution; precise estimation of the different parameters (regression coefficients, \( \lambda \), \( \kappa \)) is also required. Given that only \( \lambda \) and \( \kappa \) are considered, we can rewrite the distribution of \( \varvec{\epsilon} \) as follows:
In this equation, \( I_{V} \) indicates the selection of the variance structure: \( I_{V} = 1 \) when the \( \lambda \) model is used and \( I_{V} = 0 \) when the \( \kappa \) model is used. The parameters \( \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2} \) are the error variances under the \( \lambda \) and \( \kappa \) models, respectively. The covariance matrix \( \Sigma(T, I_{V}, \lambda, \kappa) \) is a function of the evolutionary tree \( T \), the indicator \( I_{V} \), \( \lambda \), and \( \kappa \). Henceforth, we use the notation \( \Sigma \) in place of \( \Sigma(T, I_{V}, \lambda, \kappa) \).
Using a Bayesian framework, the parameters are treated as random variables and their distributions are investigated. To select models, three types of parameters are included in the above model:
1. Parameters for tree selection, \( T \). Here, we use a large number of trees to represent uncertainty in the phylogeny that describes evolutionary relationships among the species. A posterior distribution of M trees \( \{ T_{1}, \ldots, T_{M} \} \) is used and treated as a uniform distribution (although a single tree can also be used).
2. Parameters for variable selection, \( \Theta_{1} = (\varvec{\gamma}, \varvec{\beta}) \). This includes the indicator variables \( \varvec{\gamma} = (\gamma_{1}, \ldots, \gamma_{m}) \), which specify whether each variable is included in the model, and the effect sizes \( \varvec{\beta} = (\beta_{1}, \ldots, \beta_{m}) \) for each covariate. The regression coefficients are \( \theta_{i} = \gamma_{i} \times \beta_{i}, \; i = 1, 2, \ldots, m \).
3. Parameters for variance selection, \( \Theta_{2} = (I_{V}, \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}) \).
Let \( \varvec{Y} \) and \( \varvec{X} \) be matrices of the response variable and m explanatory variables for n species, respectively, as given below:
Then, the joint posterior distribution for all parameters will be
In the above equation,
1. \( p(\varvec{\gamma}, \varvec{\beta}, I_{V}, \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}, T) \) is the prior distribution for all parameters. We assume the priors for tree selection, variable selection, and variance selection are independent, i.e., \( p(\varvec{\gamma}, \varvec{\beta}, I_{V}, \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}, T) = p(T)\,p(\varvec{\gamma}, \varvec{\beta})\,p(I_{V}, \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}) \). We also assume the variable-selection prior factors as \( p(\varvec{\gamma}, \varvec{\beta}) = p(\varvec{\gamma})\,p(\varvec{\beta}|\varvec{\gamma}) \). \( p(\varvec{\gamma}) \) follows a non-informative prior, and for each \( i \), \( \beta_{i}|\gamma_{i} \sim (1 - \gamma_{i})N(\hat{\mu}, S) + \gamma_{i} N(0, \tau^{2}) \), where \( \hat{\mu}, S, \tau^{2} \) are predefined parameters.
2. \( f(\varvec{Y}|\varvec{\gamma}, \varvec{\beta}, I_{V}, \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}, T, \varvec{X}) \) is the probability density function:
Equation (21.3) is difficult to analyze directly. However, \( \varvec{\beta} \) can be integrated out, which significantly simplifies the calculation. Consequently, we only need to consider the posterior distribution \( f(\varvec{\gamma}, I_{V}, \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}, T|\varvec{Y}, \varvec{X}) \). After this posterior distribution is obtained, \( f(\varvec{\beta}|\varvec{\gamma}, I_{V}, \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}, T, \varvec{Y}, \varvec{X}) \) follows a multivariate normal distribution.
Let \( \varvec{X}(\varvec{\gamma}) \) be the columns of \( \varvec{X} \) with \( \gamma_{j} = 1 \) and \( \Sigma' = \Sigma'(T, I_{V}, \lambda, \kappa, \sigma^{2}) = (\frac{1}{\sigma^{2}}\varvec{X}(\varvec{\gamma})^{T}\Sigma^{-1}\varvec{X}(\varvec{\gamma}) + \frac{1}{\tau^{2}}\varvec{I})^{-1} \); then the posterior distribution can be simplified to:
where \( A_{1} = \varvec{Y}^{T}\Sigma^{-1}\varvec{Y} - \frac{1}{\sigma_{\lambda}^{2}}(\varvec{Y}^{T}\Sigma^{-1}\varvec{X}(\varvec{\gamma})\,\Sigma'\,\varvec{X}(\varvec{\gamma})^{T}\Sigma^{-1}\varvec{Y}) \) and \( A_{2} = \varvec{Y}^{T}\Sigma^{-1}\varvec{Y} - \frac{1}{\sigma_{\kappa}^{2}}(\varvec{Y}^{T}\Sigma^{-1}\varvec{X}(\varvec{\gamma})\,\Sigma'\,\varvec{X}(\varvec{\gamma})^{T}\Sigma^{-1}\varvec{Y}) \).
The posterior distribution in Eq. (21.4) is difficult to obtain analytically; hence, we generate posterior samples using Markov chain Monte Carlo (MCMC; Liu 2003) to select the optimal model. Specifically, we use Gibbs sampling, an algorithm that generates a sequence of samples from the joint probability distribution of two or more random variables by drawing each variable in turn from its conditional distribution. In each iteration of Gibbs sampling, we use the following procedure to obtain posterior samples:
1. Simulate \( T_{k} \) from \( \{ T_{1}, \ldots, T_{M} \} \);
2. Simulate \( \varvec{\gamma} \) from \( f(\varvec{\gamma}|I_{V}, \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}, T, \varvec{Y}, \varvec{X}) \);
3. Simulate \( I_{V} \) from \( f(I_{V}|\varvec{\gamma}, \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}, T, \varvec{Y}, \varvec{X}) \);
4. Simulate \( \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2} \) from \( f(\lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}|\varvec{\gamma}, I_{V}, T, \varvec{Y}, \varvec{X}) \);
5. Simulate \( \varvec{\beta} \) from \( f(\varvec{\beta}|\varvec{\gamma}, \lambda, \kappa, \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2}, T, \varvec{Y}, \varvec{X}) \).
Since \( \varvec{\gamma} \) and \( I_{V} \) (Steps 2 and 3) follow Bernoulli distributions, their posterior samples can be drawn directly. In Step 4, \( \lambda, \kappa \) are obtained by the Metropolis–Hastings algorithm (Hastings 1970) and \( \sigma_{\lambda}^{2}, \sigma_{\kappa}^{2} \) from inverse gamma distributions. \( \varvec{\beta} \) in Step 5 follows a multivariate normal distribution.
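The full conditionals above are specific to this model. As a generic illustration of the Gibbs mechanics (alternately drawing each variable from its conditional distribution), the following Python sketch samples a bivariate normal with known correlation; it is not the authors' sampler, only a toy example of the same algorithmic pattern:

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8  # known correlation; each full conditional is N(rho * other, 1 - rho^2)
x, y = 0.0, 0.0
draws = []
for _ in range(20000):
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))  # draw x | y
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))  # draw y | x
    draws.append((x, y))
draws = np.array(draws[2000:])          # discard burn-in
est_rho = np.corrcoef(draws.T)[0, 1]    # recovers ~0.8 from the chain
```

In BayesModelS, Steps 1–5 play the role of the two conditional draws here, with a Metropolis–Hastings step substituted where a conditional cannot be sampled directly (Step 4).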
Once N posterior samples, \( \left\{ {\Theta _{1}^{\left( 1 \right)} , \ldots ,\Theta _{1}^{\left( N \right)} } \right\}, \left\{ {\Theta _{2}^{\left( 2 \right)} , \ldots ,\Theta _{2}^{\left( N \right)} } \right\} \), and \( \{ T^{(1)} , \ldots , T^{(N)} \} \), have been obtained, the question is which model should be selected; this can be addressed via different criteria and goals:
1. Model with the highest posterior probability. Let \( P\left( {M_{i} |\varvec{X}, \varvec{Y}} \right), i = 1, 2, \ldots, 2^{m + 1} \) be the posterior probability of the \( i{\text{th}} \) candidate model, given by the proportion of posterior samples corresponding to model \( i \). The model with the highest posterior probability can be selected as the optimal model.
2. Inclusion probability for variables (model selection). The inclusion probability of the \( j{\text{th}} \) variable is defined as \( P(\gamma_{j} = 1|\varvec{X}, \varvec{Y}) \), a marginal probability across all posterior samples, and can be estimated by \( P\left( {\gamma_{j} = 1 |\varvec{X}, \varvec{Y}} \right) = \frac{\sum_{k} \gamma_{j}^{(k)}}{N} \).
3. Probability of the variance structure. The probability of the \( \lambda \) model, \( P(I_{V} = 1|\varvec{X}, \varvec{Y}) \), can be estimated by \( P\left( {I_{V} = 1 |\varvec{X}, \varvec{Y}} \right) = \frac{\sum_{k} I_{V}^{(k)}}{N} \); the probability of the \( \kappa \) model is its complement.
4. Inference on regression coefficients. For a specific candidate model \( M_{i} \), inference on its parameters can be obtained directly from the posterior samples for \( M_{i} \). More generally, the effect size of a given covariate can be estimated through Bayesian model averaging (BMA; O’Hara and Sillanpää 2009). For example, \( \beta_{i} \), the effect size for the \( i{\text{th}} \) covariate, can be estimated as the posterior mean:
where \( E_{{M_{k} }} (\beta_{i} ) \) is the average of the posterior samples of \( \beta_{i} \) under model \( M_{k} \). This estimator is simply the overall mean of the posterior samples \( \beta_{i}^{(k)} \), so we can use
as the estimator for the variance of \( \hat{\beta }_{i} \).
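The quantities in items 2–4 above are simple summaries of the posterior draws. A minimal Python sketch with hypothetical (mock) posterior samples illustrates the inclusion probabilities and the model-averaged effect \( \theta_{j} = \gamma_{j} \times \beta_{j} \); the draws here are fabricated for illustration, not output of BayesModelS:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 1000, 3  # number of posterior samples, number of covariates

# Hypothetical posterior draws: inclusion indicators gamma and effects beta.
# Covariate 0 is almost always included, covariate 2 almost never.
gamma = rng.random((N, m)) < np.array([0.95, 0.50, 0.05])
beta = rng.normal([2.0, 0.3, 0.0], 0.1, size=(N, m))

inclusion = gamma.mean(axis=0)   # estimates P(gamma_j = 1 | X, Y)
theta = gamma * beta             # theta_j = gamma_j * beta_j per draw
bma_mean = theta.mean(axis=0)    # Bayesian model-averaged effect size
bma_var = theta.var(axis=0)      # its posterior variance
```

The model-averaged estimate for covariate 0 is shrunk slightly toward zero relative to the conditional effect (2.0), because draws with \( \gamma_{0} = 0 \) contribute zeros to the average.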
Model Checking
Bayesian model checking (Gelman 2004) can be used to check whether the model is consistent with the data. Consider data \( \varvec{Y},\varvec{X} \) and corresponding posterior samples, \( \left\{ {\Theta _{1}^{\left( 1 \right)} ,\Theta _{1}^{\left( 2 \right)} , \ldots ,\Theta _{1}^{\left( N \right)} } \right\}, \left\{ {\Theta _{2}^{\left( 1 \right)} ,\Theta _{2}^{\left( 2 \right)} , \ldots ,\Theta _{2}^{\left( N \right)} } \right\} \) and \( \{ T^{(1)} , T^{(2)} , \ldots , T^{(N)} \} \). Under the assumption of a linear model, we can use each posterior sample to generate one predicted (i.e., “fake”) \( \varvec{Y}^{(k)} \) given \( \{ \varvec{X}, T^{\left( k \right)} ,\Theta _{1}^{\left( k \right)} ,\Theta _{2}^{(k)} \} \) in the following way:
where \( \Sigma ^{\left( k \right)} =\Sigma (T^{\left( k \right)} , I_{V}^{\left( k \right)} ,\lambda^{\left( k \right)} ,\kappa^{(k)} ) \). Thus, for each posterior sample, one fake \( \varvec{Y}^{(k)} \) can be obtained. A predefined function \( z^{(k)} = f(\varvec{Y}^{(k)} ) \), \( k = 1, 2, \ldots, N \), is then computed and compared to \( z_{C} = f(\varvec{Y}) \) obtained from the real data. By comparing \( \{ z^{(k)} \} \) with \( z_{C} \), the validity of the model is evaluated.
The logic of model checking is that if the model is valid, the generated fake \( {\mathbf{Y}} \)s should be statistically similar to the true observed \( \varvec{Y} \). The choice of function \( f \) depends on the dataset and model; commonly used functions include the variance (\( z^{(k)} = {\text{var}}(\varvec{Y}^{(k)} ) \)) and the median (\( z^{(k)} = {\text{median}}(\varvec{Y}^{(k)} ) \)). In each case, we check \( z_{C} \) against the distribution of \( \{ z^{(k)} \} \) and compute a two-sided p-value. If the p-value is smaller than 0.05, we conclude that the data are not consistent with the model.
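A posterior predictive check of this kind can be sketched as follows in Python, with synthetic stand-ins for the observed data and for the replicated \( \varvec{Y}^{(k)} \)s (in practice these come from the fitted model):

```python
import numpy as np

rng = np.random.default_rng(42)

def posterior_predictive_pvalue(y_obs, y_fake, stat=np.var):
    """Two-sided posterior predictive p-value: compare stat(Y) against the
    distribution of stat(Y^(k)) over the replicated ('fake') datasets."""
    z_c = stat(y_obs)
    z = np.array([stat(yk) for yk in y_fake])
    p_one = (z >= z_c).mean()
    return 2 * min(p_one, 1 - p_one)

# Synthetic example: observed data and replicates from the same model,
# so the check should NOT flag a discrepancy.
y_obs = rng.normal(0, 1, size=50)
y_fake = [rng.normal(0, 1, size=50) for _ in range(500)]
p = posterior_predictive_pvalue(y_obs, y_fake)
```

If the observed data had, say, a much larger variance than any replicate, the p-value would fall near zero and the model would be rejected.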
Prediction of an unknown response for a species (Gelman 2004) is also available in this Bayesian framework, for example, if one wishes to predict a value for a species that has not yet been studied or to investigate whether a particular species deviates from expectations of the evolutionary model (Organ et al. 2011). Consider a new species with known tree \( T_{\text{new}} \) and explanatory variables \( \varvec{X}_{\text{new}} \), but missing response variable \( \varvec{Y}_{\text{new}} \). Once posterior samples \( \left\{ {\Theta _{1}^{\left( 1 \right)} , \ldots ,\Theta _{1}^{\left( N \right)} } \right\}, \left\{ {\Theta _{2}^{\left( 1 \right)} , \ldots ,\Theta _{2}^{\left( N \right)} } \right\} \), and \( \left\{ {T^{\left( 1 \right)} , \ldots , T^{\left( N \right)} } \right\} \) have been obtained, the joint distribution of \( \left( {\varvec{Y}^{\left( k \right)} ,\varvec{Y}_{\text{new}}^{\left( k \right)} } \right)^{T} \) given \( \varvec{X}_{\text{new}} ,T_{\text{new}} ,\Theta _{1}^{\left( k \right)} ,\Theta _{2}^{\left( k \right)} \) is:
for each posterior sample. Let \( \Sigma _{\text{new}}^{(k)} =\Sigma \left( {T_{\text{new}} ,I_{V}^{\left( k \right)} ,\lambda^{\left( k \right)} ,\kappa^{(k)} } \right) \), then the covariance matrix for combined tree will satisfy:
Since we have already observed y, the distribution of \( y_{\text{new}}^{\left( k \right)} \) can be obtained through a conditional normal distribution:
where
So for each posterior sample, one simulated \( \varvec{Y}_{\text{new}}^{(k)} \) can be obtained. Then, we can use the median and variance of predictive draws \( \{ \varvec{Y}_{\text{new}}^{(k)} , k = 1, 2, \ldots ,N\} \) to make predictions for values of the response variable in the new species. If the observed value for the species falls outside of, for example, the 95 % credible interval of predictions, one might infer that an exceptional amount of evolutionary change has occurred.
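The conditional-normal prediction step can be sketched numerically in Python; the three-species mean vector and covariance matrix below are hypothetical stand-ins for the regression mean and the combined covariance that would come from one posterior draw:

```python
import numpy as np

def predict_tip(mu, Sigma, y_obs):
    """Conditional-normal prediction for the last species given the others:
    partition the mean vector and covariance matrix, then condition on the
    observed responses of the remaining species."""
    n = len(mu) - 1
    S11, S12 = Sigma[:n, :n], Sigma[:n, n]
    S22 = Sigma[n, n]
    w = np.linalg.solve(S11, S12)                 # Sigma11^{-1} Sigma12
    cond_mean = mu[n] + w @ (y_obs - mu[:n])      # conditional mean
    cond_var = S22 - S12 @ w                      # conditional variance
    return cond_mean, cond_var

# Hypothetical 3-species case: species 3 is closely related to species 2.
mu = np.array([1.0, 1.0, 1.0])
Sigma = np.array([[1.0, 0.2, 0.2],
                  [0.2, 1.0, 0.8],
                  [0.2, 0.8, 1.0]])
mean3, var3 = predict_tip(mu, Sigma, np.array([0.5, 2.0]))
```

Because species 3 covaries strongly with species 2, the prediction is pulled toward species 2's high observed value, and the conditional variance is well below the marginal variance of 1.0. Repeating this for every posterior draw yields the predictive distribution summarized in the text.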
2. Simulation Test of Method Implemented in BayesModelS
We use simulated data to evaluate the performance of BayesModelS, focusing on estimation of parameters (but not prediction). Comparisons between our procedure and stepwise regression were conducted. For each dataset, we simulated predictor variables \( \varvec{X} \) and response variable \( \varvec{Y} \) with known associations among the variables on a single phylogeny taken from a posterior distribution of 100 phylogenies for 87 primate species. The variables for each species are independently and identically distributed according to \( N(0, 1) \). For BayesModelS, we then ran analyses across 100 trees. For stepwise regression, we used a single tree, which was identical to the tree used to generate the data.
Two different sets of simulated data were used. The first dataset is used to check whether Bayesian variable selection correctly identifies the variables to include in the statistical model, as compared to stepwise regression. The posterior inclusion probabilities of significant and insignificant effects were also evaluated. Consider a regression model with 10 covariates. The coefficients for the covariates are assumed to follow the distribution:
where \( \mu ,\sigma_{1}^{2} ,\sigma_{2}^{2} \) are predefined as 1, 0.1, and 0.01, respectively, and \( I_{S} \) indicates whether the effect is active (i.e., nonzero). The response variable \( \varvec{y} \) is then simulated from Eq. (21.1). Different datasets with \( I_{V} = 1 \), \( \lambda /\kappa = {\text{Unif}}\;[0, 1] \), and \( \sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} = 0.01, 0.02, 0.03 \) were used.
In reality, the true regression coefficients are sometimes neither zero nor large (O’Hara and Sillanpää 2009), unlike in the previous dataset; instead, coefficient sizes can taper toward zero. Here, we consider a regression framework similar to that of O’Hara and Sillanpää (2009), with the following regression model:
For the simulations, known values of \( \alpha = { \log }(10) \) and \( \sigma_{\lambda }^{2} , \sigma_{\kappa }^{2} = 0.01, 0.02, 0.03 \) were used. The covariate values \( x_{ij} \) were drawn independently from a standard normal distribution, \( N(0,1) \). We also assume \( m = 21 \) and \( \varvec{\epsilon} \sim \mathcal{N} (0, \Sigma (T, 1, 0.5, 0.5)) \) or \( \varvec{\epsilon} \sim \mathcal{N} (0, \Sigma (T, 0, 0.5, 0.5)) \), for the \( \lambda \) and \( \kappa \) models, respectively. The regression coefficients \( \theta_{j} \) were generated equally spaced between \( a - bk \) and \( a + bk \), with \( a = 0 \) and \( b = 0.05 \). Twenty datasets were generated, for \( k = 1, 2, \ldots , 20 \).
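The second simulation design above can be sketched as follows (Python; the identity matrix is a stand-in for the phylogenetic covariance \( \Sigma(T, I_{V}, \lambda, \kappa) \), which in the actual simulations comes from the primate tree):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 87, 21, 20     # species, covariates, taper index
a, b = 0.0, 0.05

# Tapered coefficients: equally spaced between a - b*k and a + b*k
theta = np.linspace(a - b * k, a + b * k, m)

X = rng.normal(0.0, 1.0, size=(n, m))   # covariates drawn i.i.d. from N(0, 1)
Sigma = np.eye(n)                        # stand-in for Sigma(T, I_V, lambda, kappa)
eps = rng.multivariate_normal(np.zeros(n), Sigma)
Y = X @ theta + eps                      # response simulated from the regression model
```

With k = 20, the coefficients range evenly from -1.0 to 1.0, so many are small but nonzero, which is precisely the regime where all-or-nothing variable selection is hardest.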
We used several performance measures to evaluate BayesModelS. We checked whether BayesModelS successfully identifies the correct model, defined as the model with the highest posterior probability. For stepwise regression, the optimal model was chosen using both forward and backward stepwise procedures. Repeated simulations were conducted to check the percentage of times the two methods identified the correct model.
Moreover, the median inclusion probability for each covariate under Bayesian model selection was evaluated and compared to the percentage of times each covariate was included by stepwise regression across repeated simulations. We repeated the following procedure 500 times: each time, we used a tree to generate data and then applied the Bayesian method and stepwise regression to estimate the model. Because the true model is known, each method can be scored as right or wrong, and we assess the statistical performance of BayesModelS and stepwise regression from this set of results.
The percentage of times each method identified the correct model with the simulated data is shown in Fig. 21.9.
In more than 90 % of the simulations, the Bayesian model selection procedure identified the correct model, regardless of the \( \sigma_{\lambda }^{2} /\sigma_{\kappa }^{2} \) value. Stepwise regression performed well when the number of significant effects was high, but poorly when the number of significant effects was low, owing to a high Type I error rate (Mundry and Nunn 2009).
Next, model checking and prediction with Bayesian model selection were conducted. One fixed sample from the second dataset was simulated with \( I_{V} = 1 \), \( \lambda = 0.5 \), \( \sigma_{\lambda }^{2} = 0.1 \), and \( k = 20 \). We used four functions to check the validity of the model: the mean, variance, median, and range. The results are shown in Fig. 21.10. We find that the model is consistent with the data, which is not surprising given that the data were generated from a linear model.
Finally, we used BayesModelS to predict values for unknown species in a simulation context. Each time, one species was designated as “missing,” and the predict() function was used to predict its value based on the remaining 86 species. The predictive sample was compared to the true response, as shown in Fig. 21.11. The true response of most species falls within the 95 % credible interval of the predictions, indicating that Bayesian model selection can effectively predict values for unknown species.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
Nunn, C.L., Zhu, L. (2014). Phylogenetic Prediction to Identify “Evolutionary Singularities”. In: Garamszegi, L. (eds) Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43550-2_21
DOI: https://doi.org/10.1007/978-3-662-43550-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43549-6
Online ISBN: 978-3-662-43550-2