Bayesian Analysis of ANOVA and Mixed Models on the Log-Transformed Response Variable

The analysis of variance, and mixed models in general, are popular tools for analyzing experimental data in psychology. Bayesian inference for these models is gaining popularity as it allows to easily handle complex experimental designs and data dependence structures. When working on the log of the response variable, the use of standard priors for the variance parameters can create inferential problems and namely the non-existence of posterior moments of parameters and predictive distributions in the original scale of the data. The use of the generalized inverse Gaussian distributions with a careful choice of the hyper-parameters is proposed as a general purpose option for priors on variance parameters. Theoretical and simulations results motivate the proposal. A software package that implements the analysis is also discussed. As the log-transformation of the response variable is often applied when modelling response times, an empirical data analysis in this field is reported. Supplementary Information The online version contains supplementary material available at 10.1007/s11336-021-09769-y.

This document contains the supplementary material to the paper Bayesian analysis of ANOVA and mixed models on the log-transformed response variable and it is organized as follows.In section S1 we complement the discussion on the choice of prior specification for the hyper-parameters γ contained in section 4.1 of the main paper.In section S2, the minimum MSE estimator conditioned to the variance components of the overall mean θ m is derived and its connection to the Bayesian framework is explained.This quantity is used as benchmark in the simulation study.In Section S3, some additional tables concerning the results of the simulation discussed in Section 5 of the paper are reported.Section S4 contains an additional simulation study in which covariates are included in the model and the frequentist properties of the posterior predictive distribution are investigated.Section S5 reports the information about the convergence diagnostics of the MCMC algorithm used to fit the models compared in the application of Section 6.Eventually, the proof of Corollary 1 and some software details useful to estimate models with dependent random effects are contained in Section S6.

S1 A simulation exercise for the choice of γ
The conditions on the existence of moments in theorem 1 are expressed as lower bounds for γ parameters.In section 4.1, we suggest for the γs to set a priori values close, but somewhat larger, than the lower bounds stated in the theorem.The reason is that a value of γ very close to the lower bound would cause numerical instability in the estimation of moments from the relevant posteriors (technically the integral is finite, but very large).Specifically, provided r is the moment order in whose finiteness we are interested, we suggest to set γ to the lower bound that warrants the existence of the r + c moment where c is a small constant.The aim of this simulation exercise is to show why setting c > 0 is necessary and to justify that c ≥ 0.5 is a sensible choice.To stay safe on the numerical stability side, in the simulations and applications of the main paper we set c = 1.
To the purposes of this section, we consider a single scenario from the simulation study described in section 5 (specifically: n j = 2, σ 2 = 0.5, φ = 1).We focus on the functional θ m .The γ m parameter of prior ( 16) is fixed as to guarantee the existence of the r + c moment with r = 2. Three different choices of c, i.e. c = 0.01, 0.5, 1 are compared.Figure S1 reports the Monte Carlo distributions (based on B = 10000 replicates) of posterior standard deviations sd(θ m |w).The numerical instability associated to very small c, i.e. c = 0.01, is apparent from the heavy tail of

S2 Minimum MSE estimator conditioned to the variance components
In order to have a complete characterization of the estimation problem, a useful finding might be the minimum MSE Bayes estimator, conditioned with respect to the variance components.It is the parallel result of the one by Zellner (1971) for the log-normal mean.Even if the deduced estimator could be of little practical interest, it might represent an useful benchmark for the considered methods in the simulation study.For computational easiness, the one-way random effect model (9) in the balanced case (i.e.n j = n g , ∀j) is considered.Moreover, we include in the model only a general mean term x T ij β = µ, ∀i, j.Assuming the variance components σ 2 and τ 2 as known, the only unknown parameter is the global mean in the log-scale µ.Similarly to Zellner (1971), the research of an optimal conditional estimator is restricted to the class of estimators θ * m = exp { w} k.The main result and its relationship with the Bayesian estimation is contained in the following theorem.
Proposition 1. Considering the estimators of the functional θ m = exp{µ + 2 −1 (σ 2 + τ 2 )} that consider σ 2 and τ 2 as known and are included in the class: then the one that minimizes the frequentist MSE is: Furthermore, it coincides with the conditioned Bayes estimator under the prior p(µ) ∝ 1 that minimizes the relative quadratic loss function.
Proof.Recalling that w ∼ N µ, σ 2 n −1 + τ 2 m −1 , the MSE of the considered class of estimator is: where c is a constant.The quantity is minimized when: Starting from the fact that: the expression of the Bayes estimator under relative quadratic loss can be easily derived.
Even if it is not proved to be an optimal estimator, its use for benchmarking purposes appears to be largely justified by the good frequentist properties that the Bayes estimator under relative quadratic loss has in the log-normal estimation framework.The formal result is presented in the following proposition.
Proposition 2. The Bayes estimator of θ c (v j ) conditioned with respect to the variance components under the prior p(µ) ∝ 1 that minimizes the relative quadratic loss function is: Proof.To obtain the estimator, the distribution of θ c (v l )|σ 2 , τ 2 , w must be deduced removing the conditioning on v from the marginal posterior: Setting the value t j = µ + v j , the following result can be obtained Recalling that the Bayes estimator under relative quadratic loss in a log-normal context is: the final result is obtained by substitution.

S3 Additional tables on simulations
The tables reported in this section refer to the simulation exercise of section 5 of the paper.In the first place, tables S1 and S2 contain the results for posterior means of θ m and θ c (v j ) for the scenarios characterized by n j = 5, thereby complementing tables 1 and 2. Results concerning the frequentist properties of the credible intervals are showed in table S3 for the overall mean predictor and S4 for the group means.
Eventually, the results concerning three further prior specifications with respect to those mentioned in the simulation design of section 5 are reported in tables S5 -S8.Specifically, the following estimators are considered: a-b) the posterior means of θ m and θ c (v j ) when priors are: and: where that will be labelled as θIG 2 m and θIG 2 c (v j ).This latter set of prior specifications complements the comparison with the family of inverse-gamma priors for the variance parameters, as this specification is very popular in applications.
Table S2: RABias and RRMSE for the considered estimators of the group-specific expectations in the different scenarios with n g = 5.

S4 Simulation study: evaluations of effects and predictions
In this section we carry out an additional simulation study in which the data generating process is characterized by the presence of a continuous covariate.We have two main aims: exploring if different priors on the variance components change the frequentist properties of the regression coefficients' estimates, and evaluating the performances of the posterior predictive distributions.
As in section 5, the results obtained under the following priors for the variance components are compared: GIG(1, 0.01, γ m ), IG(1, 1), and IG(0.001, 0.001), where γ m is fixed to fulfill the condition iii) of theorem 1, computing the leverage for each vector xk .Diffuse priors on the regression coefficients are set to complete the model specification.
The estimates of β 1 are compared in terms of bias, RMSE, frequentist coverage, and average posterior intervals width in tables S9 and S10.The results about the regression coefficient exhibit minor changes from one prior specification to another, whereas some problems emerge for the intervals under the IG(1, 1) prior: they are significantly larger than those obtained with the other priors, affected by over-coverage especially when n j = 2.
Eventually, in figure S2 results (bias, log RMSE, frequentist coverage, and average width) concerning the predictions at each covariate values x1,k are reported for the different scenarios.Focusing Table S3: Frequentist coverage and width for the credible intervals of θ m in the different scenarios.the result (A3) still holds since the (A1) can be used after some algebra: lim Focusing on condition ii), the result in (A4) can also be retrieved with the considered structure of D since: lim This is due to the fact that the limit of all the off-diagonal entries is 0 (lim τ 2 0 →+∞ ρτ 1 /τ 0 = 0 and lim τ 2 1 →+∞ ρτ 0 /τ 1 = 0), and the diagonal structure is equal to those in Theorem 1.

S6.2 Stan code to specify the GIG distribution
Unfortunately, the GIG distribution is not implemented in the Stan library.However, if a model with correlated random effects needs to be fitted, the following statement before the model syntax can be included: functions{ // GIG prior: only integer lambda real GIG_lpdf(real y, int lambda, real delta, real gamma){ real log_p; log_p = -lambda * log(gamma / delta) -log(2.0)-log(modified_bessel_second_kind(lambda, delta * gamma)) + (lambda -1.0) * log(y) -0.5 * (delta * delta / y + gamma * gamma * y); return(log_p); } } Then, for instance, if the variance component tau2 0 is declared in the parameters block, then a GIG prior can be specified in the model block as follows: tau2_0 ~GIG(lambda, delta, gamma); noting that only integer values for the parameter lambda can be fixed.

Figure S1 :
Figure S1: Monte Carlo distributions of the posterior standard deviation of θ m under different choices of c.
The predictors will be denoted as θGIG 0.001 m , θGIG 0.001 c (v j ) for the first prior setting and θGIG 0.1 m , θGIG 0.1 c (v j ) for the second one.The aim of these additional simulations is to assess the prior sensitivity of posterior distributions with respect to alternative choices of the GIG scale parameter δ. c) the posterior means of θ m and θ c (v j ) when priors are:

Figure S2 :
Figure S2: Bias, log RMSE, frequentist coverage, and average width of the predictions at different covariate values x1,k .

Figure S3 :
Figure S3: Plots concerning the convergence of the MCMC algorithm for the basic parameters the model fitted under GIG priors on the reduced dataset.

Table S1 :
Bias and RMSE for the considered estimators of θ m in the different scenarios with n g = 5.

Table S4 :
Average frequentist coverage and average width for the credible intervals of group means θ c (v j ).

Table S5 :
Bias and RMSE for the considered estimators of θ m in the different scenarios.

Table S6 :
Frequentist coverage and width for the credible intervals of θ m in the different scenarios.