Bayesian A-optimal two-phase designs with a single blocking factor in each phase

Two-phase experiments are widely used in many areas of science (e.g., agriculture, industrial engineering, food processing). For example, consider a two-phase experiment in plant breeding. Often, the first phase of this experiment is run in a field involving several blocks. The samples obtained from the first phase are then analyzed on several machines (or days, etc.) in a laboratory in the second phase. There might be field-block-to-field-block and machine-to-machine (or day-to-day, etc.) variation. Thus, it is practical to consider these sources of variation as blocking factors. There are two possible strategies to analyze this kind of two-phase experiment, i.e., treating blocks as fixed or as random. While there are a few studies regarding fixed block effects, there are still only a limited number of studies with random block effects, and even fewer for the case in which information on the block effects is uncertain. Hence, it is beneficial to consider a Bayesian approach to design for such an experiment, which is the main goal of this work. In this paper, we construct designs for a two-phase experiment that has a single treatment factor, a single blocking factor in each phase, and a response that can only be observed in the second phase.


Introduction
Several kinds of experiments in agriculture, industry, food science, or bio-engineering are performed in two phases. For example, consider an experiment in plant breeding where the first phase is run in a field and the second phase is run in a laboratory. The samples obtained from the field are analyzed in the laboratory for further evaluation. Commonly, the first phase is done in multiple field blocks, while the second phase is performed with multiple machines or over several days. In this example, it can be seen that the effects of interest (e.g., varieties or treatments) cannot be assessed in the first phase, but must be evaluated in the second phase. We call designs for such experiments two-phase designs (Brien 1983; Curnow 1959; McIntyre 1955; Wood et al. 1988). Furthermore, we expect that there is field-block-to-field-block and machine-to-machine (or day-to-day, etc.) variation. In experimental jargon, these sources of variation are called blocking factors and the factors of interest are called treatment factors. In this example, each phase has its own block structure, and these must be combined to form the proper analysis. Therefore, optimal two-phase designs must take the treatment factor(s) and the two blocking factors into account simultaneously.

Correspondence: Nha Vo-Thanh (nha.vo-thanh@uni-hohenheim.de) and Hans-Peter Piepho (hans-peter.piepho@uni-hohenheim.de), Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany.
It is well known that optimal designs depend on the statistical linear (or nonlinear) model envisioned for the analysis (Atkinson et al. 2007). In the example above, two possible analytical strategies could be considered. With both strategies, the effects of interest are commonly considered fixed, while different assumptions are made for the effects of the two blocking factors. In particular, the first strategy treats the effects of blocks in both phases as fixed. This model is referred to as a linear fixed-effects model. The second strategy treats the effects of blocks in both phases as random. This model is referred to as a linear mixed model. Under this model, the variances of the random block effects and the random residual errors are called variance components (Searle et al. 2009). In plant breeding, the second analytical strategy is commonly used because (i) the numbers of blocks in both phases might be large enough for the block variances to be estimated with sufficient accuracy, and (ii) block effects in both phases might be small. Hence, there may be substantial gains to be achieved by recovering the inter-block information (Yates 1939). The second strategy will be our focus in this paper.
There are several types of two-phase experiments in the current literature. For example, Jarrett and Ruggiero (2008) considered an application of two-phase experiments to gene expression microarrays, while Brien (2017) showed several applications of two-phase experiments, e.g., in plant pathology, greenhouse chamber experimentation, product storage, sensory evaluation, and laboratory analysis of crops. These applications involve single or multiple treatment factors, and single or multiple blocking factors in each phase. We refer readers who are unfamiliar with practical applications of two-phase experiments to these two papers. In this paper, we limit our attention to a type of two-phase experiment in which there is only a single treatment factor and a single blocking factor in each phase, which will be defined in more detail in Sect. 2.
In the current literature, there are only a few studies on constructing optimal two-phase designs under a linear fixed-effects model and a linear mixed model (Smith et al. 2006, 2015; Brien 2019; Vo-Thanh et al. 2022). When constructing optimal two-phase designs under a linear mixed model, we are often required to specify values of the variance components; see Smith et al. (2006). Thus, two-phase designs found under the linear mixed model are optimal only for these given values of the variance components. In practice, however, these values are rarely known exactly before analyzing the data collected from an experiment, even though it may be possible to state a plausible range of values before an experiment. Furthermore, optimal designs found at given values of the variance components might not be optimal at other values (Shah and Sinha 1989, Chapter 5). Hence, it is practical to construct optimal two-phase designs for a range of plausible values of the variance components, referred to as robust two-phase designs. This task will be the focus of our work. To the best of our knowledge, there are no studies regarding finding such robust two-phase designs.
Two-phase designs often have complex (e.g., non-orthogonal) blocking structures when the block sizes in the two phases differ. This makes the problem of searching for optimal two-phase designs more challenging. Therefore, it is advantageous to identify a suitable combined blocking pattern for both phases. The question then is how to find a good allocation of treatments to this pattern. Vo-Thanh et al. (2022) studied a nested row-column structure extensively and showed that this pattern is indeed good for two-phase designs under a linear fixed-effects model. They did not, however, consider this approach for constructing two-phase designs under a linear mixed model.
Studying robust two-phase designs requires prior knowledge of the variance components. This prior knowledge could be based on a well-educated guess or on historical data from two-phase experiments. Here we assume that we do not have specific values for the variance components and we want to construct robust two-phase designs from scratch. Therefore, we consider various prior distributions for the variance components (Mylona et al. 2014; Goos and Mylona 2018), implementing these in a Bayesian framework (see the detailed review in Chaloner and Verdinelli (1995)). One of the major challenges with the Bayesian approach is that it is computationally expensive to obtain a good approximation of a Bayesian optimality criterion. Evaluating such a criterion typically requires numerical integration techniques. Several works have utilized a quadrature approach to approximate the Bayesian criterion for design problems such as designs with nonlinear models, and blocked and split-plot designs with linear mixed models (Gotwalt 2010; Gotwalt et al. 2009; Mylona et al. 2014). In this paper, we will use this approach to obtain a good approximation of the Bayesian criterion.
Several issues are studied in this paper. First, we propose a simple expression for computing the treatment information matrix for two-phase designs under a linear mixed model and a nested row-column pattern. Second, we derive an update formula to compute a generalized inverse of the treatment information matrix when searching for optimal two-phase designs. Third, an overall efficiency factor and a Bayesian criterion are proposed to rank two-phase designs for given values of the variance components and for a range of plausible values of the variance components, respectively. Following the literature, we study three prior distributions for the variance components, i.e., the uniform, log-normal, and half-Cauchy distributions. We propose an approximation for computing the Bayesian criterion based on quadrature rules, and we compare this approximation with existing methods for computing multidimensional integrals. We utilize the search framework proposed in Vo-Thanh and Piepho (2022) to find Bayesian optimal two-phase designs. We implement our work in a package that handles both given values of the variance components and a range of plausible values with customized weights assigned to given values. Finally, we derive two useful theorems for two-phase designs that can be constructed from balanced incomplete block designs.
Section 2 reviews a linear mixed model and proposes a simple expression for the treatment information matrix of two-phase designs under this model and the nested row-column pattern. Section 3.1 proposes an overall efficiency factor to rank a two-phase design for given values of the variance components. This is then extended to a Bayesian criterion which ranks two-phase designs for a range of plausible values of the variance components (Sect. 3.2). Section 3.3 considers several choices of prior distribution. Section 4 proposes several approximations for the Bayesian criterion computed under the three prior distributions mentioned above. Section 5 evaluates our proposed approximation and compares it with existing methods. In Sect. 6, we briefly describe a general search algorithm to find Bayesian optimal two-phase designs. In Sect. 7, we illustrate the search algorithm with various examples. Section 8 considers two-phase designs that can be found from balanced incomplete block designs and derives two useful theorems for this case. Section 9 concludes the paper along with suggestions for potential extensions of our work.

Linear model and information matrix
Suppose that we wish to arrange v treatments with r replications in b_1 blocks of size k_1 in the first phase (Phase 1). In the second phase (Phase 2), the n = vr samples are then arranged in b_2 blocks of size k_2. The major challenge is to find a good allocation of treatments to blocks in both phases. In the following sections, we consider a linear model and an information matrix for this design problem.

Linear model
The linear model for our design problem is written as:

y = μ 1_n + X τ + Z_1 β_1 + Z_2 β_2 + ε,  (1)

where y is an n × 1 response vector, 1_n is an n × 1 vector of ones, μ is a general mean, X is an n × v design matrix for treatment effects, τ is a v × 1 vector of unknown treatment effects, Z_1 and Z_2 are n × b_1 and n × b_2 design matrices for the block effects in the two phases, β_1 and β_2 are b_1 × 1 and b_2 × 1 vectors of unknown block effects in the two phases, and ε is an n × 1 vector of unknown random residual errors. We assume that the expected value of ε is zero and that the covariance matrix of ε is var(ε) = σ_0² I_n, where σ_0² is a constant residual variance and I_n is an identity matrix of size n. In the rest of this paper, we will assume that the effects of blocks in both phases are treated as random, while the treatment effects are treated as fixed. Under this assumption, model (1) is also called a mixed model (Searle et al. 2009, Chapter 1).
As mentioned earlier, Vo-Thanh et al. (2022) studied two-phase designs with several blocking patterns under the linear fixed-effects model. They showed that in a few special cases, two-phase designs obtained under a nested row-column pattern are indeed optimal. This still holds for designs obtained under the linear mixed model considered in this paper, where it is indeed sufficient to use a block design in one of the two phases to construct a two-phase design (e.g., see Sect. 7 and Brien et al. 2011, Big with Big). In complex cases, different blocking patterns (e.g., a non-orthogonal blocking pattern) may yield slightly better designs than a nested row-column pattern. In general, however, they showed that the efficiencies of two-phase designs obtained under a nested row-column pattern do not differ much from those obtained under other blocking patterns. Thus, in this paper, we will limit our attention to two-phase designs under this pattern. This also implies that C_{γ_1γ_2} and the inverse of V can be expressed in tidy forms, as will be shown subsequently. Therefore, C_{γ_1γ_2} can be easily computed.
The nested row-column pattern is described as follows. Let p = gcd(b_1, b_2) and q = gcd(k_1, k_2), where gcd denotes the greatest common divisor. After some algebra, we have n = pq b_1p b_2p = b_1 k_1 = b_2 k_2, where b_1p = b_1/p and b_2p = b_2/p. Moreover, it is elementary to show that b_1 = p b_1p, b_2 = p b_2p, k_1 = q b_2p, and k_2 = q b_1p. From these facts, we can use a nested row-column pattern involving p superblocks (rectangles), each of which has b_1p rows and b_2p columns. Rows in superblocks are blocks in Phase 1, while columns in superblocks are blocks in Phase 2. The intersection between a row and a column contains q experimental units. This pattern will be illustrated in Sect. 5 (Example I). Under the nested row-column pattern, with the experimental units ordered by superblock, row, column, and unit within cell, the design matrices for the block effects in the two phases can be represented as:

Z_1 = I_p ⊗ I_{b_1p} ⊗ 1_{q b_2p} and Z_2 = I_p ⊗ 1_{b_1p} ⊗ I_{b_2p} ⊗ 1_q,

where ⊗ is the Kronecker product. Writing γ_1 and γ_2 for the ratios of the Phase 1 and Phase 2 block variances to the residual variance σ_0², V can thus be written as:

V = γ_1 Z_1 Z_1′ + γ_2 Z_2 Z_2′ + I_n = I_p ⊗ V_0, with V_0 = γ_1 I_{b_1p} ⊗ J_{q b_2p} + γ_2 J_{b_1p} ⊗ I_{b_2p} ⊗ J_q + I_{q b_1p b_2p},  (4)

where J_{q b_2p}, J_{b_1p}, and J_q are square matrices of ones of sizes q b_2p, b_1p, and q, respectively. As shown in Eq. (4), V is a block diagonal matrix, where every main diagonal block is a copy of V_0. Thus, to find the inverse of V, it is sufficient to find the inverse of V_0, which can be expressed according to the following proposition.
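Before stating the proposition, the construction above can be checked numerically. The following sketch (assuming numpy; the parameters of Example I from Sect. 5 are used for concreteness, and the variance-ratio values are arbitrary) builds Z_1, Z_2, and V and confirms the block-diagonal structure V = I_p ⊗ V_0:

```python
import numpy as np
from math import gcd

b1, k1 = 6, 10   # Phase-1 blocks and block size
b2, k2 = 15, 4   # Phase-2 blocks and block size
n = b1 * k1
assert n == b2 * k2

p, q = gcd(b1, b2), gcd(k1, k2)   # superblocks; units per row-column cell
b1p, b2p = b1 // p, b2 // p       # rows and columns per superblock
assert (k1, k2) == (q * b2p, q * b1p)
assert n == p * q * b1p * b2p

I, J, one = np.eye, np.ones, lambda m: np.ones((m, 1))

# Design matrices for block effects (units ordered by superblock, row, column, cell)
Z1 = np.kron(I(p), np.kron(I(b1p), one(q * b2p)))                 # rows = Phase-1 blocks
Z2 = np.kron(I(p), np.kron(one(b1p), np.kron(I(b2p), one(q))))    # columns = Phase-2 blocks

g1, g2 = 0.5, 2.0                 # variance ratios gamma_1, gamma_2 (arbitrary values)
V = g1 * Z1 @ Z1.T + g2 * Z2 @ Z2.T + I(n)

# V is block diagonal with p identical diagonal blocks V0
V0 = (g1 * np.kron(I(b1p), J((q * b2p, q * b2p)))
      + g2 * np.kron(J((b1p, b1p)), np.kron(I(b2p), J((q, q))))
      + I(q * b1p * b2p))
assert np.allclose(V, np.kron(I(p), V0))
```

The final assertion verifies that inverting V reduces to inverting the single 20 × 20 block V_0, which is the point of the proposition that follows.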
Proposition 1 Given a matrix V_0 = γ_1 I_{b_1p} ⊗ J_{q b_2p} + γ_2 J_{b_1p} ⊗ I_{b_2p} ⊗ J_q + I_{q b_1p b_2p}, the inverse of this matrix can be expressed in closed form in terms of constants ϕ_1, ϕ_2, ϕ_3, and ϕ_4 that depend only on γ_1, γ_2, q, b_1p, and b_2p, together with ϕ_5 = −ϕ_1 q b_2p − ϕ_2 q b_1p + ϕ_3 q b_1p b_2p + ϕ_4 n.
The proof of this proposition can be found in Supplementary Section A. From this proposition, C_{γ_1γ_2} can be written in terms of N_1 = X′Z_1 and N_2 = X′Z_2, the treatment-block incidence matrices in Phases 1 and 2, respectively, and the constants ϕ_i identified in Proposition 1 (i = 1, 2, ..., 5).

Overall efficiency factor
In the theory of optimal designs, it is well known that designs can be ranked by the A-, D-, G-, I-, C-, or E-optimality criterion (Atkinson et al. 2007). The choice of a criterion depends on the goal of the experiment. For instance, if we are interested in precise estimation of the model parameters, both the D- and A-optimality criteria are good choices. If we are interested in precise prediction of the response, both the I- and G-optimality criteria are good choices. These criteria are prevalent and are implemented in many software packages (e.g., JMP, SAS PROC OPTEX). When we are interested in treatment comparisons and a fixed-effects model is used, there is another criterion, called an overall efficiency factor (Shah and Sinha 1989, Chapter 3). It is computed from the average variance of the estimated differences between pairs of treatments. It is highly suitable for ranking designs for this purpose and has a connection with the A-optimality criterion (Shah and Sinha 1989, Section 3.3 of Chapter 3). Shah and Sinha (1989, Chapter 5) extended this criterion to rank block designs and row-column designs under a linear mixed model. There is, however, little theoretical work on this criterion under a linear mixed model. In this paper, we first utilize this criterion to rank two-phase designs for given values of the variance components γ = (γ_1, γ_2). Second, we extend this criterion to cover the case in which we have prior information on γ. We limit our attention to two-phase designs for which rank(C_γ) = v − 1, so that all treatment comparisons are estimable. Given a value of γ, the average variance of pairwise treatment comparisons V_γ is computed as:

V_γ = (2/(v − 1)) tr(C_γ^+),  (8)

where C_γ^+ is the Moore-Penrose inverse of C_γ, and 1_v′ C_γ^+ 1_v = 0 since 1_v′ C_γ 1_v = 0. We rank designs according to

A_γ = 2/(r V_γ) = (v − 1)/(r tr(C_γ^+)).  (9)

Generally, we seek designs that maximize A_γ, which is the same as minimizing V_γ. From Eqs.
(8) and (9), it can be seen that A_γ is also the harmonic mean of the non-zero eigenvalues of (1/r) C_γ. Given this connection, both criteria lead to A-optimal two-phase designs.
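As a numerical illustration of this connection, the sketch below (assuming numpy; the function name and the toy information matrix are ours, not the paper's) computes A_γ from the Moore-Penrose inverse and cross-checks it against the harmonic mean of the non-zero eigenvalues of (1/r)C_γ:

```python
import numpy as np

def overall_efficiency(C, r):
    """A_gamma = (v - 1) / (r * trace(C^+)), the harmonic mean of the
    non-zero eigenvalues of C/r, for a v x v information matrix C
    with zero row sums and rank v - 1."""
    v = C.shape[0]
    Cplus = np.linalg.pinv(C)          # Moore-Penrose inverse
    return (v - 1) / (r * np.trace(Cplus))

# Toy example: the information matrix of a complete block design, C = r(I - J/v)
v, r = 4, 3
C = r * (np.eye(v) - np.ones((v, v)) / v)
A = overall_efficiency(C, r)

# Cross-check against the harmonic mean of the non-zero eigenvalues of C/r
lam = np.linalg.eigvalsh(C / r)
nonzero = lam[lam > 1e-10]
harm = len(nonzero) / np.sum(1.0 / nonzero)
assert np.isclose(A, harm)
```

For the fully balanced toy matrix, all non-zero eigenvalues of C/r equal one, so both routes return an efficiency factor of 1, the theoretical upper bound.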

Bayesian criterion
As introduced earlier, optimal designs found at given values of the variance components might not be optimal at another set of values. Therefore, it is advantageous to consider a criterion that can rank designs for a wide range of values of the variance components. To achieve this, we develop a Bayesian criterion to rank two-phase designs for all γ ∈ D ⊂ [0; +∞) × [0; +∞) = R²_+, where R_+ = [0; +∞). The general Bayesian criterion can be expressed as

A_B = ∫_D u(γ) p(γ) dγ,  (10)

where u(γ) is a utility function and p(γ) is a prior distribution summarizing our knowledge about γ. In other words, A_B is the expected utility of a design, obtained by integrating the product of the utility function and the prior for γ over D (Chaloner and Verdinelli 1995, see Eq. (1)). Our main goal is to maximize or minimize the expected utility, depending on the choice of the utility function. There are several ways to choose a utility function (e.g., based on entropy). Here, we consider u(γ) = A_γ and p(γ) = p_1(γ_1) p_2(γ_2), where p_1(γ_1) and p_2(γ_2) are two independent priors for γ_1 and γ_2, respectively. Thus, Eq. (10) can be written as:

A_B = ∫∫_D A_γ p_1(γ_1) p_2(γ_2) dγ_1 dγ_2.  (11)

Because 0 ≤ ϕ_1 ≤ 1/(q b_2p) and 0 ≤ ϕ_2 ≤ 1/(q b_1p), it is convenient to use ϕ = (ϕ_1, ϕ_2) instead of γ = (γ_1, γ_2) when deriving an approximation of A_B (see Proposition 1). Thus, Eq. (11) can be rewritten as an integral of A_ϕ over D̃, where D̃ is obtained by transforming D using γ → ϕ. As before, designs with high A_B are preferable.

Choice of prior distribution
In Bayesian inference, selecting prior information is crucial. Generally, there are several ways to choose a prior, such as a noninformative, a highly informative, or a weakly informative prior. As mentioned in the introduction, in this paper we assume that we do not have prior knowledge of the variance components. Therefore, we use a noninformative prior (Liu and Wasserman 2022, Chapter 13). Noninformative priors are often selected from known probability distributions (Liu and Wasserman 2022, Chapter 13). The history of objective priors in Bayesian inference can be found in Kass and Wasserman (1996) and Ghosh (2011), and Consonni et al. (2018) provided a review on the choice of prior distributions. The selection of probability distributions is often based on improper priors (i.e., prior probability densities that are not integrable), flat priors (or invariant priors), Jeffreys's prior, and conjugate priors (Liu and Wasserman 2022, Chapter 13). Gelman (2006) studied a few prior distributions for the variance components of a simple hierarchical model (one treatment and one blocking factor), namely the uniform, inverse-gamma, and half-Cauchy distributions. The author showed that the uniform distribution belongs to the class of flat priors, while the inverse-gamma and half-Cauchy distributions belong to the class of conjugate priors, meaning that the corresponding posterior distributions are also inverse-gamma or half-Cauchy distributions. The author recommended the uniform and half-Cauchy distributions, since the inverse-gamma distribution is sensitive when the variance component approaches zero. Furthermore, because the variance ratios (γ) can only take positive values, another possible prior distribution for γ is the log-normal distribution. For these reasons, in the next section, we will study three prior distributions for the variance components, i.e., the uniform, log-normal, and half-Cauchy distributions.

Approximations of the Bayesian criterion
It might be intractable to compute A_B exactly because of the complexity of the joint function A_ϕ p_1(ϕ_1) p_2(ϕ_2). Therefore, it is useful to propose an approximation for computing A_B. There are several ways to approximate A_B using numerical integration techniques, e.g., Monte Carlo methods (based on sequences of pseudorandom numbers), quasi-Monte Carlo methods (based on low-discrepancy sequences), and Gaussian (or Gauss) quadrature rules (Davis and Rabinowitz 2007; Kythe and Schäferkotter 2004; Stoer and Bulirsch 2013). These techniques compute the numerical value of a definite integral. It is, however, time-consuming to compute A_B using Monte Carlo and quasi-Monte Carlo methods, since they require a large number of sample points. In contrast, Gauss quadrature rules use well-chosen sample points and corresponding weights, so that A_B can be computed within a reasonable computing time. Therefore, we will use this technique to obtain a numerical value for A_B. In the next section, we discuss how to choose quadrature rules for one-dimensional integrals. We then extend this to two-dimensional integrals, for which we use a full grid to obtain quadrature rules.

Quadrature rules for one-dimensional integrals
We study how to choose quadrature rules for a one-dimensional integral, which is also called a univariate integral. Suppose that we want to compute the integral

∫_{l_1}^{l_2} f(x) φ(x) dx,

where l_1 and l_2 are scalars (l_1 < l_2), f(x) is a smooth function that can be well approximated by polynomials of degree 2d − 1 on the interval [l_1, l_2], and φ(x) is a nonnegative weight function. In numerical integration, it is well known that

∫_{l_1}^{l_2} f(x) φ(x) dx ≈ Σ_{i=1}^{d} ω_i f(ξ_i),

where d is the number of sample points used, ω_i are quadrature weights, and ξ_i are the roots of a dth-degree orthogonal polynomial for the weight function (ξ_i is also called an abscissa or node) (Stoer and Bulirsch 2013). Depending on the weight function, we can identify quadrature nodes and the corresponding quadrature weights. Here, we consider three weight functions, i.e., φ(x) = 1, e^{−x²}, and 1/(1 + x²). These weight functions correspond to the three prior distributions studied in this paper. For φ(x) = 1, it is well known that Legendre polynomials can be used to obtain nodes and weights; this is called Gauss-Legendre quadrature (Gautschi 2004). For φ(x) = e^{−x²}, Hermite polynomials can be used to obtain nodes and weights (Gauss-Hermite quadrature). For φ(x) = 1/(1 + x²), we can apply a combination of the Lanczos and Gram-Schmidt procedures to Chebyshev polynomials; details will be discussed in the next paragraphs. The Legendre, Hermite, and Chebyshev polynomials and their weight functions are shown in Table 1, where ξ_i are the roots of the dth orthogonal polynomial for the weight function and (e_i)_1 are the first components of the eigenvectors of the corresponding Jacobi matrices (Trefethen and Bau 1997, Chapter 37). Goos and Mylona (2018) also provided an excellent tutorial on using Gauss quadrature rules for one-dimensional integrals with several forms of φ(x), including Gauss-Hermite, Gauss-Stieltjes-Wigert, generalized Gauss-Laguerre, and Gauss-Jacobi quadrature. These techniques were illustrated with split-plot experiments.
They did not, however, consider the weight φ(x) = 1/(1 + x²) that we will also study in this paper.
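For the first two weight functions, ready-made rules are available; for instance, numpy exposes Gauss-Legendre and Gauss-Hermite nodes and weights. A minimal check that a d-point rule integrates polynomials of degree up to 2d − 1 exactly:

```python
import numpy as np

d = 5
# Gauss-Legendre: weight phi(x) = 1 on [-1, 1]
xi, w = np.polynomial.legendre.leggauss(d)
# Exact up to degree 2d - 1 = 9: integral of x^8 over [-1, 1] is 2/9
assert np.isclose(np.sum(w * xi**8), 2.0 / 9.0)

# Gauss-Hermite: weight phi(x) = exp(-x^2) on (-inf, +inf)
xi, w = np.polynomial.hermite.hermgauss(d)
# Integral of x^2 * exp(-x^2) dx over the real line is sqrt(pi)/2
assert np.isclose(np.sum(w * xi**2), np.sqrt(np.pi) / 2)
```

Both assertions hold to machine precision, illustrating why a handful of well-chosen nodes can replace thousands of Monte Carlo samples.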
Let χ be a random variable with probability density function p(x). Here we want to compute the expectation ∫ u(x) p(x) dx, where u(x) is a utility function. In the following, we explain how to apply Gauss-Legendre quadrature, Gauss-Hermite quadrature, and the combination of the Lanczos and Gram-Schmidt procedures applied to Chebyshev polynomials to compute this expectation under the three prior distributions.
Uniform distributions: Assume that χ ∼ U(l_1, l_2), a continuous uniform distribution with the two boundaries l_1 and l_2 (l_1 < l_2). Parameters l_1 and l_2 are called hyperparameters. The probability density function p(x) is equal to 1/(l_2 − l_1) when l_1 ≤ x ≤ l_2 and zero otherwise. Therefore, our integral is of the form (1/(l_2 − l_1)) ∫_{l_1}^{l_2} u(x) dx. As mentioned above, Gauss-Legendre quadrature is applied on the interval [−1, 1]. So, we need to perform a change of interval to obtain ∫_{l_1}^{l_2} u(x) dx as follows:

∫_{l_1}^{l_2} u(x) dx = ((l_2 − l_1)/2) ∫_{−1}^{1} u( ((l_2 − l_1)/2) t + (l_1 + l_2)/2 ) dt.

By applying the Gauss-Legendre quadrature rule with d points (ξ_i, ω_i), the above equation can be approximated as:

∫_{l_1}^{l_2} u(x) dx ≈ ((l_2 − l_1)/2) Σ_{i=1}^{d} ω_i u( ((l_2 − l_1)/2) ξ_i + (l_1 + l_2)/2 ).

Log-normal distributions: Consider χ ∼ LN(μ, ψ²), where μ and ψ are the location parameter and the scale parameter of the log-normal distribution. In this case, we need to compute ∫_0^{+∞} u(x) p(x) dx with p(x) the log-normal density. By using a change of variable twice (first x = e^y, then y = μ + √2 ψ t) and applying the Gauss-Hermite quadrature rule, our integral can be approximated as follows:

∫_0^{+∞} u(x) p(x) dx ≈ (1/√π) Σ_{i=1}^{d} ω_i u( exp(μ + √2 ψ ξ_i) ),

where ξ_i and ω_i are nodes and weights obtained from the Gauss-Hermite quadrature rule.

Half-Cauchy distributions: Assume that χ ∼ HC(μ, ψ), where μ is the location parameter (specifying the location of the peak of the distribution) and ψ is the scale parameter, which specifies the half-width at half-maximum. Our integral is of the form:

∫_μ^{+∞} u(x) p(x) dx, with p(x) = (2/π) ψ/(ψ² + (x − μ)²) for x ≥ μ.

One possible approach to approximate this is to use the well-known Lanczos technique with a different inner product, which is similar to the Stieltjes procedure. This approach can be applied to obtain the orthogonal polynomials for an arbitrary weight function; see Olver (2022). In what follows, we provide a brief workflow for obtaining d nodes and their weights for an arbitrary weight function φ(x). It is a modification of the Gauss quadrature rules. The general workflow of Gauss quadrature rules can be found in textbooks or works on numerical integration (e.g., Golub and Welsch 1969; Kythe and Schäferkotter 2004). Traditionally, this workflow is based on the monomial basis {1, x, x², ..., x^d} with an inner product ⟨p_1, p_2⟩ = ∫ p_1(x) p_2(x) φ(x) dx,
where p_1(x) and p_2(x) are functions chosen from the monomial basis. Next, these polynomials are orthogonalized using the well-known Gram-Schmidt procedure. Let {q_0(x), q_1(x), q_2(x), ..., q_d(x)} be the set of orthogonal polynomials obtained from the Gram-Schmidt orthogonalization. The Gauss quadrature points are then the roots of q_d(x). Furthermore, as reviewed in, e.g., Chapter 37 of Trefethen and Bau (1997), these roots (and also the quadrature weights) can be obtained by solving a tridiagonal d × d eigenvalue problem whose entries are produced by the well-known Lanczos algorithm (Lanczos 1950). For numerical stability when constructing Gauss quadrature rules computationally for arbitrary weight functions, Johnson (2019) suggested applying the Lanczos process to polynomials in the basis {T_0, T_1, T_2, ..., T_d} of Chebyshev polynomials, after rescaling the interval [l_1, l_2] to [−1, 1], rather than using the monomial basis. We call this the Gauss-Chebyshev-Lanczos approach. It is implemented in the Julia QuadGK.jl package (Johnson 2022b) and is used to obtain nodes and weights for the half-Cauchy distributions considered in this paper.
By applying the aforementioned technique, our integral can be approximated as follows:

∫_μ^{+∞} u(x) p(x) dx ≈ Σ_{i=1}^{d} ω_i u(ξ_i),

where ξ_i and ω_i are nodes and weights obtained from the Gauss-Chebyshev-Lanczos approach.
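The uniform and log-normal recipes above can be sketched as follows (assuming numpy; u is a stand-in utility function and the hyperparameter values are illustrative):

```python
import numpy as np

def expect_uniform(u, l1, l2, d=10):
    """E[u(X)], X ~ U(l1, l2), via Gauss-Legendre with a change of interval."""
    xi, w = np.polynomial.legendre.leggauss(d)
    x = 0.5 * (l2 - l1) * xi + 0.5 * (l1 + l2)
    # Jacobian (l2 - l1)/2 times the density 1/(l2 - l1) leaves a factor 1/2
    return 0.5 * np.sum(w * u(x))

def expect_lognormal(u, mu, psi, d=10):
    """E[u(X)], X ~ LN(mu, psi^2), via Gauss-Hermite with x = exp(mu + sqrt(2)*psi*t)."""
    t, w = np.polynomial.hermite.hermgauss(d)
    return np.sum(w * u(np.exp(mu + np.sqrt(2.0) * psi * t))) / np.sqrt(np.pi)

# Checks: E[X] for U(0, 1) is 1/2; E[X] for LN(0, 0.5^2) is exp(0.125)
assert np.isclose(expect_uniform(lambda x: x, 0.0, 1.0), 0.5)
assert np.isclose(expect_lognormal(lambda x: x, 0.0, 0.5), np.exp(0.125))
```

In the paper's setting, u would be A_γ evaluated at the transformed node, so each expectation costs only d evaluations of the design criterion.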
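To illustrate the tridiagonal (Jacobi-matrix) eigenvalue step, the sketch below uses the Legendre weight φ(x) = 1, whose recurrence coefficients are known in closed form, and checks the result against numpy's leggauss; for φ(x) = 1/(1 + x²) the coefficients would instead have to be generated by the Lanczos process described above:

```python
import numpy as np

def golub_welsch_legendre(d):
    """Nodes and weights for phi(x) = 1 on [-1, 1] via the Jacobi matrix:
    zero diagonal, off-diagonal beta_k = k / sqrt(4k^2 - 1)."""
    k = np.arange(1, d)
    beta = k / np.sqrt(4.0 * k**2 - 1.0)
    T = np.diag(beta, 1) + np.diag(beta, -1)   # symmetric tridiagonal Jacobi matrix
    lam, Q = np.linalg.eigh(T)
    nodes = lam                                # eigenvalues = quadrature nodes
    mu0 = 2.0                                  # integral of the weight over [-1, 1]
    weights = mu0 * Q[0, :]**2                 # squared first eigenvector components
    return nodes, weights

x1, w1 = golub_welsch_legendre(7)
x2, w2 = np.polynomial.legendre.leggauss(7)
assert np.allclose(x1, x2) and np.allclose(w1, w2)
```

The weights mu0 * (e_i)_1² are exactly the "first components of the eigenvectors of Jacobi matrices" mentioned in Table 1.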

Quadrature rules for two-dimensional integrals
We extend the quadrature rules from one-dimensional to two-dimensional integrals, which are used to approximate A_B. Without loss of generality, we assume that f(x, y) is a smooth function on the domain D = [l_1, l_2] × [l_1, l_2]; this can be accomplished for bounded rectangular domains by a proper change of interval. Suppose that we want to compute ∫∫_D f(x, y) φ_1(x) φ_2(y) dx dy, where f(x, y) can be well approximated by polynomials of degree 2d − 1 within D, and φ_1(x) and φ_2(y) are nonnegative weight functions. Using the tensor product rule (full grid), our integral can be written as:

∫∫_D f(x, y) φ_1(x) φ_2(y) dx dy ≈ Σ_{i=1}^{d} Σ_{j=1}^{d} ω_1i ω_2j f(ξ_i, ξ_j),  (18)

where d is the number of sample points used for each dimension, ω_1i and ω_2j are the quadrature weights for the first and second dimensions, and ξ_i and ξ_j are the roots of the dth orthogonal polynomials used in the first and second dimensions. Under the tensor product rule, this approximation requires d² sample points. Notice that it is possible to use the sparse grid approach, originally proposed by Smolyak (1963), to obtain sample points. However, this approach is recommended only when the number of dimensions is greater than two. Also, under this approach, we recommend using nested quadrature rules instead of Gauss quadrature rules, for example, Fejér and Clenshaw-Curtis rules. In the rest of this article, we will use the right-hand side of Eq. (18) to approximate A_B. The approximated value is denoted as Ã_B^Q.
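The full-grid rule of Eq. (18) can be sketched as follows (assuming numpy and, for simplicity, the Legendre weight in both dimensions):

```python
import numpy as np

def tensor_gauss_legendre_2d(f, d=8):
    """Approximate the integral of f over [-1, 1]^2 with a d x d full grid
    (tensor product of two one-dimensional Gauss-Legendre rules)."""
    xi, w = np.polynomial.legendre.leggauss(d)
    X, Y = np.meshgrid(xi, xi, indexing="ij")
    W = np.outer(w, w)                  # omega_{1i} * omega_{2j}
    return np.sum(W * f(X, Y))          # d^2 sample points in total

# Check: integral of x^2 * y^4 over [-1, 1]^2 = (2/3) * (2/5) = 4/15
val = tensor_gauss_legendre_2d(lambda x, y: x**2 * y**4)
assert np.isclose(val, 4.0 / 15.0)
```

With d = 20, as used later in the paper, this grid evaluates the integrand at 400 points, matching the count quoted in Sect. 5.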

Multidimensional integration-benchmark
Another technique for computing a multidimensional integral is multidimensional h-adaptive integration (Berntsen et al. 1991; Genz and Malik 1980), which uses a globally adaptive subdivision strategy. This algorithm recursively partitions the integration domain into smaller subdomains and then applies the same integration rule to each, until convergence is achieved. It is best suited for a moderate number of dimensions (e.g., < 7), and is superseded for high-dimensional integrals by other methods (e.g., Monte Carlo variants or sparse grids). A general adaptive integration algorithm consists of the following four main steps: (1) choose some subregion(s) from a set of subregions, (2) subdivide the chosen subregion(s), (3) apply the integration rule to the new subregions and update the subregion set, and (4) update the global integral and error estimates and check for convergence. Genz and Malik (1980) improved the adaptive subdivision strategy to facilitate parallelization at the subregion level and to allow for more than one integrand, proposed several different integration rules that can be selected for use with the algorithm, and proposed a new error estimation method. This approach is implemented in the Julia package HCubature by Johnson (2022a). In the next section, we will use this package to evaluate the performance of the approximated Bayesian criterion proposed in the previous section.
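The four steps above can be sketched in one dimension (a simplification of what HCubature does in several dimensions; the error estimate here is a crude comparison of a rule with its two-panel refinement):

```python
import heapq
import numpy as np

def adaptive_quad(f, a, b, tol=1e-10, d=7):
    """Globally adaptive 1-D integration sketching the four steps:
    pick the worst subregion, bisect it, re-apply the rule, update totals."""
    xi, w = np.polynomial.legendre.leggauss(d)

    def rule(lo, hi):
        h = 0.5 * (hi - lo)
        val = h * np.sum(w * f(h * xi + 0.5 * (lo + hi)))
        mid, h2 = 0.5 * (lo + hi), 0.25 * (hi - lo)
        ref = (h2 * np.sum(w * f(h2 * xi + 0.5 * (lo + mid)))
               + h2 * np.sum(w * f(h2 * xi + 0.5 * (mid + hi))))
        return ref, abs(ref - val)          # refined value, error estimate

    val, err = rule(a, b)
    heap = [(-err, a, b, val)]              # subregion set, worst first
    total, toterr = val, err
    while toterr > tol and len(heap) < 10_000:
        negerr, lo, hi, v = heapq.heappop(heap)   # (1) choose the worst subregion
        mid = 0.5 * (lo + hi)                     # (2) subdivide it
        vl, el = rule(lo, mid)                    # (3) re-apply the rule
        vr, er = rule(mid, hi)
        heapq.heappush(heap, (-el, lo, mid, vl))
        heapq.heappush(heap, (-er, mid, hi, vr))
        total += vl + vr - v                      # (4) update integral and error,
        toterr += el + er - (-negerr)             #     then check for convergence
    return total

# Check: integral of 1/(1 + x^2) on [0, 1] is pi/4
assert np.isclose(adaptive_quad(lambda x: 1.0 / (1.0 + x**2), 0.0, 1.0), np.pi / 4)
```

The priority queue keyed on error estimates is what makes the subdivision "globally" adaptive: effort concentrates wherever the current error is largest.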

Evaluating approximations of the Bayesian criterion
We evaluate the performance of the approximations of the Bayesian criterion based on quadrature rules and HCubature. The approximated value of A_B found using HCubature is denoted as Ã_B^H. For our proposed approximation using quadrature rules, up to 20 nodes are used per dimension, meaning that we approximate A_γ by polynomials of degree up to 2 × 20 − 1 = 39, which requires the evaluation of up to 400 points to obtain an approximated value of A_B. The approximated value of A_B under this approach is denoted as Ã_B^Q. Let us consider the two-phase experiment that was briefly mentioned in the introduction.
Example I: Suppose that an experimenter wants to test 10 treatments in a field involving 6 blocks in Phase 1, each of which can accommodate 10 treatments. The 60 samples obtained from this phase are then analyzed in a laboratory in Phase 2. Since a machine in the laboratory can only process four samples at a time, we need 15 machines to process all the samples. Because there might be field-block-to-field-block and machine-to-machine variation, it is essential to consider these sources of variation as blocking factors. Following the notation used in Sect. 2, the parameters for this experiment are v = 10, r = 6, b_1 = 6, k_1 = 10, b_2 = 15, and k_2 = 4. Under these parameters, a nested row-column pattern has p = gcd(b_1, b_2) = 3 superblocks, each of which has b_1p = b_1/p = 2 rows and b_2p = b_2/p = 5 columns. There are q = gcd(k_1, k_2) = 2 experimental units in a cell formed by the intersection of a row and a column (Fig. 1a). First, we generate a random two-phase design for Example I to evaluate the approximations of the Bayesian criterion. This design is created by randomly assigning treatments to the nested row-column pattern; see Fig. 1a. Next, we identify hyperparameters for each distribution (i.e., l_1 and l_2 for the uniform distributions, μ and ψ for the log-normal and half-Cauchy distributions). These hyperparameters can be based on an educated guess or on empirical datasets. Here we suppose that we do not have empirical datasets. Thus, it is reasonable to consider two scenarios for γ_i: (1) 0 ≤ γ_i ≤ 1 and (2) γ_i > 1 (i = 1, 2). In the following analysis, we choose [0, 1] and [1, 10] as the two intervals for γ under the uniform distributions. We use Ã_B^H as a benchmark. Figure 2a shows approximations of A_B when the variance ratios γ_1 and γ_2 are small. In this case, the value of Ã_B^H is approximately 0.79090. The approximated values of A_B using Gauss-Legendre quadrature converge to Ã_B^H after 5 nodes.
When one of the variance ratios is small and the other is large, the values of Ã_B^Q and Ã_B^H are slightly smaller than those obtained from Fig. 2a, as shown in Figs. 2b and 2c. When both variance ratios are large, the block effects in both phases can effectively be considered as fixed. In this case, the values of Ã_B^Q are the smallest among those shown in Figs. 2a-c. Under this analysis, it is sufficient to use 5 nodes to get a good approximation of A_B. This indicates that using Gauss-Legendre not only reduces computing times but also gives a good approximation of A_B. This is mainly because it requires fewer, and well-chosen, sample points compared to prevalent approaches such as Monte Carlo and quasi-Monte Carlo.
Fig. 1 (a) A random allocation of treatments to a nested row-column pattern for Example I. (b) A Bayesian optimal two-phase design for Example I found with γ_1 ∼ U(0, 1) and γ_2 ∼ U(0, 1). In both panels, rectangles are superblocks. Rows in superblocks are blocks in Phase 1, while columns in superblocks are blocks in Phase 2.

Based on these findings, we propose a search framework in the next section to find a Bayesian optimal two-phase design, where we use 10 nodes to approximate the Bayesian criterion. As an illustration, a Bayesian optimal two-phase design for Example I found with γ_1 ∼ U(0, 1) and γ_2 ∼ U(0, 1) is shown in Fig. 1b. As shown in this figure, one of the good properties of this design is that treatments have equal replications in each superblock. This property resembles one of the properties of balanced incomplete block designs, in which all pairs of treatments occur together within a block an equal number of times (Shah and Sinha 1989). For instance, treatment 1 occurs twice in each superblock. This is opposed to the design shown in Fig. 1a, where treatment 1 occurs unequally often across superblocks.
Likewise, we use two log-normal distributions, i.e., LN(0, 0.5) and LN(1, 0.5), as priors for both variance ratios to evaluate the values of Ã_B^H and Ã_B^Q. When γ_i ∼ LN(0, 0.5), it is expected that the variance ratios are small, while when γ_i ∼ LN(1, 0.5), it is expected that they are large (i = 1, 2). Figure 3 shows various approximated values of A_B for the four combinations of the two variance ratios; its structure is identical to that of Fig. 2. Here we use a quantile function (i.e., the inverse cumulative distribution function) to get an upper bound for the two distributions at the probability value 0.999. This avoids the problem of both variance ratios approaching infinity (+∞) when using the HCubature approach.
Similar to the uniform distributions, the values of Ã_B^Q obtained from the Gauss-Hermite rule converge to the value of Ã_B^H with 5 nodes for the four cases considered in Fig. 3. Finally, we considered the Gauss-Legendre rule to get an approximated value of A_B, where the weight function and A_γ are combined into a joint integrand. This approach also worked quite well compared to the Gauss-Hermite approach; it converges to Ã_B^H after 10 nodes.
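How a Gauss-Hermite rule evaluates an expectation under a log-normal prior can be sketched as follows: substituting γ = exp(μ + ψz) with z ∼ N(0, 1) turns the prior expectation into a Gaussian-weighted integral. The identity function is used as a stand-in integrand here (the actual A_γ is not reproduced), which also gives a known reference value E[γ] = exp(μ + ψ²/2).

```python
import numpy as np

def lognormal_expectation(f, mu, psi, n):
    """E[f(gamma)] for gamma ~ LN(mu, psi^2), via an n-node Gauss-Hermite
    rule after substituting gamma = exp(mu + psi * z), z ~ N(0, 1)."""
    t, w = np.polynomial.hermite.hermgauss(n)  # rule for weight exp(-t^2)
    z = np.sqrt(2.0) * t                       # rescale nodes to N(0, 1)
    return float(np.sum(w * f(np.exp(mu + psi * z))) / np.sqrt(np.pi))

# Sanity check with f = identity for LN(0, 0.5): E[gamma] = exp(psi^2 / 2).
mu, psi = 0.0, np.sqrt(0.5)
approx = lognormal_expectation(lambda g: g, mu, psi, 20)
exact = np.exp(mu + psi**2 / 2)
```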
We consider two half-Cauchy distributions, HC(0, 1) and HC(1, 10), as priors for γ. Similar to the uniform and log-normal distributions, these distributions represent small and large values of the variance ratios as discussed before. Here we also use a quantile function to get an upper bound for the two distributions at the probability value 0.99, which again avoids the problem of both variance ratios approaching infinity (+∞). Likewise, the quadrature approaches converge to Ã_B^H after 10 nodes. However, the Gauss-Legendre approach converges to Ã_B^H more slowly than the Gauss-Chebyshev-Lanczos approach.
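The finite upper integration bounds described above can be computed directly from the quantile functions of the two prior families; a sketch using only the standard library (the half-Cauchy quantile has the closed form μ + ψ tan(πp/2), and the log-normal quantile follows from the inverse normal CDF):

```python
from math import tan, pi, exp
from statistics import NormalDist

def halfcauchy_quantile(p, mu=0.0, psi=1.0):
    """Quantile of HC(mu, psi): mu + psi * tan(pi * p / 2)."""
    return mu + psi * tan(pi * p / 2.0)

def lognormal_quantile(p, mu=0.0, psi=1.0):
    """Quantile of LN(mu, psi^2) via the inverse normal CDF."""
    return exp(mu + psi * NormalDist().inv_cdf(p))

# Finite upper bounds replacing +infinity in the integration domain.
ub_hc = halfcauchy_quantile(0.99, 0.0, 1.0)        # HC(0, 1), p = 0.99
ub_ln = lognormal_quantile(0.999, 0.0, 0.5**0.5)   # LN(0, 0.5), p = 0.999
```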
Note that it is possible to use the HCubature approach to find Bayesian optimal two-phase designs. However, this approach is costly when searching for a large two-phase design (i.e., a design with a large number of treatments), because the update formulae cannot be applied under the HCubature approach. For example, we compared computing times for finding Bayesian two-phase designs for Example I with one iteration. Using the HCubature approach, it took 159.21 s, 1050.56 s, and 3467.97 s to obtain two-phase designs for the uniform, log-normal, and half-Cauchy distributions, respectively. Using the quadrature rules, by contrast, the average computing time for each of the three distributions was less than 1 s.

A search algorithm
In the literature, most search algorithms for comparative experiments focus on variants of simulated annealing (Kirkpatrick et al. 1983) and Tabu search (Glover 1989), see a review in Vo-Thanh and Piepho (2022). Recently, Vo-Thanh and Piepho (2022) considered a variant of hill climbing (Appleby et al. 1961) to obtain optimal row-column designs with a complex blocking structure, and equal or unequal treatment replications. They showed that their proposed algorithm is competitive with the variants of simulated annealing and Tabu search used to search for optimal designs for comparative experiments. In this paper, we utilize their work to obtain Bayesian two-phase designs. The workflow is briefly described as follows.
Generally, a computer search starts with an initial design, which is also the first candidate design. We then iteratively improve this design over a number of steps (ρ_0). At each step, given a candidate design, a better design can be found among the neighbor designs of the current candidate. A neighbor design is obtained from a given design by applying a swapping mechanism. In this paper, we consider a swapping mechanism in which we exchange two treatments in the design matrix X while fixing the nested row-column pattern (Z). The next candidate design is selected among the neighbor designs generated by performing all possible swaps in X. This selection is based on the accepting and rejecting strategies considered in Vo-Thanh and Piepho (2022), using Ã_B^Q as the main search criterion. The selected candidate design is used in the next iteration. A pseudocode of the search framework is shown in Algorithm 1, while details of this algorithm can be found in Vo-Thanh and Piepho (2022). As pointed out in Vo-Thanh and Piepho (2022), it is essential to use an efficient update formula when computing a generalized inverse of the treatment information matrix before and after interchanging treatments. Here we adapted their strategies to derive an update formula that allows swapping two treatments in the design matrix X while fixing the design matrix Z (see Supplementary Section B). In the next section, we apply this search framework to find Bayesian optimal two-phase designs.
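The swap-based local search can be sketched as follows. This is a simplified illustration only: the stand-in objective is the imbalance of treatment replications across superblocks, not the actual Ã_B^Q, and it recomputes the criterion from scratch rather than using the authors' update formulae.

```python
import random

def criterion(x, superblock, v, n_sb):
    """Stand-in objective: squared imbalance of treatment replications
    across superblocks (the real search uses the approximated Bayesian
    A-criterion with efficient update formulae)."""
    counts = [[0] * v for _ in range(n_sb)]
    for unit, trt in enumerate(x):
        counts[superblock[unit]][trt] += 1
    target = len(x) / (v * n_sb)
    return sum((c - target) ** 2 for row in counts for c in row)

def swap_search(x, superblock, v, n_sb, max_passes=100):
    """Hill climbing: repeatedly apply improving treatment swaps in x,
    keeping the blocking pattern (the Z matrix) fixed."""
    best = criterion(x, superblock, v, n_sb)
    for _ in range(max_passes):
        improved = False
        for i in range(len(x)):
            for j in range(i + 1, len(x)):
                if x[i] == x[j]:
                    continue  # swapping identical treatments is a no-op
                x[i], x[j] = x[j], x[i]
                val = criterion(x, superblock, v, n_sb)
                if val < best:
                    best, improved = val, True
                else:
                    x[i], x[j] = x[j], x[i]  # reject: undo the swap
        if not improved:
            break  # local optimum reached
    return x, best

# Toy instance: 4 treatments with 4 replications each, 16 units split
# into 2 superblocks of 8 units.
random.seed(1)
x = [t for t in range(4) for _ in range(4)]
random.shuffle(x)
sb = [0] * 8 + [1] * 8
design, value = swap_search(x, sb, v=4, n_sb=2)
```

For this separable objective, any imbalanced state admits an improving swap, so the search terminates at a perfectly balanced allocation.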

Bayesian two-phase designs
We illustrate the search framework by finding Bayesian optimal two-phase designs for the 16 parameter settings shown in Table 2. The first column of this table gives an identification (ID) for each example. The remaining columns show the design information as described in Sect. 2. We study the three prior distributions for these 16 design settings. Based on the preliminary results in Sect. 5, we applied Gauss quadrature rules with 10 nodes per dimension to approximate the Bayesian criterion.
As mentioned in Sect. 5, it is practical to choose a prior distribution for a variance ratio that represents small or large variances. In the following, we use the notation γab to represent the choice of the two variance ratios, where a and b represent the choices of γ_1 and γ_2. There are four possible choices of variance ratios considered in this section, i.e., γ−−, γ−+, γ+−, and γ++, where − and + represent small and large variance ratios, respectively. For instance, γ−+ represents the case where γ_1 is chosen to be small, while γ_2 is chosen to be large.
For each of the three distributions studied in this paper, we applied our proposed algorithm to search for optimal designs with the four combinations of the two variance ratios. In total, for each of the three distributions, we obtained 64 Bayesian optimal two-phase designs. Parameters used in the search algorithm are (ρ_0, ρ_1, ρ_2, ρ_3) = (400, 100, 400, 1), where ρ_0 is the number of iterations used to search for a design, ρ_1 is the length of a fitness array, ρ_2 is the length of a visited-candidate array, and ρ_3 is a percentage of neighbor designs. Details of these parameters can be found in Vo-Thanh and Piepho (2022). The implementation of our algorithm can be found in the supplementary material of this paper (see JCode.7z). The algorithm was run under Julia 1.5.3 on an Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz with 16GB of memory. The average computing time required to find the 64 two-phase designs for each distribution is about 13,690.68 s (≈ 3.8 h). In the following section, we detail our findings.

Figure 5 shows three panels that present various information on the 64 Bayesian two-phase designs obtained from the four combinations (γ−−, γ−+, γ+−, and γ++). Figure 5a shows the values of Ã_B^H of the 64 optimal designs found with γ−−, γ−+, γ+−, and γ++, where "−" and "+" represent the two uniform distributions U(0, 1) and U(1, 10), respectively. When the block sizes in the two phases are identical, it is best to confound the blocks in both phases, as advocated in Brien et al. (2011, Big with Big); this is achieved under the nested row-column structure. For Example 1, the designs obtained with the four combinations of variance ratios are balanced incomplete block designs. For Examples 3 and 4, the Bayesian optimal two-phase designs are nested designs. Vo-Thanh et al. (2022) recommended constructing a design in the phase with the smaller block size first.
Then blocks in this resulting design can be grouped to form blocks in the other phase. This grouping is done via the nested row-column pattern. For Example 5, the Bayesian optimal two-phase design is a row-column design with q = 1 (one experimental unit per cell). For Examples 6, 7, and 8, the Bayesian optimal two-phase designs are row-column designs with q > 1 (more than one experimental unit per cell). For Examples 9 and 10, the Bayesian optimal designs are nested row-column designs with q = 1. For Examples 11 to 16, the Bayesian optimal designs are nested row-column designs with q > 1.

Figure 5b shows the percentage change of Ã_B^H between the Bayesian optimal designs obtained from γ−− and γ++. The percentage change is computed as follows. We take the value of Ã_B^H of the design obtained from γ++ and subtract the value of Ã_B^H of the design obtained from γ−−. The result is a gain or loss. We divide this gain or loss by the value of Ã_B^H of the design obtained from γ++ and multiply the result by 100 to arrive at the percentage change in design efficiency. As shown in this panel, the percentage change varies between 0.52% and 20%. As expected, the values of Ã_B^H of designs obtained under γ++ are slightly lower than those of designs obtained under γ−−. This indicates that there is a substantial benefit to considering an analysis with the linear mixed model, which is often the case in plant breeding. There are seven examples where the gaps are less than 5% (Examples 2, 4, 5, 6, 7, 8, and 11). As shown in Table 2, the block sizes in Examples 9 and 10 are relatively small in both phases, while the numbers of blocks in both phases are large. We conjecture that this might be the reason why we see a large gap for these examples. Therefore, in this case, we would suggest considering an analysis with random block effects instead of fixed block effects in both phases for the recovery of inter-block information.
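The percentage-change computation described above amounts to a one-line formula; here a_plus and a_minus are hypothetical criterion values for designs found under γ++ and γ−−:

```python
def pct_change(a_plus, a_minus):
    """Percentage change in the criterion value between the design
    obtained under gamma++ (a_plus) and the design obtained under
    gamma-- (a_minus), with the gamma++ design as the reference."""
    return (a_plus - a_minus) / a_plus * 100.0

# Hypothetical values: a 5% gap relative to the gamma++ design.
gap = pct_change(0.80, 0.76)
```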

Uniform distributions
Suppose that we are also interested in the performance of our designs under the fixed-effects model discussed in the introduction. This is relevant because optimal designs obtained under a mixed model are often disconnected under a fixed-effects model. However, this is not the case with our Bayesian optimal two-phase designs, since these designs are optimized over a range of plausible values of the variance components. In fact, Fig. 5c shows the percentage change of A_∞ of the Bayesian two-phase designs obtained from γ−− and γ++, where A_∞ is the design efficiency computed under the fixed-effects model, and the A_∞ of the Bayesian two-phase design obtained from γ++ is used as the reference. As shown in this figure, these designs are indeed connected under the fixed-effects model. Furthermore, the losses of design efficiency for all but Example 3 are relatively small. This suggests that our Bayesian two-phase designs could also be used for analysis with a linear fixed-effects model.

Log-normal distributions
In this section, we consider the two log-normal distributions LN(0, 0.5) and LN(1, 0.5), covering the two ranges [0, 1] and [1, 10] for the two variance ratios, as indicated before. Likewise, "−" and "+" represent the two log-normal distributions LN(0, 0.5) and LN(1, 0.5), respectively. Figure 6 shows three panels that present information on the 64 Bayesian two-phase designs obtained from the four scenarios (γ−−, γ−+, γ+−, and γ++); its structure is analogous to that of Fig. 5. As discussed before, we want to avoid setting the variance ratios to infinity when computing A_γ. Therefore, we use a quantile function to get upper bounds when finding the values of Ã_B^H, where we choose the probability value 0.999, so that Ã_B^H can be evaluated. As shown in Fig. 6a, the values of Ã_B^H of the Bayesian optimal two-phase designs obtained from the four scenarios range from 0.62485 to 0.96641. For Example 1, we again obtained four balanced incomplete block designs, as with the uniform distributions. As shown in Fig. 6b, its pattern is similar to that of Fig. 5b, where the percentage changes are less than 10% for all but Example 9. When considering the fixed-effects model, all designs are indeed connected, as shown in Fig. 6c. For Example 3, the percentage change (loss) is the largest among all examples, at around 2.6%. If experimenters accept a minor loss (< 3%), these designs could be used to plan a two-phase experiment when the linear fixed-effects model is of interest.

Half-Cauchy distributions
Two half-Cauchy distributions are studied in this section, i.e., HC(0, 1) and HC(1, 10), as priors for γ. In this case, "−" and "+" represent HC(0, 1) and HC(1, 10), respectively. Likewise, Fig. 7 shows information on the 64 Bayesian designs found under the two half-Cauchy distributions. The structure of this figure is identical to that of Fig. 5. The values of Ã_B^H of the Bayesian optimal designs range from 0.60759 to 0.92974. For the percentage changes (Fig. 7b), there is a pattern similar to those obtained under the uniform and log-normal distributions. The losses for all but Example 9 are less than 10%, while the loss for Example 9 is around 12%. In terms of the linear fixed-effects model, all designs are also connected, and the losses are at most 2.76%. Overall, the general picture is that we obtain results consistent with those for the uniform and log-normal distributions.

Bayesian balanced incomplete block designs and their dual designs
As discussed in the previous section, when the block sizes in both phases are identical, it is best to confound blocks in both phases to construct a two-phase design. This is indeed an optimal strategy, as shown in Brien et al. (2011). Therefore, in such a case, we can use a block design to construct a two-phase design, which will be our focus in this section. In particular, we focus on a balanced incomplete block design and its dual design, where the dual design is obtained by interchanging the roles of treatments and blocks (Shah and Sinha 1989).

Theorem 1 Consider a balanced incomplete block design with parameters v, r, b, and k in a single phase. Let γ be the ratio between block and residual error variances. The values of A_B computed from the three prior distributions can be characterized in closed form (Eqs. (19)-(21)) for:
• a uniform distribution γ ∼ U(l_1, l_2);
• a log-normal distribution γ ∼ LN(μ, ψ²);
• a half-Cauchy distribution γ ∼ HC(μ, ψ);
where λ = r(k − 1)/(v − 1), arctan(x) is the arctangent of x, and ln(x) is the natural logarithm of x.
Theorem 2 Consider a block design with parameters v, r, b, and k in a single phase, whose dual design, with parameters ṽ = b, r̃ = k, b̃ = v, k̃ = r, is a balanced incomplete block design with b̃ ≤ ṽ (Fisher's inequality). Let γ be the ratio between block and residual error variances. The values of A_B computed from the three prior distributions can be characterized in closed form for:
• a uniform distribution γ ∼ U(l_1, l_2);
• a log-normal distribution γ ∼ LN(μ, ψ²);
• a half-Cauchy distribution γ ∼ HC(μ, ψ);
where α_1 = r(v − 1)(b − 1) and α_2 = (v − 1)kb(r − 1).

The proofs of Theorems 1 and 2 can be found in Supplementary Sections C and D.
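The parameter mapping between a block design and its dual can be checked mechanically; a minimal sketch, using the standard necessary conditions for a balanced incomplete block design (vr = bk, λ = r(k − 1)/(v − 1) a positive integer, and b ≥ v by Fisher's inequality):

```python
def dual_parameters(v, r, b, k):
    """Parameters of the dual design: treatments and blocks swap roles."""
    return {"v": b, "r": k, "b": v, "k": r}

def bibd_necessary(v, r, b, k):
    """Standard necessary conditions for a BIBD: vr = bk, b >= v
    (Fisher's inequality), and lambda = r(k-1)/(v-1) a positive integer."""
    if v * r != b * k or b < v:
        return False
    lam_num = r * (k - 1)
    return lam_num > 0 and lam_num % (v - 1) == 0

# Example 1's BIBD (v=7, r=3, b=7, k=3) is symmetric, so its dual has
# the same parameters and again satisfies the BIBD conditions.
dual = dual_parameters(7, 3, 7, 3)
```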

Application
In this section, we apply Theorem 1 to compute A_B for Example 1. This is possible because the design parameters of this example are v = 7, r = 3, b_1 = 7, and k_1 = 3, which satisfy the necessary condition for a balanced incomplete block design, since λ = r(k_1 − 1)/(v − 1) = 1. From Theorem 1, it is easy to verify that A_γ = (1/r)(r + (λ − r)γ/(1 + k_1 γ)) = (3 + 7γ)/(3 + 9γ). The values of A_B can then be computed according to Eqs. (19)-(21).
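As a numerical illustration, assuming A_B is the expectation of A_γ under the prior, the value for Example 1 under γ ∼ U(0, 1) can be obtained by integrating the expression above; for this rational A_γ the integral also has a simple closed form:

```python
from math import log
import numpy as np

# A_gamma for Example 1 (v=7, r=3, k1=3, lambda=1), as derived above.
a_gamma = lambda g: (3 + 7 * g) / (3 + 9 * g)

# 10-node Gauss-Legendre rule on [0, 1] for the prior gamma ~ U(0, 1),
# assuming A_B = E[A_gamma] under the prior.
t, w = np.polynomial.legendre.leggauss(10)
a_b = float(np.sum(0.5 * w * a_gamma(0.5 * (t + 1.0))))

# Closed form: integrating (3+7g)/(3+9g) over [0, 1] gives
# 7/9 + (2/27) * ln(4) ≈ 0.8805.
exact = 7 / 9 + (2 / 27) * log(4)
```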

Conclusion
In this paper, we presented a general framework to search for Bayesian two-phase designs based on the variant of hill climbing studied in Vo-Thanh and Piepho (2022). We considered three families of prior distributions, i.e., uniform, log-normal, and half-Cauchy distributions, and used Gauss quadrature rules to approximate the Bayesian criterion, which reduces the computational expense of searching for Bayesian optimal two-phase designs. Furthermore, we proposed two theorems for the case when two-phase designs can be constructed from balanced incomplete block designs. These theorems provide upper bounds for A_γ of two-phase designs under the linear mixed model. Such upper bounds have not yet received much attention in the literature.
An important novel feature of our approach is that it can handle Bayesian two-phase designs for various parameter settings and can find an efficient Bayesian two-phase design within a reasonable time. This is because we used Gauss quadrature rules to approximate the Bayesian criterion, for which 10 nodes are adequate to get a good approximation of A_B for the three prior distributions considered in this paper. Furthermore, we applied Gauss quadrature rules for an arbitrary weight function (based on a half-Cauchy distribution), which has not yet been considered in the current literature on Bayesian two-phase designs.
It is worth mentioning that if we have informative prior information for the variance components, our search framework can still be applied, replacing p(γ) by the informative prior and D by the corresponding domain. This situation should be quite common in practice, because historical data on two-phase experiments similar to the one being planned are often available, from which an informative prior for the variance components can be set up. Here, the main bottleneck is the assembly of the historical data into an accessible database, which might require substantial effort in practice.
Even though we have studied several prior distributions, it is generally not straightforward to recommend the right prior distribution when researchers have no specific prior information about the variance components. In this case, our recommendation is to use a flat prior distribution (e.g., a uniform distribution) when searching for optimal two-phase designs. Alternatively, one could perform a simulation study in which many designs are obtained for various prior distributions; this could be done using our proposed framework. Based on the results, one can then select the design that performs best across all prior distributions considered. It is also acknowledged that an experiment may target multiple responses, in which case a multivariate model could be considered, but such an extension is beyond the scope of this paper.
We suggest several subjects that merit further research. First, it is possible to extend our proposed approach to handle two-phase designs with more than two blocking factors and/or more than one treatment factor. The first challenge is to identify a blocking structure involving several blocking factors and a treatment structure involving several treatment factors. The second challenge is to derive a simple form for the treatment information matrix so that update formulae can be proposed; this is crucial to avoid a computational burden when searching for a large two-phase design. The last challenge is to approximate a Bayesian criterion. This can be done using quadrature approaches; however, it might take a great amount of time to search for a large two-phase design. For example, the current nested row-column pattern can be extended to two-phase designs with two blocking factors in Phase 1 and one blocking factor in Phase 2. The remaining challenge here is to derive a simple form so that the treatment information matrix can be computed easily, which could be done by extending Proposition 1. As another example, when there are two treatment factors and two blocking factors, it might be possible to combine the two treatment factors using a factorial structure; the challenge is then to derive a simple form for the treatment information matrix. Second, our proposed approximation of the Bayesian criterion can be adapted to search for block designs and row-column designs under linear mixed models. Third, even though we limited our attention to three families of prior distributions, the quadrature rules can be applied to other prior distributions. In particular, the Gauss-Chebyshev-Lanczos approach could be employed to get nodes and weights for an arbitrary weight function if a quadrature rule for that weight function is unavailable.
Fourth, for some design parameters, it might be possible to obtain better two-phase designs with a more complex blocking pattern than those obtained under a nested row-column pattern. Searching for a Bayesian two-phase design in this case will, however, take a great amount of computing time, because efficient update formulae will not be available. This option might nonetheless be worth exploring in further research, although we conjecture that the efficiencies of these designs will not differ much from those obtained under a nested row-column pattern.
In this paper, we have not considered Jeffreys's prior. One of the key features of this prior is that it is invariant under a change of coordinates for the parameter vector γ . This makes it of special interest for use with scale parameters. It is, however, rather complex to directly compute Jeffreys's prior since it requires the square root of the determinant of the Fisher information matrix. Therefore, it would be practical to propose an approximation for this prior. This could be done by using the work of Natarajan and Kass (2000) who considered an approximation for Jeffreys's prior.