Semiparametric Bayesian inference on generalized linear measurement error models

The classical assumption in generalized linear measurement error models (GLMEMs) is that the measurement errors (MEs) for covariates follow a fully parametric distribution such as the multivariate normal distribution. This paper uses a centered Dirichlet process mixture model to relax this fully parametric distributional assumption and develops a semiparametric Bayesian approach that simultaneously obtains Bayesian estimates of parameters and of covariates subject to MEs by combining the stick-breaking prior and the Gibbs sampler with the Metropolis–Hastings algorithm. Two Bayesian case-deletion diagnostics, based on the Kullback–Leibler divergence and Cook's distance, are proposed to identify influential observations in GLMEMs, and computationally feasible formulae for evaluating them are presented. Several simulation studies and a real example illustrate the proposed methodologies.


Introduction
Generalized linear models (GLMs) with covariates subject to measurement errors (MEs), usually called generalized linear measurement error models (GLMEMs), have received a lot of attention in past years. For example, Stefanski and Carroll (1985) developed a bias-adjusted estimator, a functional maximum likelihood estimator and an estimator exploiting the consequences of sufficiency for logistic regression when covariates were subject to MEs; Stefanski and Carroll (1987) studied parameter estimation in GLMs with canonical form when the explanatory vector was measured with an independent normal error; Buzas and Stefanski (1996) investigated instrumental variable estimation in GLMEMs with canonical link functions; Aitkin and Rocci (2002) presented an EM algorithm for maximum likelihood estimation in GLMs with continuous MEs in the explanatory variables; Battauz (2011) developed a Laplace-based estimator for GLMEMs; Battauz and Bellio (2011) proposed a structural analysis for GLMs when some explanatory variables were measured with error and the ME variance was a function of the true variables.
All the above mentioned studies assume that the covariate MEs in GLMEMs follow a fully parametric distribution such as the multivariate normal distribution. However, in some applications, the covariate MEs may not follow a fully parametric distribution but rather a non-normal distribution such as a skew-normal, skew-t, or bimodal and heavy-tailed distribution (Lachos et al. 2011). Moreover, violation of the parametric assumption on the covariate MEs may lead to unreasonable or even misleading conclusions. Therefore, it is of practical interest to consider a flexible distributional assumption on the covariate MEs in GLMEMs. The nonparametric method is one of the most widely adopted approaches to specifying a flexible probability distribution for random variables or MEs in the Bayesian framework.
The Dirichlet process (DP) prior (Ferguson 1973) is the most popular nonparametric approach to specifying a probability distribution for random variables or MEs in the Bayesian framework, owing to the availability of efficient computational techniques. The nonparametric method has been successfully used for statistical inference in various random effects models; see, for example, Kleinman and Ibrahim (1998), Dunson (2006), Guha (2008), Lee et al. (2008), Chow et al. (2011), Tang and Duan (2012) and Tang et al. (2014). However, to the best of our knowledge, little work has been done on Bayesian analysis of GLMEMs with covariate MEs following a nonparametric distribution. Hence, this paper develops a semiparametric Bayesian approach to simultaneously obtain Bayesian estimates of parameters and covariates subject to MEs, and proposes two Bayesian case deletion diagnostics to detect potential influential observations under the centered DP mixture model specification of the covariate MEs in GLMEMs. A hybrid algorithm is presented to generate the observations required for Bayesian inference from the posterior distributions of the parameters and the covariates subject to MEs by combining the stick-breaking prior and the Gibbs sampler (Geman and Geman 1984) together with the Metropolis-Hastings algorithm.
Bayesian case deletion approaches to detecting influential observations (or sets of observations) have been proposed for statistical models such as linear regression models (Carlin and Polson 1991), GLMs (Jackson et al. 2012) and generalized linear mixed models (Fong et al. 2010), based on the Kullback-Leibler divergence (K-L divergence) and the conditional predictive ordinate. However, extending these existing Bayesian case deletion diagnostics to our considered GLMEMs is computationally challenging because of the complexity of the models and the unknown distribution of the covariate MEs. To this end, in this paper the well-known Markov chain Monte Carlo (MCMC) algorithm is employed to develop two computationally feasible Bayesian case deletion diagnostics, based on the K-L divergence and Cook's distance, to assess the effect of cases (or sets of observations) on the posterior distributions or estimates of parameters.
The rest of this paper is organized as follows. Section 2 introduces GLMEMs, using the centered DP mixture model to specify the distribution of covariate MEs. Section 3 develops a Bayesian MCMC algorithm for inference on GLMEMs, using the Gibbs sampler together with the Metropolis-Hastings algorithm, and presents two Bayesian case deletion diagnostic measures for detecting influential observations based on the K-L divergence and Cook's distance. Several simulation studies and a real example are used to illustrate our proposed methodologies in Sect. 4. Some concluding remarks are given in Sect. 5. Technical details are presented in the Appendix.

Generalized linear measurement error models
For i = 1, . . . , n, let y_i denote the observed outcome variable, x_i be an r × 1 vector of unobserved covariates, and v_i be a p × 1 vector of observed covariates for the ith individual. Generally, the unobserved components of covariates may vary across individuals; for simplicity, we assume that the unobserved components are the same across z_1, . . . , z_n, where z_i = (x_i, v_i) for i = 1, . . . , n. Given z_i, we assume that the y_i's are conditionally independent of each other, and that the conditional distribution of y_i is a one-parameter exponential family with canonical parameter θ_i and a mean that is a function of z_i. That is, for i = 1, . . . , n, the conditional probability density function of y_i given z_i is given by

p(y_i | z_i) = exp{[y_i θ_i − b(θ_i)]/φ + c(y_i, φ)},    (2.1)

where b(·) and c(·, ·) are known functions, φ is a dispersion parameter, and the canonical parameter satisfies

θ_i = z_i'β,    (2.2)

in which β is a (p + r) × 1 vector of unknown regression coefficients. Generally, there are two approaches to specifying the ME structure: the structural ME model and the functional ME model. In a structural ME model, a specific assumption is made on the distributional structure of the unobserved covariates, whilst nothing is assumed about the unobserved covariates in a functional ME model. Following Carroll et al. (2006), if the true covariate x_i is measured m times for individual i, giving outcomes w_ij for j = 1, . . . , m, the structural ME model can be expressed as

w_ij = x_i + u_ij,    (2.3)

where the MEs u_ij's are assumed to follow an unknown distribution and to be independent of x_i. Following Lee et al.
(2008), we use the DP mixture model to specify the distribution of u_ij as

u_ij ∼ ∑_{g=1}^{∞} π_g N_r(α_g, Σ_g), with (α_g, Σ_g) ∼ P and P ∼ DP(τ P_0),

where π_g is a random probability weight such that 0 ≤ π_g ≤ 1 and ∑_{g=1}^{∞} π_g = 1, N_r(α_g, Σ_g) denotes the multivariate normal distribution with mean α_g and covariance matrix Σ_g, P is a random probability measure of unknown form, P_0 is a base distribution that serves as a starting point for constructing the nonparametric distribution, and τ is a weight assigned a priori to the base distribution, representing the certainty of P_0 as the distribution of P. The most widely used choice for P_0 is the multivariate normal distribution. The DP prior with the stick-breaking representation may yield a non-zero mean of the MEs (Yang et al. 2010), which is inconsistent with the classical assumption that the mean of u_ij is zero (Carroll et al. 2006).
Inspired by Yang et al. (2010), we consider the following truncated and centered DP (TCDP) mixture model for u_ij:

u_ij ∼ ∑_{g=1}^{G} π_g N_r(α_g − ∑_{h=1}^{G} π_h α_h, Σ_g),    (2.4)

where G is the number of truncated mixture components, and the π_g's are generated by the following stick-breaking procedure:

π_1 = ν_1, π_g = ν_g ∏_{l=1}^{g−1}(1 − ν_l), with ν_g ∼ Beta(1, τ) for g = 1, . . . , G − 1 and ν_G = 1,    (2.5)

so that ∑_{g=1}^{G} π_g = 1. Based on the above specified TCDP mixture model, it is quite difficult to sample observations from the posterior distribution of u_ij via the MCMC algorithm because of the complicated posterior distribution involved. To this end, we introduce latent component labels L_ij and generate u_ij given L_ij from the corresponding mixture component, in which the atoms α*_g are sampled from the following reformulated model. Let π = {π_1, . . . , π_G}, α* = {α*_1, . . . , α*_G} and Σ = {Σ_1, . . . , Σ_G}, in which Σ_g = diag(ω_g1, . . . , ω_gr) for g = 1, . . . , G. It follows from Lee et al. (2008) that Eq. (2.4) can be rewritten as

u_ij | L_ij ∼ N_r(α*_{L_ij} − ∑_{g=1}^{G} π_g α*_g, Σ_{L_ij}), with L_ij ∼ ∑_{g=1}^{G} π_g δ_g(·),    (2.6)

where δ_g(·) is a discrete probability measure concentrated at g, f_1(π) is specified by the stick-breaking prior given in Eq. (2.5), and the prior distribution of α*_g involved in f_2(α*) = ∏_{g=1}^{G} p(α*_g) is given by

α*_g | Ψ ∼ N_r(ξ_0, Λ_0 Ψ), with ψ_j^{−1} ∼ Γ(c_1, c_2) for j = 1, . . . , r,    (2.7)

where Ψ = diag(ψ_1, . . . , ψ_r); ξ_0, Λ_0, c_1 and c_2 are hyperparameters whose values are assumed to be known, and Γ(c_1, c_2) denotes the Gamma distribution with parameters c_1 and c_2. The prior distribution for ω_gj involved in f_3(Σ) = ∏_{g=1}^{G} ∏_{j=1}^{r} p(ω_gj) is given by

ω_gj^{−1} ∼ Γ(ϕ_aj, ϕ_bj), with ϕ_bj ∼ Γ(ϕ_cj, ϕ_dj),    (2.8)

where ϕ_aj, ϕ_cj and ϕ_dj are pre-specified hyperparameters. To complete the specification of the covariate ME model, we need to define a true covariate model. Following Aitkin and Rocci (2002) and Gustafson (2004), the true covariate model for x_ki (k = 1, . . . , r) can be defined as

x_ki = γ_k0 + γ_kv' v_i + ε_ki, with ε_ki ∼ N(0, σ_x^2),    (2.9)

where γ_k0 is an intercept, γ_kv = (γ_k1, . . . , γ_kp)' is a p × 1 vector of unknown regression parameters, and the ε_ki's are residuals assumed to be independent of the covariates v_i and the MEs u_ij. The model defined in Eqs. (2.1)-(2.3) together with (2.9) is referred to as a GLMEM.
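As a sketch of the construction above, the following Python snippet draws the stick-breaking weights of Eq. (2.5) and samples mean-centered mixture errors. It is a minimal illustration under assumed hyperparameter values (the choices of tau, G, a standard-normal base for the atoms and inverse-Gamma component variances are illustrative), not the paper's posterior sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(tau, G, rng):
    """Truncated stick-breaking of Eq. (2.5): nu_g ~ Beta(1, tau), nu_G = 1."""
    nu = rng.beta(1.0, tau, size=G)
    nu[-1] = 1.0                      # nu_G = 1 forces the weights to sum to 1
    pi = np.empty(G)
    remaining = 1.0
    for g in range(G):
        pi[g] = nu[g] * remaining
        remaining *= 1.0 - nu[g]
    return pi

def sample_tcdp_errors(n, tau=1.0, G=50, r=2, rng=rng):
    """Draw n measurement errors from a truncated, centered DP mixture of
    r-variate normals with diagonal covariances.  Centering subtracts the
    weighted mixture mean so the errors have mean zero, as the classical
    ME assumption requires."""
    pi = stick_breaking_weights(tau, G, rng)
    alpha = rng.normal(0.0, 1.0, size=(G, r))              # atoms from a N(0, I) base
    omega = 1.0 / rng.gamma(3.0, 0.5, size=(G, r))         # diagonal variances
    alpha_c = alpha - pi @ alpha                           # centered atoms
    labels = rng.choice(G, size=n, p=pi)                   # latent labels L_ij
    u = alpha_c[labels] + np.sqrt(omega[labels]) * rng.normal(size=(n, r))
    return u

u = sample_tcdp_errors(5000)
```

The centering step subtracts the weight-averaged atom, which is exactly what makes the TCDP errors satisfy the zero-mean ME assumption.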
The above defined model is not identifiable when there are no replicate measurements, i.e., m = 1. In this case, some identification conditions on parameters are required (Lee and Tang 2006), for example, we may set σ 2 x and γ k0 to be some prespecified values.
Based on the above presented joint probability density function and priors, a Bayesian approach is developed to make statistical inference on parameters in {τ, θ y , θ γ } by utilizing the Gibbs sampler together with the Metropolis-Hastings algorithm for our considered GLMEMs.

Bayesian inference on GLMEMs
From Eq. (2.10) and the above defined priors, it is easily seen that it is rather difficult to directly make Bayesian inference on the parameters in {τ, θ_y, θ_γ} because of the intractable high-dimensional integrals involved. Owing to recent developments in statistical computing, the Gibbs sampler is employed to generate a sequence of random observations from the joint posterior distribution p(ξ, Σ, τ, θ_x, x | y, w, v), and Bayesian estimates of the unknown parameters and of the covariates subject to MEs are obtained from this generated sequence. Specifically, the unknown quantities are iteratively drawn from their full conditional distributions, such as p(u | α, Σ, L, x, w, θ_u). These conditional distributions are presented in the Appendix.

Bayesian estimates
Let {(β^(ℓ), φ^(ℓ), γ^(ℓ), σ_x^{2(ℓ)}): ℓ = 1, . . . , L} be the observations generated by the above MCMC algorithm after convergence. Bayesian estimates of the unknown parameters are obtained as the sample means of these observations, for example,

β̂ = L^{−1} ∑_{ℓ=1}^{L} β^(ℓ).    (3.1)

Consistent estimates of the posterior covariance matrices var(β | y, v, w), var(φ | y, v, w), var(γ | y, v, w) and var(σ_x^2 | y, v, w) can be obtained from the sample covariance matrices of the generated observations. For example,

vâr(β | y, v, w) = (L − 1)^{−1} ∑_{ℓ=1}^{L} (β^(ℓ) − β̂)(β^(ℓ) − β̂)'.

The standard errors for the components of β can be obtained from the square roots of the diagonal elements of this sample covariance matrix.
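As an illustration of Eq. (3.1) and the sample covariance estimate, the sketch below computes posterior means, the posterior covariance matrix and standard errors from a matrix of post burn-in draws; the synthetic draws here are stand-ins for real MCMC output.

```python
import numpy as np

def posterior_summaries(draws):
    """Posterior mean, covariance and standard errors from L MCMC draws.

    draws : (L, d) array of post burn-in samples of a parameter vector.
    Returns the Bayesian estimate (sample mean, Eq. (3.1)), the estimated
    posterior covariance matrix (sample covariance, divisor L - 1) and the
    component-wise standard errors (square roots of its diagonal)."""
    draws = np.asarray(draws, dtype=float)
    est = draws.mean(axis=0)
    cov = np.cov(draws, rowvar=False)       # divides by L - 1
    se = np.sqrt(np.diag(np.atleast_2d(cov)))
    return est, cov, se

# Synthetic "MCMC output": 5000 draws of a 2-dimensional beta.
rng = np.random.default_rng(0)
beta_draws = rng.multivariate_normal([1.0, -0.5], 0.04 * np.eye(2), size=5000)
est, cov, se = posterior_summaries(beta_draws)
```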

Bayesian case influence analysis
In this subsection, we consider Bayesian case-deletion influence analysis based on the case-deletion method given in Cook and Weisberg (1982). For notational simplicity, let R = {y, v, w} be the full data set and R_(i) be the data set with the ith case deleted. To assess the effect of the ith individual on the posterior distribution of the parameter vector ϑ = {τ, θ_y, θ_γ}, we use the following Kullback-Leibler (K-L) divergence as a Bayesian case influence measure:

KL(i) = ∫ p(ϑ | R) log{p(ϑ | R)/p(ϑ | R_(i))} dϑ,    (3.2)

where p(ϑ | R) and p(ϑ | R_(i)) are the posterior distributions of ϑ for the full data set R and the reduced data set R_(i), respectively. Thus, KL(i) measures the distance (discrepancy) between the two posterior distributions p(ϑ | R_(i)) and p(ϑ | R), and can be regarded as a Bayesian analogue of the likelihood displacement (Cook and Weisberg 1982).
To measure the effect of the ith individual on the posterior mean of ϑ, we use the following Cook's distance as another Bayesian case influence measure:

CD(i) = (ϑ̂ − ϑ̂_(i))' W_ϑ (ϑ̂ − ϑ̂_(i)),    (3.3)

where ϑ̂ = ∫ ϑ p(ϑ | R) dϑ and ϑ̂_(i) = ∫ ϑ p(ϑ | R_(i)) dϑ are the posterior means of ϑ for the full data set R and the deleted data set R_(i), respectively, and W_ϑ is selected to be the posterior covariance matrix of ϑ. A large value of CD(i) corresponds to an influential observation with respect to the posterior mean. Generally, we can use D̄ + d × SM as a benchmark (e.g., see Lee and Tang 2006), where D̄ and SM are the mean and standard error of {CD(i): i = 1, . . . , n}, and d is a constant selected on a problem-by-problem basis. Specifically, we set d = 3.0 in our simulation studies and d = 5.0 in the real example analysis.
To compute KL(i), we require the marginal posterior distributions p(ϑ | R) and p(ϑ | R_(i)). It is rather difficult to compute KL(i) directly because of the MEs involved, so it is desirable to develop a computationally feasible formula to reduce the computational burden. Let p_i(ϑ) denote the contribution of the ith individual to the likelihood. It is easily shown from the independence of individuals that

p(ϑ | R_(i)) = p(ϑ | R){p_i(ϑ)}^{−1} / E_{ϑ|R}[{p_i(ϑ)}^{−1}],

so that

KL(i) = log E_{ϑ|R}[{p_i(ϑ)}^{−1}] + E_{ϑ|R}[log p_i(ϑ)],

which indicates that the computation of KL(i) can be done using MCMC samples from the full-data posterior distribution p(ϑ | R) via the above developed Gibbs sampler. Specifically, if ϑ^(ℓ) is the ℓth Gibbs sample after J burn-in iterations for ℓ = 1, . . . , L, we obtain the MCMC approximation of KL(i) as

K̂L(i) = log{L^{−1} ∑_{ℓ=1}^{L} [p_i(ϑ^(ℓ))]^{−1}} + L^{−1} ∑_{ℓ=1}^{L} log p_i(ϑ^(ℓ)).    (3.4)

On the other hand, to compute CD(i), we need to evaluate ϑ̂, ϑ̂_(i) and W_ϑ. It follows from Eq. (3.2) and the definitions of ϑ̂, ϑ̂_(i) and W_ϑ that ϑ̂ = E_{ϑ|R}(ϑ) and ϑ̂_(i) = E_{ϑ|R}[ϑ{p_i(ϑ)}^{−1}] / E_{ϑ|R}[{p_i(ϑ)}^{−1}]. The MCMC approximations of ϑ̂, ϑ̂_(i) and W_ϑ are then given by their sample analogues based on {ϑ^(ℓ): ℓ = 1, . . . , L}. Regardless of KL(i) or CD(i), we need to compute p_i(ϑ^(ℓ)). From the definition of our considered model, we obtain

p_i(ϑ^(ℓ)) = E_{u_i, x_i}[p(y_i, w_i | x_i, u_i, v_i, ϑ^(ℓ))],    (3.5)

where E_{u_i, x_i} denotes the expectation taken with respect to the joint distribution of u_i and x_i (denoted by p(u_i, x_i | θ_u)). A Monte Carlo approximation of p_i(ϑ^(ℓ)) in Eq. (3.5) can be implemented by using the following steps:

• Step 1 Generating x_i^(t) from the true covariate model (2.9) given v_i and ϑ^(ℓ).
• Step 2 Generating u_i^(t) = {u_ij^(t): j = 1, . . . , m} from the TCDP mixture model evaluated at ϑ^(ℓ).
• Step 3 Repeating Steps 1 and 2 for T times.
• Step 4 Getting the nested samples {(u_i^(t), x_i^(t)): t = 1, . . . , T}. Thus, the MCMC approximation of p_i(ϑ^(ℓ)) is given by

p̂_i(ϑ^(ℓ)) = T^{−1} ∑_{t=1}^{T} p(y_i, w_i | x_i^(t), u_i^(t), v_i, ϑ^(ℓ)).    (3.6)

Combining Eqs. (3.2)-(3.6) yields the values of KL(i) and CD(i) for all individuals. Index plots of KL(i) and CD(i) can be used to identify influential cases.
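The case-deletion computations above admit a compact implementation once the per-case likelihood values p_i(ϑ^(ℓ)) are available (e.g., from the Monte Carlo approximation just described). The sketch below uses the importance-weight identity p(ϑ | R_(i)) ∝ p(ϑ | R)/p_i(ϑ); note that as a standardizing weight for Cook's distance it defaults to the inverse posterior covariance, which is an assumed illustrative choice.

```python
import numpy as np

def case_influence(draws, like_i, W=None):
    """KL(i) and Cook's distance CD(i) for case i from full-data MCMC output.

    draws  : (L, d) post burn-in samples of the parameter vector theta.
    like_i : (L,) values of p_i(theta^(l)), the likelihood contribution of
             case i evaluated at each draw.
    Both measures need only full-data samples, since the deleted-case
    posterior is the full posterior reweighted by 1/p_i(theta)."""
    draws = np.asarray(draws, dtype=float)
    f = np.asarray(like_i, dtype=float)
    inv_f = 1.0 / f
    # KL(i) = log E[1/p_i] + E[log p_i], expectations under p(theta | R)
    kl = np.log(inv_f.mean()) + np.log(f).mean()
    theta_full = draws.mean(axis=0)                       # posterior mean
    theta_del = (draws * inv_f[:, None]).sum(axis=0) / inv_f.sum()
    diff = theta_full - theta_del
    if W is None:                                         # illustrative default:
        W = np.linalg.inv(np.cov(draws, rowvar=False))    # posterior precision
    cd = float(diff @ W @ diff)
    return float(kl), cd

# Toy demonstration with synthetic draws and a toy p_i(theta).
rng = np.random.default_rng(0)
draws = rng.normal(size=(4000, 2)) * 0.1 + np.array([0.5, -0.2])
like_i = np.exp(-0.5 * ((draws[:, 0] - 0.5) / 0.3) ** 2)
kl, cd = case_influence(draws, like_i)
```

By Jensen's inequality the KL estimate is nonnegative, and a constant p_i (a case carrying no information) gives KL(i) = CD(i) = 0.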

Numerical examples

Simulation studies
To investigate the finite-sample performance of the proposed Bayesian approaches under various distributional assumptions on the MEs u_ij and various prior specifications, we conducted the following simulation studies by generating 100 replicated data sets from our defined GLMEMs with sample size n = 200 together with m = 5.
In the first simulation study, each of the 100 replicated data sets {(y_i, v_i, w_i): i = 1, . . . , n} was generated from a Poisson distribution with the conditional probability density function given by Eq. (2.1), where the v_i's were generated from a multivariate normal distribution with mean zero and covariance matrix 0.25I_3, and the components x_1i and x_2i of x_i were generated via Eq. (2.9). In this case, φ = 1 in Eq. (2.1) is a constant. The true values of β and of γ_k for k = 1 and 2 were fixed, with σ_x^2 = 1. To test the effectiveness of using the TCDP prior to approximate the distributions of the MEs u_ij = (u_ij1, u_ij2)', we considered the following eight distributional assumptions for u_ijk. The first four assumptions were used to illustrate that even when the assumed distribution is multimodal, our presented TCDP prior can still capture its characteristics.

Assumption 6
We specified the distribution of u_ijk to be u_ijk ∼ Γ(1, 1).
Assumptions 5-8 were designed to generate skewed distributions for u_ijk. Based on the generated u_ijk's, we obtained the data set {w_ijk: i = 1, . . . , n, j = 1, . . . , m, k = 1, 2} via Eq. (2.3).
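Under this design, the data-generating mechanism of the first simulation study can be sketched as follows. The true parameter values below are illustrative assumptions only, and a plain normal draw stands in for the eight ME distributions considered above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r, p = 200, 5, 2, 3
# Illustrative true values only; the paper's exact settings are not reproduced.
beta_x = np.array([0.5, -0.5])
beta_v = np.array([0.3, 0.3, -0.3])
gamma0 = np.zeros(r)
gamma_v = 0.5 * np.ones((r, p))
sigma2_x = 1.0

v = rng.multivariate_normal(np.zeros(p), 0.25 * np.eye(p), size=n)  # observed covariates
x = gamma0 + v @ gamma_v.T + rng.normal(0.0, np.sqrt(sigma2_x), size=(n, r))  # Eq. (2.9)
eta = x @ beta_x + v @ beta_v          # canonical (log) link for the Poisson outcome
y = rng.poisson(np.exp(eta))           # Eq. (2.1) with phi = 1
u = rng.normal(0.0, 0.5, size=(n, m, r))   # stand-in for draws from an ME distribution
w = x[:, None, :] + u                      # replicate surrogates, Eq. (2.3)
```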
The hyperparameter values of the prior distributions for τ and σ_x^2 were specified as follows. To ensure that our presented DP mixture approximations were not biased by the selection of hyperparameters, we set ϕ_aj = 3, ϕ_cj = 150 and ϕ_dj = 20 for j = 1, . . . , r, set c_1 = 5, and allowed c_2 to be generated randomly from a uniform distribution U(2, 6). For the hyperparameters relating to the prior distribution of τ, we set a_1 = 200 and a_2 = 10 to generate large values of τ, which lead to more unique covariate MEs. For the conjugate prior of σ_x^{−2}, we set c_3 = 10 and randomly generated c_4 from a uniform distribution U(9, 10). For the hyperparameters ξ_0 and Λ_0, we took ξ_0 = 0 and Λ_0 = I_r to satisfy the condition of the centered DP procedure. Also, to investigate the sensitivity of Bayesian estimates to prior inputs, we considered three types of priors (types A, B and C) for β and γ_k.
For each of the 100 generated data sets, the proposed MCMC algorithm with G = 50 was used to evaluate Bayesian estimates of the unknown parameters and of the covariates subject to MEs for each of the three types of priors, based on three different starting values of the unknown parameters. The estimated potential scale reduction (EPSR) values (Gelman et al. 1996) for all unknown parameters were computed. For the first five test runs, we observed that the EPSR values of all unknown parameters were less than 1.2 after 10,000 iterations. Hence, L = 5000 observations after 10,000 burn-in iterations were collected to evaluate Bayesian estimates via Eq. (3.1). Results under the eight assumptions together with the three types of prior inputs are presented in Table 1, where 'Bias' is the absolute difference between the true value and the mean of the estimates based on 100 replications, and 'RMS' is the root mean square difference between the estimates based on 100 replications and the true value. Also, for comparison, we calculated Bayesian estimates of β for each of the 100 generated data sets under the eight distributional assumptions of u_ijk on the basis of a GLM without error modelling. The corresponding results are given in Table 2.
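The EPSR check mentioned above is the Gelman-Rubin potential scale reduction; a minimal sketch for a scalar parameter, assuming equal-length parallel chains, is:

```python
import numpy as np

def epsr(chains):
    """Estimated potential scale reduction (Gelman-Rubin) for one scalar
    parameter from several parallel chains.

    chains : (C, N) array, C parallel chains of length N (post burn-in).
    Values below about 1.2 are taken in the text as evidence of convergence."""
    chains = np.asarray(chains, dtype=float)
    C, N = chains.shape
    means = chains.mean(axis=1)
    B = N * means.var(ddof=1)                 # between-chain variance
    Wv = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (N - 1) / N * Wv + B / N        # pooled variance estimate
    return np.sqrt(var_hat / Wv)

rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(3, 5000))        # well-mixed chains
stuck = mixed + np.array([[0.0], [2.0], [4.0]])     # chains stuck at offsets
```

Running `epsr` on the well-mixed chains gives a value near 1, while the offset chains give a clearly larger value, signalling non-convergence.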
Examination of Tables 1 and 2 indicated that (i) Bayesian estimates were reasonably accurate regardless of the distributional assumption on u_ij and the prior inputs for the unknown parameters, because their Bias values were less than 0.10 and their RMS values were less than 0.20; (ii) Bayesian estimates were not sensitive to the prior inputs for β and γ_k under our three considered prior inputs; (iii) Bayesian estimates obtained from the type A prior input behaved better than those obtained from the type B and type C prior inputs in terms of Bias and RMS, but the differences were minor; (iv) Bayesian estimates obtained from the type A and type B prior inputs were slightly better than those obtained from the type C prior input, but the differences were negligible; and (v) our proposed semiparametric Bayesian method produced smaller Bias and RMS values than a Bayesian approach applied to a GLM without error modelling.
To investigate the accuracy of using the TCDP prior to approximate the distribution of u_ijk, we calculated the means and standard deviations of the û_ijk's across individuals and plotted the true densities of the u_ijk's against their corresponding approximated densities for a randomly selected replication. Table 3 presents the estimated means and standard deviations of the u_ijk's for our considered eight assumptions. To save space, we only plotted the densities of u_ijk and û_ijk for Assumption 4 in Fig. 1. Examination of Table 3 and Fig. 1 implied that (i) the TCDP prior approximations to the distributions of the u_ijk's were flexible enough to recover the shapes of those distributions under our considered eight distributional assumptions; and (ii) the mean and standard deviation of the true distribution of u_ijk can be estimated well by our proposed method. To illustrate our proposed Bayesian case-deletion influence measures, we conducted a second simulation study. In this study, the data set {(y_i, v_i, w_i, x_i): i = 1, . . . , 200} was generated using the same setup as in the first simulation study, but outliers were created by changing y_i to y_i + 30 for i = 1, 100 and 150. We calculated the corresponding values of the diagnostics KL(i) and CD(i) for the generated data set including the outliers. Results are presented in Fig. 2, whose examination indicated that cases 1, 100 and 150 were detected to be influential, as expected.

An example
To illustrate our proposed methods, we considered a data set from the Framingham heart study, which has been analyzed by Carroll et al. (2006, Section 9.10) and Muff et al. (2015) via a logistic ME model with a normality assumption on the covariate ME. The data set consisted of a series of exams taken over two years. Here, we only analyzed the data from exam 3, with n = 1615 men aged between 31 and 65. We took y_i to be the indicator for coronary heart disease, which was assumed to follow a Bernoulli distribution, v_i to be the indicator for smoking, and x_i to be the transformed (unobserved) long-term blood pressure (i.e., x_i = log(SBP_i − 50)), where SBP is an abbreviation for systolic blood pressure. Since it is impossible to measure long-term SBP, measurements at single clinical visits had to be used as a proxy (Muff et al. 2015). Also, due to daily variation and deviations in the measurement instrument, the single-visit measures might differ considerably from the long-term blood pressure (Carroll et al. 2006). Hence, to estimate the magnitude of the error, SBP was measured twice at different examinations, and the two proxy measures for x_i are denoted w_i1 and w_i2. Following Muff et al. (2015), we fitted the data set via the following logistic ME model (LOGMEM):

logit{Pr(y_i = 1 | x_i, v_i)} = β_0 + β_x x_i + β_v v_i, w_ij = x_i + u_ij, x_i = γ_0 + γ_1 v_i + ε_i,

for i = 1, . . . , n and j = 1, 2, where the ε_i's are i.i.d. N(0, σ_x^2) and the u_ij's follow the TCDP mixture model.
For the hyperparameters a_1 and a_2, we took a_1 = 250 and set a_2 to a value generated randomly from a uniform distribution U(25, 30), to yield large values of τ leading to more unique covariate MEs. For the hyperparameters c_3 and c_4, we randomly generated c_3 from a uniform distribution U(1, 10) and c_4 from a uniform distribution U(1, 100), to yield relatively diffuse values of σ_x^2. For the hyperparameters ξ_0 and Λ_0, we took ξ_0 = 0 and Λ_0 = I_2 to satisfy the condition of the centered DP procedure. As in the simulation studies, we set ϕ_aj = 3, ϕ_cj = 100 and ϕ_dj = 20 for j = 1, . . . , r, set c_1 = 5, and allowed c_2 to be generated randomly from a uniform distribution U(2, 6). The preceding MCMC algorithm with G = 250 was used to obtain Bayesian estimates of the parameters and the MEs u_ij. Similarly, the EPSR values of all unknown parameters were computed using three parallel sequences of observations generated from three different starting values of the unknown parameters; these EPSR values were less than 1.2 after about 20,000 iterations. We collected 10,000 observations after 20,000 burn-in iterations to evaluate Bayesian estimates of the parameters. Results are given in Table 4. Examination of Table 4 showed that SBP has a positive effect on coronary heart disease, whilst smoking has a slightly negative effect on both coronary heart disease and SBP. Also, for comparison, we evaluated Bayesian estimates of the parameters in the corresponding logistic model without ME adjustment (i.e., treating the observed proxy measurement as the true covariate). The corresponding results are presented in Table 4, which showed that our considered logistic ME model led to a smaller estimate and a smaller SE for the parameter β_x than the logistic model without ME.
To illustrate our proposed case deletion influence measures, we computed the values of the diagnostics KL(i) and CD(i), which are presented in Fig. 3. Examination of Fig. 3 indicated that cases 10, 59, 207, 208, 222, 362, 367, 386, 391, 456, 501, 530, 533, 709, 976, 1093, 1096, 1162, 1187, 1336, 1430 and 1502 were detected to be influential by the diagnostics KL(i) and CD(i). To investigate the effect of these influential observations on Bayesian estimates of the unknown parameters, we also calculated Bayesian estimates for our considered data set with these influential cases deleted. The corresponding results are given in Table 4, whose examination indicated that these influential individuals have a relatively large influence on the Bayesian estimates of β_x and β_v.

Discussion
We discussed Bayesian estimation of unknown parameters and Bayesian case-deletion diagnostics for generalized linear measurement error models, i.e., generalized linear models with covariates subject to MEs. Under an unknown distributional assumption on the random MEs, we used the TCDP mixture model to approximate the ME distribution. We obtained Bayesian estimates of the unknown parameters and random MEs together with their standard errors, and presented two Bayesian case-deletion influence diagnostics to detect influential observations. Simulation studies and a real example were used to illustrate the proposed methodologies. The empirical results showed that (i) the TCDP mixture model approximation can capture well the characteristics of the ME distribution; and (ii) our proposed methods can effectively detect influential observations. This paper considered balanced repeated measurements of the covariate subject to ME so that the TCDP mixture model could be used to approximate the ME distribution. When unbalanced repeated measurements are considered, other methods may be needed to address the issue, which is left for future work.
Since it is rather difficult to directly sample observations from the posterior distribution of (ξ, Ψ, ϕ_b, π, α*, Σ, L, τ, u), the blocked Gibbs sampler is employed to overcome this difficulty. The conditional distributions required to implement Gibbs sampling of the nonparametric components are given as follows.
Step (h) The conditional distribution of L_ij can be shown to be p(L_ij | π, α, Σ, u_ij) ∼ Multinomial(π*_ij1, . . . , π*_ijG), where π*_ijg is proportional to π_g p(u_ij | α_g, Σ_g) and p(u_ij | α_g, Σ_g) ∼ N_r(α_g, Σ_g) for g = 1, . . . , G.
Step (i) Consider the conditional distribution of u_ij. It is easily shown that the conditional distribution p(u_ij | α, Σ, L, x, w, θ_u) is a non-standard distribution; thus, we cannot directly sample u_ij from it. Here, the Metropolis-Hastings algorithm is employed to sample u_ij from its conditional distribution via the following steps. At the tth iteration with current value u_ij^(t), a new candidate u_ij is generated from the normal distribution N_r(u_ij^(t), σ_u^2 D_{u_ij}), where D_{u_ij} is an appropriately chosen covariance matrix. Then, the new candidate u_ij is accepted with probability min{1, p(u_ij | ·)/p(u_ij^(t) | ·)}.
The variance σ 2 u can be chosen such that the average acceptance rate is approximately 0.25 or more.
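A sketch of such a tuned random-walk Metropolis-Hastings step is given below; the burn-in adaptation scheme (a window of 100 iterations with a multiplicative step-size update) is an illustrative choice rather than the paper's.

```python
import numpy as np

def rw_mh(logpost, x0, n_iter=20000, target=0.25, rng=None):
    """Random-walk Metropolis-Hastings whose proposal scale is tuned during
    burn-in so the acceptance rate approaches `target` (about 0.25, the rule
    of thumb used for sampling u_ij, beta and x_i)."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = logpost(x)
    scale, accepted, acc_win = 1.0, 0, 0
    draws = []
    for t in range(n_iter):
        prop = x + scale * rng.normal(size=x.shape)
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:    # accept w.p. min(1, ratio)
            x, lp = prop, lp_prop
            accepted += 1
            acc_win += 1
        if t < n_iter // 2 and (t + 1) % 100 == 0:  # adapt only during burn-in
            scale *= np.exp(0.5 * (acc_win / 100.0 - target))
            acc_win = 0
        draws.append(x.copy())
    return np.array(draws[n_iter // 2:]), accepted / n_iter

# Toy target: a standard bivariate normal log-density (up to a constant).
samples, acc = rw_mh(lambda x: -0.5 * float(x @ x), np.zeros(2),
                     rng=np.random.default_rng(0))
```

Freezing the scale after burn-in keeps the post burn-in chain a valid (non-adaptive) Markov chain targeting the posterior.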
Step (j) The conditional distribution of φ is in general non-standard, and the Metropolis-Hastings algorithm is used to sample φ. The variance σ_φ^2 of the proposal distribution can be chosen such that the average acceptance rate is approximately 0.25 or more. In particular, if c(y_i, φ) = c(y_i)/φ, we have p(φ^{−1} | y, x, v, β) ∼ Γ(r + p + a_3, a_4 + 0.5(β − β_0)'H_{0β}^{−1}(β − β_0) − ∑_{i=1}^{n}(y_i θ_i − b(θ_i) + c(y_i))). Also, if c(y_i, φ) = ζ c(y_i), where ζ is a constant that does not depend on φ and y_i, we obtain p(φ^{−1} | y, x, v, β) ∼ Γ(r + p + a_3, a_4 + 0.5(β − β_0)'H_{0β}^{−1}(β − β_0) − ∑_{i=1}^{n}(y_i θ_i − b(θ_i))).

Step (k) The conditional distribution p(β | y, x, v, φ) is also non-standard, and the Metropolis-Hastings algorithm for simulating observations from it is implemented as follows. Given the current value β^(t), a new candidate β is generated from N_{p+r}(β^(t), σ_β^2 Σ_β), where Σ_β is an appropriately chosen covariance matrix, and is accepted with probability min{1, p(β | y, x, v, φ)/p(β^(t) | y, x, v, φ)}.

Step (l) The conditional distribution p(γ*_k | x, v, σ_x^2) is a normal distribution, owing to the conjugate normal prior for γ*_k.

Step (m) It is easily shown that the conditional distribution p(σ_x^2 | x, v, γ) is an inverse Gamma distribution, owing to the conjugate Gamma prior for σ_x^{−2}.

Step (n) Consider the conditional distribution p(x_i | y_i, v_i, γ, w_i, α, Σ, β, σ_x^2, φ, L_i), where L_i = {L_ij: j = 1, . . . , m}. The Metropolis-Hastings algorithm for sampling observations from this conditional distribution is implemented as follows. Given the current value x_i^(t), a new candidate x_i is generated from N_r(x_i^(t), σ_a^2 H_{x_i}), where H_{x_i} is an appropriately chosen covariance matrix, and is accepted with probability min{1, p(x_i | ·)/p(x_i^(t) | ·)}. The variance σ_a^2 can be chosen such that the average acceptance rate is approximately 0.25 or more.