1 The M-based models

Martinez-Beneito highlights two main advantages of coregionalization models: computational convenience and validity by construction. In particular, he emphasizes the computational advantages of the M-model construction. Indeed, hierarchically formulated M-models and associated Bayesian computational methods are shown to be computationally efficient in handling multivariate and multiarray spatial lattice data of many variables.

The M-model proposal has two issues: identification and interpretation. It seems to me that a good understanding of the two issues is important for both methodological and practical reasons. Here, I briefly discuss these issues and the need for additional research.

Consider any p-variate M-model with p variable-specific spatial dependence parameters, denoted M-model (\(c_1,c_2,\ldots , c_p\)) hereafter. It is mentioned in Botella-Rocamora et al. (2015), MacNab (2016b), and the present paper that the spatial parameters \(c_1,c_2,\ldots , c_p\) and the M-matrix therein are not identifiable. It seems to me that the gain in computational efficiency for M-models comes at a price: the data lose control over, and identification of, the spatial dependence parameters. An important question arises: What additional benefits might the M-models bring, compared to their separable counterparts? Examples of competing separable models include the M-model (c) (with a general spatial parameter c), the separable models of the Mardia family, and the intrinsic multivariate CAR (MiCAR) models, to name a few; see Table 1 for a brief illustrative DIC comparison. Separable models also have similar or greater computational advantages. Additionally, data (partially) inform on the general spatial parameter in a separable model.

Briefly illustrated here using the two data sets presented in the present paper, the results of my recent study suggest that (i) the Markov chain Monte Carlo (MCMC) implementation for the M-model (c) may be more stable, (ii) the spatial parameter in M-model (c) is identifiable, (iii) the M-model (c) may outperform M-model (\(c_1,c_2,\ldots ,c_p\)) in terms of DIC (see Table 1), (iv) the M-model (c) may lead to less posterior shrinkage of the marginal correlation and cross-correlation functions (see Fig. 1), and (v) the two models can produce nearly identical posterior relative risk smoothing, prediction, and inference (see Fig. 2).

Table 1 DIC results for indicated models. The Minnesota cancer mortality data and the BC adverse medical events data
Fig. 1

Illustrative comparisons of the posterior estimates of correlation (corr) and cross-correlation (cross-corr) functions between the estimated M-model \((c_1,c_2,c_3)\) and M-model (c). The estimates are illustrated using the results for county 1, which has 8 first-order neighbors. Each of the plots shows a cluster of higher estimates of correlations and cross-correlations between county 1 and its 8 first-order neighbors, compared to the estimates between county 1 and its higher-order neighbors. The Minnesota cancer mortality data

Fig. 2

Illustrative comparisons of posterior estimates, median and standard deviation (sd), of relative risks between the estimated M-model (c) and M-model \((c_1,c_2,c_3)\). The Minnesota cancer mortality data

In Botella-Rocamora et al. (2015), the posterior estimates of the within-location covariance matrix \({\varvec{\varSigma }}\) are used to draw inference for (log) relative risk correlations between diseases. Due in part to the complex “entanglement” of the spatial and non-spatial parameters in the M-model and in MGMRFs in general, and perhaps due in part to the area-specific scaling factors, such interpretation of \({\varvec{\varSigma }}\) for inference on the pair-wise associations between diseases should be questioned. For example, it seems to me that there is a tendency for the posterior estimates of the correlation parameters in \({\varvec{\varSigma }}(={\varvec{M}}{\varvec{M}}^{\top })\) to overestimate the disease risk associations. For the Minnesota cancer mortality data, the Pearson correlation coefficient (PCC) for the esophageal and lung cancers is 0.28. But, the posterior median (and standard deviation) of the associated correlation parameter in \({\varvec{\varSigma }}\) is \(r_{13} = 0.66\,(0.22)\) for the estimated M-model A and \(r_{13}= 0.73\,(0.21)\) for the estimated M-model B. Table 2 presents the estimated non-spatial correlation parameters for the estimated M-model and the MiCAR, respectively. The correlation estimates for the MiCAR A are the closest to the PCCs.
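The arithmetic behind the correlation parameters \(r_{jl}\) quoted above can be sketched as follows. This is a minimal illustration, assuming \({\varvec{\varSigma }}={\varvec{M}}{\varvec{M}}^{\top }\) as written in the text; the function name `correlation_from_M` and the matrix `M_example` are hypothetical, and the numbers are not estimates from either data set. It shows only how a draw of \({\varvec{M}}\) maps to a correlation matrix and does not bear on whether that matrix is a sound measure of disease association.

```python
import numpy as np

def correlation_from_M(M):
    """Return (Sigma, R): Sigma = M M^T and its correlation matrix R.

    Accepts a single p x p matrix or a stack of draws with shape (S, p, p).
    """
    M = np.asarray(M, dtype=float)
    sigma = M @ np.swapaxes(M, -1, -2)
    sd = np.sqrt(np.diagonal(sigma, axis1=-2, axis2=-1))
    corr = sigma / (sd[..., :, None] * sd[..., None, :])
    return sigma, corr

# Hypothetical 3 x 3 M-matrix (illustrative values only).
M_example = np.array([[0.5, 0.1, 0.0],
                      [0.2, 0.4, 0.1],
                      [0.3, 0.0, 0.6]])
sigma, corr = correlation_from_M(M_example)
r13 = corr[0, 2]  # correlation parameter between variables 1 and 3
```

With MCMC output, the same function could be applied to the stack of sampled \({\varvec{M}}\) matrices, and the posterior median and standard deviation of each \(r_{jl}\) taken over the first axis.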

Fig. 3

Illustration of shrinkage priors on the spatial parameters, resulting from placing the PDC on the (descending) ordered eigenvalues of \({\varvec{C}}_s\) (the 9 plots on the left) and the PDC on the (descending) ordered singular values of \({\varvec{C}}\) (the 9 plots on the right), respectively. The PDC ranges are (-0.322, 0.178) and (0, 0.178), respectively, calculated from the eigenvalues of \({\varvec{W}}\) of connected first-order neighbors in 87 Minnesota counties of USA

These results on the M-models warrant further investigation. For example, assessing the M-models via a simulation study may offer insights into their utility as spatial smoothers in multivariate disease mapping. The use of the M-models (with variable-specific spatial parameters), rather than their separable counterparts, for modeling multi-way data, as presented in Martinez-Beneito et al. (2017), also warrants practically useful motivation, interpretation, and numeric assessment and comparison.

2 Posterior sensitivity to prior choices for \({\varvec{C}}\) and \({\varvec{\varSigma }}\)

In the context of multivariate disease mapping where data may contain limited information, Greco and Trivisano raise important practical issues concerning hyperprior choices for MCARs and their effect on model comparison and selection. They point out that (i) posterior sensitivity to prior specifications for \({\varvec{C}}\) and \({\varvec{\varSigma }}\) may have a complex impact on model selection (using DIC) and (ii) the currently used prior specifications over the reparameterizations of the matrices \({\varvec{C}}_s\), \({\varvec{C}}\), and \({\varvec{\varSigma }}\) may lead to order-sensitive posterior risk shrinkage, predictions, and inference. In disease mapping and small area estimation, notable posterior sensitivities to hyperprior specifications of (M)GMRFs are not uncommon and should be reported as important indications of statistical uncertainty.

In what follows, I further explain the issues raised by Greco and Trivisano with illustrative results of additional multivariate analysis of the Minnesota cancer data.

Table 2 Posterior median and standard deviation (sd) of correlation parameters in \({\varvec{\varSigma }}\), derived from the estimated M-models and MiCAR
Fig. 4

Posterior estimates (median and standard deviation) of relative risks for the linear coregionalization cMpCAR\(_{\tiny \text{ UC }}\)(21). Positive definite constraint is placed on the singular values of \({\varvec{C}}\) via singular value decomposition. The coregionalization coefficients matrix \( {\varvec{A}}=P({\varvec{\theta }}) {\varvec{e}}P({\varvec{\theta }})^{\top } \) is the square root of \({\varvec{\varSigma }}\). The Minnesota cancer data

I begin with a brief illustration and explanation of how and why the uniform priors placed on the eigenvalues of a symmetric matrix \({\varvec{C}}_s\), or on the singular values of an asymmetric \({\varvec{C}}\), with uniform priors on the associated Givens angles, might impose a priori restrictions on the elements of \({\varvec{C}}_s\) or \({\varvec{C}}\). Figure 3 presents the resulting element-wise prior distributions for all elements of \({\varvec{C}}_s\) and \({\varvec{C}}\), respectively. These element-wise histograms were calculated based on 10,000 samples of \({\varvec{C}}_s = P({\varvec{\theta }}) {\varvec{e}}P({\varvec{\theta }})^{\top }\) or \({\varvec{C}}= P({\varvec{\theta }}_L) {\varvec{s}}P({\varvec{\theta }}_R)^{\top }\), where the ordered eigenvalues \({\varvec{e}}\) were simulated from Unif\((-0.322, 0.178)\), the ordered singular values \({\varvec{s}}\) from Unif(0, 0.178), and the Givens angles \({\varvec{\theta }}\), \({\varvec{\theta }}_L\), and \({\varvec{\theta }}_R\) from Unif\((-\pi /2, \pi /2)\).
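A simulation of this kind can be sketched as follows for the symmetric case with p = 3. This is a minimal sketch under stated assumptions, not the code used for Fig. 3: the text does not say how the Givens rotations are composed, so the ordering \(G_{12}G_{13}G_{23}\) in the hypothetical helper `givens_product` is an assumption, as are the function names and the random seed.

```python
import numpy as np

def givens_product(theta):
    """Orthogonal 3 x 3 matrix G12(theta[0]) @ G13(theta[1]) @ G23(theta[2]).

    The composition order is an assumption made for illustration.
    """
    P = np.eye(3)
    for (i, j), t in zip([(0, 1), (0, 2), (1, 2)], theta):
        G = np.eye(3)
        c, s = np.cos(t), np.sin(t)
        G[i, i] = G[j, j] = c
        G[i, j], G[j, i] = -s, s
        P = P @ G
    return P

def sample_Cs(n, lo=-0.322, hi=0.178, seed=0):
    """Draw n matrices C_s = P(theta) diag(e) P(theta)^T with descending e."""
    rng = np.random.default_rng(seed)
    out = np.empty((n, 3, 3))
    for k in range(n):
        e = np.sort(rng.uniform(lo, hi, size=3))[::-1]       # ordered eigenvalues
        theta = rng.uniform(-np.pi / 2, np.pi / 2, size=3)   # Givens angles
        P = givens_product(theta)
        out[k] = P @ np.diag(e) @ P.T
    return out

Cs_draws = sample_Cs(10_000)
# Element-wise prior summaries; one histogram per (j, l) entry could be plotted.
elementwise_mean = Cs_draws.mean(axis=0)
counts, edges = np.histogram(Cs_draws[:, 0, 0], bins=30)
```

Each of the nine entries `Cs_draws[:, j, l]` can be passed to a histogram routine to obtain element-wise panels of the kind described for the left half of Fig. 3.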

As noted by Greco and Trivisano, element-wise prior patterns can be observed from Fig. 3. Placing the above-mentioned priors on the eigenvalue decomposition of \({\varvec{C}}_s\) leads to skewed prior distributions on the diagonal elements of \({\varvec{C}}_s\). Heavier prior restrictions toward zero are placed on the off-diagonal elements of \({\varvec{C}}_s\). The skewed prior distributions, from right-skewed to left-skewed (see the 9 plots on the left), correspond to the descending order of the eigenvalues from 0.178 down to \(-\,0.322\). Likewise, the patterns of increasing prior concentration toward zero, over the diagonal and the off-diagonal elements of \({\varvec{C}}\) (illustrated in the 9 plots on the right), are also in line with the descending order of the positive singular values, from the upper limit 0.178 down toward 0.

It should be mentioned that these patterns are not due to the use of priors on the Givens angles but are the result of eigen- or singular value decomposition with ordered eigen- or singular values for unique decomposition of \({\varvec{C}}_s\) or \({\varvec{C}}\). It is readily verified that these patterns should disappear if the priors for \({\varvec{C}}_s\) or \({\varvec{C}}\) were simulated from the same reparameterization but with un-sorted eigen- or singular values. Notice that the descending or ascending ordering of the eigen- or singular values is necessary to enable identification of \({\varvec{C}}_s\) or \({\varvec{C}}\) via its unique decomposition. Figure 3 also indicates that, compared to placing priors on the eigenvalue decomposition of \({\varvec{C}}_s\), placing priors on the singular value decomposition of \({\varvec{C}}\) may lead to greater posterior shrinkage on the diagonal elements \( \{c_{jj}, \forall j \}\) but less posterior shrinkage on the off-diagonal elements \(\{ c_{jl}, \forall ~j \ne l \}\).
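The role of the ordering can be checked numerically with a short simulation for the asymmetric case with p = 3. This is a minimal sketch under assumptions: the Givens composition order in the hypothetical helper `givens_product`, the function name `diag_means`, the sample size, and the seed are all illustrative choices, and only the prior means of the diagonal elements \(c_{jj}\) are compared, which is a partial check rather than a comparison of full element-wise distributions.

```python
import numpy as np

def givens_product(theta):
    """Orthogonal 3 x 3 matrix G12 @ G13 @ G23 (composition order assumed)."""
    P = np.eye(3)
    for (i, j), t in zip([(0, 1), (0, 2), (1, 2)], theta):
        G = np.eye(3)
        c, s = np.cos(t), np.sin(t)
        G[i, i] = G[j, j] = c
        G[i, j], G[j, i] = -s, s
        P = P @ G
    return P

def diag_means(n, sort, s_max=0.178, seed=1):
    """Prior mean of diag(C) for C = P(theta_L) diag(s) P(theta_R)^T."""
    rng = np.random.default_rng(seed)
    total = np.zeros(3)
    for _ in range(n):
        s = rng.uniform(0.0, s_max, size=3)
        if sort:
            s = np.sort(s)[::-1]  # descending singular values
        PL = givens_product(rng.uniform(-np.pi / 2, np.pi / 2, size=3))
        PR = givens_product(rng.uniform(-np.pi / 2, np.pi / 2, size=3))
        total += np.diag(PL @ np.diag(s) @ PR.T)
    return total / n

sorted_means = diag_means(20_000, sort=True)     # differ along the diagonal
unsorted_means = diag_means(20_000, sort=False)  # roughly equal
print(sorted_means, unsorted_means)
```

Under these assumptions, the sorted case gives prior means that decrease from \(c_{11}\) to \(c_{33}\), while the un-sorted case gives roughly equal means, in line with the statement that the patterns stem from the ordering and not from the priors on the Givens angles.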

Table 3 DIC results for indicated models

In the Minnesota cancer mapping application, and for the cMpCARs of the Type II decompositions, notable order sensitivities were observed from the resulting deviance information measures (see Table 3) and from the posterior estimates of spatial and non-spatial parameters (see Table 4). The posterior estimates of relative risks were relatively unchanged for esophageal and lung cancers, with modest order sensitivity for laryngeal cancer (see Fig. 4). Similar results were also observed from the cMpCARs of the Type I decomposition (MacNab 2018).

Table 4 Posterior estimates for the linear coregionalization cMpCAR\(_{\tiny \text{ UC }}({\varvec{C}}, {\varvec{A}})\)(21), with constraint \(s_j \in (0, c_{\tiny \text{ max }}), \forall j\), on the singular value decomposition of an asymmetric matrix \({\varvec{C}}\), \({\varvec{A}}= P({\varvec{\theta }}) {\varvec{e}}^sP({\varvec{\theta }})^{\top }\), and for the three ways of ordering the variables indicated in the table

Placing hierarchical priors on the elements of \({\varvec{C}}\), as presented in MacNab (2016b, 2018), may be one approach to order-invariant estimation of \({\varvec{C}}\) and MGMRFs. My recent case studies seem to indicate that placing hierarchical priors on the elements of \({\varvec{C}}\) may impose less posterior shrinkage to \({\varvec{C}}\), perhaps less shrinkage to the diagonal elements of \({\varvec{C}}\) (MacNab 2016b, 2018). The plots (a)–(d) in Fig. 5 illustrate that the estimated correlation and cross-correlation functions of the cMpCAR\(_{\tiny \text{ UC }}({\varvec{C}}, {\varvec{A}})\) with positive definiteness constraint (PDC) and associated priors on the singular value decomposition (SVD) of \({\varvec{C}}\) are similar to, but overall lower than, those of the cMpCAR\(_{\tiny \text{ UC }}({\varvec{c}}, {\varvec{A}})\), the MGMRF with a diagonal matrix \({\varvec{C}}=\text{ diag }({\varvec{c}})\). Note that the PDC and associated priors on the SVD of \({\varvec{C}}\) may impose considerable shrinkage to both the diagonal and off-diagonal elements of \({\varvec{C}}\), which might be a reason that the estimated correlation and cross-correlation functions of the cMpCAR\(_{\tiny \text{ UC }}({\varvec{C}}, {\varvec{A}})\), in Fig. 5, plots (a)–(d), are overall lower than those of the cMpCAR\(_{\tiny \text{ UC }}({\varvec{c}}, {\varvec{A}})\). In contrast, the plots (e)–(h) in Fig. 5 seem to suggest that element-wise HPs on \({\varvec{C}}\) may lead to notably less posterior shrinkage to the correlation and cross-correlation functions.

Greco and Trivisano also comment on and illustrate the impact of sensitivity to prior specification for \({\varvec{\varSigma }}\) on model comparison and selection. I agree with them that in the present paper the observed differences between the models may be influenced by the different prior specifications for \({\varvec{\varSigma }}\) or \({\varvec{\varGamma }}\). As briefly illustrated in Table 1, when data contain limited information, posterior sensitivity can be observed from the same (and relatively simple) model but different (Wishart) prior specifications for \({\varvec{\varGamma }}\).

Fig. 5

Illustrative comparisons of posterior estimates of correlations (corr) and cross-correlations (cross-corr) between the estimated cMpCAR\(_{\tiny \text{ UC }}({\varvec{C}}, {\varvec{A}})\) and cMpCAR\(_{\tiny \text{ UC }}(\text{ diag }(c_1,c_2,c_3), {\varvec{A}})\), where \({\varvec{A}}\) is the square root of \({\varvec{\varSigma }}\). The estimates are presented for county 1. The matrix \({\varvec{C}}\) in cMpCAR\(_{\tiny \text{ UC }}({\varvec{C}}, {\varvec{A}})\) is allowed to be asymmetric. The estimated correlations and cross-correlations are based on (i) placing positive definiteness constraint (PDC) priors on the singular value decomposition (SVD) of \({\varvec{C}}\), the 4 plots on the left, or (ii) placing hierarchical priors (HP) on the elements of \({\varvec{C}}\), the 4 plots on the right. The Minnesota cancer mortality data

3 Stationary and non-stationary (M)GMRFs

Sain and Furrer comment that Markov random fields do not, in general, lead to stationary models. This would be true for the coregionalization models. In general, the so-called edge effects lead to latent (M)GMRFs with marginal correlations that differ by location. While not discussed in the present paper, formulations of stationary (M)GMRFs for rectangular lattice-neighborhood schemes (with appropriate boundary conditions/adjustments) are discussed in Besag (1972, 1974) and Mardia (1988). Similar approaches can be taken to formulate stationary latent fields that lead to stationary coregionalization models. As mentioned by Sain and Furrer, stationary (M)GMRFs may be motivated and formulated by problem-driven considerations of neighborhood structures. Compared to non-stationary (M)GMRFs, these models typically involve a smaller number of unknown parameters and often have computational advantages, say, in terms of scalability and efficiency.
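The edge effect can be made concrete with a small numerical example. The sketch below assumes a univariate proper CAR precision \({\varvec{Q}}={\varvec{D}}-c{\varvec{W}}\) on a 3 by 3 rectangular lattice with first-order neighbors and no boundary adjustment; the value c = 0.9, the lattice size, and the helper name `grid_adjacency` are illustrative choices, not taken from the paper.

```python
import numpy as np

def grid_adjacency(nrow, ncol):
    """First-order (rook) adjacency matrix of an nrow x ncol lattice."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for col in range(ncol):
            i = r * ncol + col
            if col + 1 < ncol:
                W[i, i + 1] = W[i + 1, i] = 1.0        # right-hand neighbor
            if r + 1 < nrow:
                W[i, i + ncol] = W[i + ncol, i] = 1.0  # neighbor below
    return W

W = grid_adjacency(3, 3)
D = np.diag(W.sum(axis=1))   # number of neighbors per site
c = 0.9                      # illustrative spatial parameter
cov = np.linalg.inv(D - c * W)

marg_var = np.diag(cov)
sd = np.sqrt(marg_var)
corr = cov / np.outer(sd, sd)

corner, edge, center = 0, 1, 4
print(marg_var[[corner, edge, center]])       # variances differ by position
print(corr[corner, edge], corr[edge, center])  # neighbor correlations by position
```

The marginal variances depend on whether a site is a corner, an edge, or an interior site, which is the location dependence that boundary conditions or adjustments are meant to remove.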

In the present paper, some non-stationary (M)GMRFs with locally varying (adaptive) spatial and/or scale parameters are briefly outlined. These models are indeed complex and contain many parameters. As briefly mentioned in the paper, locally adaptive (M)GMRFs may be considered for their flexibility of modeling complex multivariate interaction and dependence structures, perhaps facilitated by additional data for covariates and explanatory variables.

I agree with Sain and Furrer that “it would be interesting to see if the different types of coregionalization models are stationary and how they compare with each other in this respect.” While a stationary coregionalization model may be built by formulating stationary latent fields, an interesting question would be whether or how a stationary coregionalization model may be built from non-stationary latent fields. In addition, it would be interesting to know whether the “entanglement” of the spatial and non-spatial parameters in the coregionalization models, say the models of the Type II decomposition with full matrices \({\varvec{C}}\) and \({\varvec{A}}\) or their SVC counterparts, may give the MGMRFs the flexibility to model or approximate stationary or nearly stationary Gaussian fields.

The computational advantages of (stationary) GMRFs also motivated recent considerations of fitting (stationary) GMRFs to (stationary) Gaussian fields formulated through specifications of the covariance functions (Rue and Tjelmeland 2002; Cressie and Verzelen 2008; Lindgren et al. 2011). In this context, both the local and global properties of the GMRFs are important (Rue and Tjelmeland 2002). As noted in Rue and Tjelmeland (2002), one important question is whether a GMRF with a small neighborhood can approximate a Gaussian field with long correlation length. Figure 5 seems to indicate that the linear coregionalization MGMRF with element-wise HPs for an asymmetric matrix \({\varvec{C}}\) of spatial parameters, which control for conditional spatial dependencies and cross-dependencies in the latent MGMRF, may have the flexibility to approximate smooth multivariate Gaussian fields. Follow-up, more rigorous research into this perceived flexibility is necessary.

Rue and Tjelmeland (2002) indicate that local Markov random fields are able to fit global properties to some extent. Sain and Furrer mention the need for higher-order neighborhood structures for smoother fields. I agree with Sain and Furrer that extensions of the MGMRFs to higher-order neighborhood structures and associated Markovian dependence and independence may be conceptually straightforward but analytically and computationally complex. Nevertheless, formulation and implementation of coregionalization MGMRFs of higher-order neighborhood structures are more manageable for p-variate GMRFs with p variable-specific spatial parameters or for separable MGMRFs with a general spatial parameter.

4 Various approaches to model formulation and related applications

Sain and Furrer comment on the fact that, while the coregionalization framework unifies several lines of MGMRF development, “there is still no one unified model formulation that allows movement between the different approaches through some set of parameters.” They rightly correct me and show that the Sain et al. (2011) framework contains separable models. Indeed, if we free ourselves to allow the off-diagonal block matrix elements \({\varvec{\beta }}_{ik}\) (when \(i \sim k\)) in the Sain et al. MGMRF framework to be parameterized with both the spatial and non-spatial dependence parameters, the Sain et al. family of MGMRFs actually contains the MGMRFs of both Type I and II decompositions. To put it differently, the following joint precision matrix

$$\begin{aligned} {\varvec{\varOmega }}_{\text{vec}({\varvec{\zeta }}^{\top })}^{\tiny \text{MGMRF}} = {\varvec{D}}_m \otimes {\varvec{\varGamma }}- ({\varvec{W}}_U \otimes {\varvec{B}}+ {\varvec{W}}_U^{\top } \otimes {\varvec{B}}^{\top }) \end{aligned} \qquad (1)$$

(Equation (14) in the paper) represents a general formulation of the MGMRFs contained within the Sain et al. (2011), the Mardia (1988), and the linear coregionalization (MacNab 2016a, b) frameworks. Through various parameterizations of \({\varvec{B}}\) (e.g., \({\varvec{B}}={\varvec{B}}({\varvec{C}}, {\varvec{\tau }})\) or \({\varvec{B}}={\varvec{B}}({\varvec{C}}, {\varvec{\varGamma }})\) or \({\varvec{B}}={\varvec{B}}({\varvec{C}}, {\varvec{\varSigma }}^{1/2})\) or \({\varvec{B}}={\varvec{B}}({\varvec{C}}, {\varvec{A}})\)), specific MGMRFs of the Type I or II decomposition could be derived to have a precision matrix (1).
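The structure of the precision matrix (1) can be assembled numerically with Kronecker products. The following is a minimal sketch, assuming \({\varvec{D}}_m\) is the diagonal matrix of neighbor counts and \({\varvec{W}}_U\) is the strictly upper triangular part of a symmetric adjacency matrix \({\varvec{W}}\); the function name `mgmrf_precision`, the four-area path graph, and all numerical values are hypothetical. The choice \({\varvec{B}}=c\,{\varvec{\varGamma }}\) is used because it reduces (1) to the separable form \(({\varvec{D}}_m-c{\varvec{W}})\otimes {\varvec{\varGamma }}\); the choice \({\varvec{B}}={\varvec{C}}{\varvec{\varGamma }}\) is only a stand-in for a non-separable \({\varvec{B}}({\varvec{C}}, {\varvec{\varGamma }})\) and is not claimed to be the parameterization used in the paper.

```python
import numpy as np

def mgmrf_precision(W, Gamma, B):
    """Joint precision D_m (x) Gamma - (W_U (x) B + W_U^T (x) B^T)."""
    W_U = np.triu(W, k=1)          # strictly upper triangular part of W
    D_m = np.diag(W.sum(axis=1))   # number of neighbors per area
    return np.kron(D_m, Gamma) - (np.kron(W_U, B) + np.kron(W_U.T, B.T))

# Hypothetical map: 4 areas on a path 1-2-3-4, with p = 2 variables.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
Gamma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Separable case: B = c * Gamma gives (D_m - c W) (x) Gamma.
c = 0.6
Omega_sep = mgmrf_precision(W, Gamma, c * Gamma)
D_m = np.diag(W.sum(axis=1))
print(np.allclose(Omega_sep, np.kron(D_m - c * W, Gamma)))

# Non-separable illustration with an asymmetric C (stand-in for B(C, Gamma)).
C = np.array([[0.5, 0.2],
              [-0.1, 0.4]])
Omega_gen = mgmrf_precision(W, Gamma, C @ Gamma)
print(np.allclose(Omega_gen, Omega_gen.T))  # symmetric by construction
```

Symmetry of the result holds for any \({\varvec{B}}\), but positive definiteness does not, which is why constraints on \({\varvec{C}}\) are needed in the non-separable case.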

Martinez-Beneito comments on the need to better understand whether models produced from one approach can be reproduced from another approach. He also calls for better understanding of the different features of the models produced by the different approaches. The Sain and Furrer commentary and the above discussion offer some relevant new insights. For example, if we define a MGMRF by its joint precision matrix (1), the MGMRFs produced by the Mardia (1988) approach can be reproduced by the Sain et al. (2011) approach, and vice versa.

The models with a precision matrix (1) but with different parameterizations of \({\varvec{B}}\) are different MGMRFs with different partial correlation and cross-correlation matrix functions. They can also represent different conditionally formulated MGMRFs, one based on univariate conditionals and the other on multivariate conditionals. For MGMRF estimation and inference, the different lines of model development and different model constructions also have had considerable influence on our choice for positive definiteness constraint and for hyperprior specification. As pointed out by Greco and Trivisano and discussed earlier, the observed differences between the various models, say, those presented in the present paper, may be due in part to the different prior specifications for the model parameters. I agree with Martinez-Beneito on the appeal of casting the coregionalization MGMRFs within a matrix algebraic framework. For example, the spatially varying coregionalization MGMRFs presented in the paper can be seen as being built within a matrix algebraic framework. Indeed, the advantages of the Martinez-Beneito (2013) framework are well illustrated in Martinez-Beneito et al. (2017), where the use of matrix theory and algebra for the formulations of complex M-models, and the associated statistical computations, is presented.

In general, the challenges in constructing, constraining, and estimating a MGMRF differ considerably depending on whether we pursue a separable or non-separable model. If a non-separable model is considered, then a model with a diagonal matrix of spatial parameters is, in general, more readily constrained and estimated, compared to its counterparts with a full matrix of spatial parameters. My own experiences, and the results presented in recent literature, also correspond with Martinez-Beneito’s comment that, at least in the context of multivariate disease mapping, MGMRFs with a full matrix of spatial parameters may not be necessary or may be over-parameterized, particularly for data of rare events.

MGMRFs with a full matrix of spatial dependence parameters may be useful when the goal is estimation and inference on multivariate spatial dependencies. For example, in the Sain et al. (2011) study, the motivating example for their bivariate MGMRF proposal was to model and draw inference on asymmetric local dependencies between two climate variables: temperature and precipitation. The pair-wise conditional asymmetric spatial dependencies are quantified in relation to the variables and to the site labeling. In some applications, this may be an appealing feature of the MGMRFs. For example, complex and diverse interaction structures may be modeled by varying the neighborhood structures, the labeling of the neighbor sets, and the parameters in the MGMRFs. In the contexts of image analysis and restoration, computer vision, social network analysis, and spatial data fusion, these MGMRFs may be potentially useful for modeling and learning complex and varied local patterns and features of dependencies and interactions.

Indeed, there is a lot to learn about the various MGMRF constructions. A good understanding of the various approaches to formulating MGMRFs should enable us to develop subject-matter-specific models that provide principled ways to express dependency and interaction structures. I agree with Sain and Furrer that developing objective procedures and practical guidance for choosing between competing models is an area of ongoing and necessary research and progress. Potential utilities of the various MGMRF constructions may be better explored as we succeed in tackling the computational challenges in statistical estimation and inference. Overcoming these challenges may also open new frontiers for MGMRF development and application.

5 Statistical computation

As mentioned in the paper, the currently available computational methods and tools for Bayesian hierarchical MGMRF models primarily use Gibbs or Metropolis-within-Gibbs sampling algorithms that capitalize on the conditional probability formulations of (latent) Markov random fields (Besag et al. 1991; Besag and Green 1993). The full conditionals facilitate relatively simple programming for location-wise or variable- and location-wise posterior sampling, often requiring little or no matrix algebra. The main disadvantage of these component-wise Gibbs sampling methods is that the MCMC simulations can be impractically slow and the computational costs may be prohibitive for datasets with a large number of sites (i.e., areal units) and/or a large number of variables. Nevertheless, these computational tools are useful for modestly sized datasets and have enabled us to gain deeper knowledge about the conditionally formulated models discussed in the present paper.
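A component-wise Gibbs sampler of this kind can be sketched for the simplest case. The example below assumes a univariate proper CAR field with precision \(\tau ({\varvec{D}}-c{\varvec{W}})\), for which the full conditional of site i is Gaussian with mean \(c\sum _{j\sim i}x_j/d_i\) and variance \(1/(\tau d_i)\); it samples the latent field only, with no data layer or hyperparameter updates. The function name `gibbs_car`, the four-site cycle graph, and the parameter values are hypothetical.

```python
import numpy as np

def gibbs_car(W, c, tau, n_sweeps, seed=0):
    """Site-by-site Gibbs sampling of x ~ N(0, [tau (D - c W)]^{-1})."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    d = W.sum(axis=1)                                  # neighbor counts
    neighbors = [np.flatnonzero(W[i]) for i in range(n)]
    x = np.zeros(n)
    draws = np.empty((n_sweeps, n))
    for t in range(n_sweeps):
        for i in range(n):
            # Full conditional uses only the current values of the neighbors.
            mean_i = c * x[neighbors[i]].sum() / d[i]
            sd_i = 1.0 / np.sqrt(tau * d[i])
            x[i] = rng.normal(mean_i, sd_i)
        draws[t] = x
    return draws

# Hypothetical map: 4 sites on a cycle, each with 2 neighbors.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
c, tau = 0.5, 1.0
draws = gibbs_car(W, c, tau, n_sweeps=20_000)

exact_var = np.diag(np.linalg.inv(tau * (np.diag(W.sum(axis=1)) - c * W)))
print(draws[1000:].var(axis=0), exact_var)  # empirical vs exact variances
```

No matrix factorization is needed inside the loop, which is the programming convenience noted above; the cost is one pass over all sites per sweep, so run time grows with both the number of sites and the number of sweeps required for adequate mixing.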

While it contains limited mathematical tools, the WinBUGS (or OpenBUGS) freeware offers a user-friendly and accessible interface for Bayesian analysis of the majority of the MGMRFs available to date. As illustrated in the recent literature and in the present paper, WinBUGS may still be quite useful to statisticians and practitioners who wish to use, learn, and test these MGMRFs in real-life applications, at least in the near future.

I agree with Greco–Trivisano and Sain–Furrer that writing computer code and packages outside WinBUGS, say, for a “ready-to-use Bayesian software environment,” would be a worthwhile effort and can be essential for computational flexibility, efficiency, and scalability. In the pursuit of this effort, alternative computational methods and tools may be developed by tapping into sparse matrix methods that are available in high-level programming languages, such as R (https://www.r-project.org), Python (https://www.python.org), and MATLAB (https://www.mathworks.com/products/matlab.html). For example, an R-package may be developed for existing computational methods and traditional component-wise or block Gibbs samplers (Rue 2001; Rue and Held 2005). New Gibbs updating strategies for computationally efficient posterior sampling on large lattices (Brown et al. 2017; Marcotte and Allard 2018) may also be explored by programming in R, Python, or MATLAB.

There are also several less-explored computational options that can take advantage of the sparse MGMRF precision matrix. For example, instead of using the Gibbs sampler for fully Bayesian hierarchical inference involving MGMRFs, we may explore the possibility of developing an R-package for the so-called hybrid Monte Carlo algorithm, also known as the HMC or the Hamiltonian MC algorithm (Neal 1996; MacNab 2003a, b; MacNab et al. 2004; Gustafson et al. 2004; Girolami and Calderhead 2011). If successful, the R-package may provide a tool for MCMC sampling of complex multivariate posteriors, say, for the generalized linear mixed models (GLMMs) with the SVC priors discussed in the paper. In the context of Bayesian disease mapping and ecological regression, my earlier works in this direction (MacNab 2003a, b; MacNab et al. 2004; Gustafson et al. 2004) explored GMRF estimation for modestly large datasets. Compared to the component-wise Gibbs sampler, an adequately tuned HMC algorithm may facilitate computationally more efficient joint posterior sampling of correlated (latent) components, such as correlated random effects in GLMMs. For MGMRFs, a computational challenge is again the tuning of user-specified parameters that (i) control the step size for proposal distribution and (ii) determine a desired number of Monte Carlo runs. Recent works considered optimal tuning (Beskos et al. 2013) or automatic tuning of the HMC parameters (Hoffman and Gelman 2014). These lines of research are important and should make the HMC algorithm more accessible.
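The role of the tuning parameters can be seen in a bare-bones HMC sampler. The sketch below is a generic textbook leapfrog implementation with an identity mass matrix, not the algorithm of any of the cited works, and it targets a zero-mean Gaussian with a proper CAR precision \({\varvec{Q}}={\varvec{D}}-c{\varvec{W}}\) instead of a full hierarchical posterior. The function name `hmc_gaussian`, the four-site cycle graph, and the values `step_size=0.25` and `n_leapfrog=6` are illustrative assumptions; in standard HMC the two user-specified quantities are the leapfrog step size and the number of leapfrog steps per proposal.

```python
import numpy as np

def hmc_gaussian(Q, n_iter, step_size, n_leapfrog, seed=0):
    """HMC for x ~ N(0, Q^{-1}); all components are updated jointly."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    x = np.zeros(n)
    draws = np.empty((n_iter, n))
    accepted = 0

    def potential(z):        # negative log density up to a constant
        return 0.5 * z @ Q @ z

    def grad_potential(z):   # only a matrix-vector product with Q is needed
        return Q @ z

    for t in range(n_iter):
        p = rng.standard_normal(n)                 # fresh momentum
        x_new, p_new = x.copy(), p.copy()
        p_new -= 0.5 * step_size * grad_potential(x_new)   # initial half step
        for step in range(n_leapfrog):
            x_new += step_size * p_new
            if step < n_leapfrog - 1:
                p_new -= step_size * grad_potential(x_new)
        p_new -= 0.5 * step_size * grad_potential(x_new)   # final half step
        h_old = potential(x) + 0.5 * p @ p
        h_new = potential(x_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:  # Metropolis correction
            x = x_new
            accepted += 1
        draws[t] = x
    return draws, accepted / n_iter

# Hypothetical target: proper CAR on a 4-site cycle with c = 0.5.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
Q = np.diag(W.sum(axis=1)) - 0.5 * W
draws, accept_rate = hmc_gaussian(Q, n_iter=5000, step_size=0.25, n_leapfrog=6)
print(accept_rate, draws[500:].var(axis=0), np.diag(np.linalg.inv(Q)))
```

Because the gradient involves only products with \({\varvec{Q}}\), a sparse-matrix representation of the precision could be substituted on large lattices. A step size that is too large lowers the acceptance rate, while too few leapfrog steps makes successive draws highly correlated, which is the tuning difficulty described above.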

We may also tap into the existing tools for Bayesian or approximate Bayesian estimation and inference. For example, the Hamiltonian Monte Carlo sampling tools offered by Stan interfaces (Stan Development Team 2016), such as rstan for R, PyStan for Python, and MatlabStan for MATLAB, may be explored and utilized. The R-package for stochastic gradient MCMC, sgmcmc, may also be considered or expanded as a computational option for large datasets. Another option is to access and improve the well-known Integrated Nested Laplace Approximations (INLA) tool in R, the R-INLA, for approximate Bayesian inference, perhaps for MGMRFs of small or modest p and a modest number of hyperparameters; see Rue et al. (2017) for a recent review on approximate Bayesian computing with INLA.

Sain and Furrer comment on likelihood estimation as a means to explore and address issues concerning (i) choice of parameterization and (ii) potential impact of transformations or constraints for parameters on estimation. These and similar issues may also be explored and addressed within a Bayesian hierarchical inferential framework using efficient Bayesian tools. Nevertheless, likelihood-based estimation methods, such as the pseudolikelihood approach (Besag 1974, 1975), (penalized) maximum likelihood methods (Dempster 1977; Fessler and Hero 1995; Descombes et al. 1999; Zammit-Mangion and Rougier 2018), penalized quasi-likelihood methods (Breslow and Clayton 1993; Guha et al. 2009; Huque et al. 2018), or suitable variations, may indeed be useful options. Likelihood approaches to hierarchical MGMRF estimation typically involve (i) manipulations of sparse MGMRF precision matrices, (ii) iterative procedures, and (iii) careful and adequate quantification of estimation uncertainty (Ainsworth and Dean 2006; MacNab et al. 2004; MacNab and Lin 2009; Guha et al. 2009).

Variational inference (de Freitas et al. 2001; Kucukelbir et al. 2015, 2017; Blei et al. 2017 (a recent review); Zhang et al. 2018), composite likelihood methods (see Varin et al. 2011; Larribe and Fearnhead 2011 for recent reviews), and parallel computing (Gonzalez et al. 2011; Brown et al. 2017; Castruccio and Genton 2018) are also potential options to be explored and utilized for analyzing data on large lattices.