High-dimensional order-free multivariate spatial disease mapping

Despite the amount of research on disease mapping in recent years, the use of multivariate models for areal spatial data remains limited due to difficulties in implementation and computational burden. These problems are exacerbated when the number of areas is very large. In this paper, we introduce an order-free multivariate scalable Bayesian modelling approach to smooth mortality (or incidence) risks of several diseases simultaneously. The proposal partitions the spatial domain into smaller subregions, fits multivariate models in each subdivision and obtains the posterior distribution of the relative risks across the entire spatial domain. The approach also provides posterior correlations among the spatial patterns of the diseases in each partition that are combined through a consensus Monte Carlo algorithm to obtain correlations for the whole study region. We implement the proposal using integrated nested Laplace approximations (INLA) in the R package bigDM and use it to jointly analyse colorectal, lung, and stomach cancer mortality data in Spanish municipalities. The new proposal allows for the analysis of large datasets and yields superior results compared to fitting a single multivariate model. Additionally, it facilitates statistical inference through local homogeneous models, which may be more appropriate than a global homogeneous model when dealing with a large number of areas.


Introduction
Research on methodology for the spatial (and spatio-temporal) analysis of areal count data has grown tremendously in the last years, and statistical models have proven an essential tool for studying the geographic distribution of data in small areas.The main objective of these techniques is to smooth standardized mortality (incidence) ratios or crude rates to discover geographic patterns of the phenomenon under study.These models and methods have been mainly applied in epidemiology to analyse the incidence and mortality of chronic diseases such as cancer, but some recent research has demonstrated their applicability to the spatial and spatio-temporal analysis of crimes (see for example Li et al., 2014), and in particular crimes against women (see for example Vicente et al., 2018Vicente et al., , 2020a)).Although research on single disease analysis has been very fruitful and abundant since the seminal work of Besag et al. (1991), joint modelling of several responses offers some advantages.The first is that it improves smoothing by borrowing strength between diseases.The second, and probably more important, is that it allows to establish relationships between different diseases in terms of similar or completely different geographical distributions, i.e. in terms of correlations between spatial patterns.This is crucial, as these correlations may indicate associations with common underlying risk factors and certain (usually unknown) connections between the different diseases.The joint analysis is carried out through multivariate spatial models that can cope with both spatial correlation within diseases and correlation between diseases.Not only can multivariate models account for correlation between diseases, but also improve estimates by borrowing information from nearby areas.There is a considerable amount of research on Bayesian multivariate spatial models for count data, most of the proposals relying on Markov chain Monte Carlo (MCMC) algorithms for estimation and inference.However, their use in practice is still limited due to a lack of "easy to use" implementations of the models in statistical packages and the computational burden of most of the proposals that preclude practitioners from exploiting their undoubted advantages over univariate counterparts.A comprehensive review of the subject can be found in the work of MacNab (2018) which discusses the three main lines in the construction of multivariate proposals based on Gaussian Markov random fields.Namely, a multivariate conditionals-based approach (Mardia, 1988), a univariate conditionals-based approach (Sain et al., 2011), and a coregionalization framework (Jin et al., 2007).Regarding the latter, Martinez-Beneito (2013) derives a general theoretical setting for multivariate areal models that covers many of the existing proposals in the literature.However, this procedure is unaffordable for a moderate to large number of diseases due to the high computational cost of the MCMC algorithms.Botella-Rocamora et al. (2015) reformulate the Martínez-Beneito framework and present the so called M-models as a simpler and more computationally efficient alternative.This approach makes it possible to increase the number of diseases in the model at the expense of the identifiability of certain parameters.Recently, Vicente et al. (2020b) consider the M-models-based approach to analyse in space and time different crimes against women in India.These authors estimate the M-models using integrated nested Laplace approximations (INLA) and numerical integration for Bayesian inference (see Rue et al., 2009) and implement the procedure using the 'rgeneric' construction in R-INLA (Lindgren and Rue, 2015).The result is a "ready-to-use" function for a wide audience with limited programming skills.Several alternatives to Gaussian Markov random fields have been also proposed in the disease mapping literature.A very attractive modelling approach is the use of splines to smooth risks (Goicoa et al., 2012).Research on multivariate spline models for fitting spatio-temporal count data is not so abundant and focuses on multivariate structures to deal with the spatial and temporal dependence for one response measured in several time periods (see for example Mac-Nab, 2016;Ugarte et al., 2010Ugarte et al., , 2017)).Very recently, Vicente et al. (2021) propose multivariate P-spline models to study the spatio-temporal evolution of four crimes against women.Unfortunately, inference for these multivariate proposals (and also for univariate approaches) become unfeasible when the number of areas is very large, and the scalability of the procedures is an issue.New directions in disease mapping points towards developing new methods for Bayesian inference when the number of small areas is very large (MacNab, 2022).Creating computationally efficient methods for large data sets is one of the greatest challenges in the field of univariate and multivariate spatial statistics.Several methods for massive geostatistical data (point-referenced) have been already proposed (see for example Cressie and Johannesson, 2008;Lindgren et al., 2011;Nychka et al., 2015;Katzfuss, 2017;Katzfuss and Guinness, 2021, among others).However, in the case of areal (lattice) count data, research on the scalability of statistical models is not so abundant.Recently, Orozco-Acosta et al. (2021, 2022) propose a scalable Bayesian modelling approach for univariate high-dimensional spatial and spatio-temporal disease mapping data.They propose to divide the spatial domain into D subregions where independent models can be fitted simultaneously.To avoid the border effect in the risk estimates, k -order neighbours are added to each subregion so that some areal units will have several risk estimates.Finally, a unique posterior distribution for these risks is obtained by either computing the mixture distribution of the estimated posterior probability density functions or by selecting the posterior marginal risk estimate corresponding to the original domain to which the area belongs.This proposal reduces computational time and, in contrast to fitting a single model to the whole domain, it allows different degree of spatial smoothness over the areas within the different subdomains.
The main objective of this paper is to present a new approach to fit order-free multivariate spatial disease mapping models in domains with a very large number of small areas avoiding high RAM/CPU usage, and making it accessible to users with limited computing facilities.In particular, we combine the Orozco-Acosta et al. (2021, 2022) "divide-and-conquer" approach with a modification of the Botella-Rocamora et al. (2015) M-models to avoid overparametrization.An additional interesting novelty of our proposal is that we are able to retrieve both the posterior distributions of the correlations between the spatial patterns of each disease in the whole spatial domain, as well as in each of the subdivisions.We have implemented the methodology in INLA to reduce computational burden through our R package bigDM (Adin et al., 2022), that also implements recent high-dimensional univariate proposals.The rest of the article has the following structure.Section 2 reviews the M-models to fit multivariate data.In Section 3 we present the new methodology to make the multivariate models scalable.In Section 4, we conduct a simulation study to compare the performance of this new modelling approach with a single multivariate spatial M-model fitted to the whole domain.Finally, in Section 5, we use the new proposal to jointly analyse lung, colorectal and stomach cancer male mortality in Spanish municipalities.The paper closes with a discussion.

M-models for multivariate disease mapping
Let us assume that the area of interest is divided into I contiguous small areas and data are available for J diseases.Let O ij and E ij denote the number of observed and expected cases respectively in the i-th small area (i = 1, . . ., I) and for the j-th disease (j = 1, . . ., J). Conditional on the relative risks R ij , the number of observed cases in the i-th area and the j-th disease is assumed to follow a Poisson distribution with mean Here E ij is computed using indirect standardization as where k is the age-group, n ijk is the population at risk in area i and age-group k for the j-th disease, and m jk is the overall mortality (or incidence) rate of the j-th disease in the total area of study for the k-th age group.The log-risk is modelled as where α j is a disease-specific intercept and θ ij is the spatial effect of the i-th area for the j-th disease.Following the work by Botella-Rocamora et al. (2015), we rearrange the spatial effects into the matrix Θ = {θ ij : i = 1, . . ., I; j = 1, . . ., J} to better comprehend the dependence structure.The main advantage of the multivariate modelling is that dependence between the spatial patterns of the different diseases can be included in the model, so that latent associations between diseases can help to discover potential risk factors related to the phenomena under study.These unknown connections can be crucial to a better understanding of complex diseases such as cancer.
The potential association between the spatial patterns of the different diseases are included in the model considering the decomposition of Θ as where Φ and M deal with dependency within and between diseases respectively.We refer to Equation ( 2) as the M-model.In the following, we briefly describe the two components of the M-model.The matrix Φ is a matrix of order I × K and it is composed of stochastically independent columns that are distributed following a spatially correlated distribution.Usually, as many spatial distributions as diseases are considered, that is, K = J, although J and K may be different (see Corpas-Burgos et al., 2019, for a discussion).To deal with spatial dependence, different spatial priors have been considered in the literature, most of them based on the well known intrinsic conditional autoregressive (iCAR) prior (Besag, 1974).Namely, the proper CAR (pCAR), a proper version of the iCAR; the Besag et al. (1991) prior (BYM), which combines iCAR and exchangeable random effects; the Leroux et al. (1999) prior (LCAR) that models spatially structured and spatially unstructured variability through a weighted sum of the iCAR precision matrix and the identity, or a modified version of the BYM model denoted as BYM2 (Dean et al., 2001;Riebler et al., 2016).In summary, the columns of Φ follow multivariate normal distributions with mean 0 and precision matrix Ω whose expression depends on the spatial prior.In this paper, we consider the iCAR prior for the columns of Φ, and hence the precision matrix is Ω iCAR = τ (D w − W) = τ Q, where W = (w il ) is the spatial binary adjacency matrix defined as w ii = 0, w il = 1 if the i-th and the l-th areas are neighbours (share a common border) and 0 otherwise, D w = diag(w 1+ , • • • , w I+ ), with the diagonal elements w i+ being the number of neighbours of the i-th area, and τ is the precision parameter.Note that Q is the usual spatial neighbourhood matrix.
On the other hand, M is a K × J nonsingular but arbitrary matrix and it is responsible for inducing dependence between the different columns of Θ, i.e, for inducing correlation between the spatial patterns of the diseases.In Equation (2), the cells of M act as coefficients, so they can be considered as coefficients of the log-relative risks on the underlying patterns captured in Φ and treated as fixed effects with a normal prior distribution with mean 0 and a large (and fixed) variance.Note that, assigning N (0, σ) priors to the cells of M is equivalent to assigning a Wishart prior to M M, i.e., M M ∼ W ishart(J, σ 2 I J ).The multivariate approach allows the estimation of the correlation between the spatial patterns of the diseases, an interesting and useful feature, as a high positive correlation would support the hypotheses of common risk factors, and hence connections between diseases.The covariance matrix between the spatial patterns is obtained as M M. For further details see Botella-Rocamora et al. (2015).
For notation purposes and to incorporate the dependencies between different diseases in the model, we introduce the vec(•) operator.Let A = (A 1 , . . ., A J ) be an I × J matrix with I × 1 columns A j , for j = 1, . . ., J. The vec(•) operator transforms A into an IJ × 1 vector by stacking the columns one under the other, that is, vec(A) = (A 1 , . . ., A J ) .Using this notation, the multivariate Model (1) can be expressed in matrix form as where α = (α 1 , . . ., α J ) , R = (R 1 , . . ., R J ) , R j = (R 1j , . . ., R Ij ) , j = 1, . . ., J, and I J and 1 I are the J ×J identity matrix and a column vector of ones of length I respectively.Then, once the between-diseases dependencies are incorporated into the model, the resulting prior distributions for vec (Θ) with Gaussian kernel has a precision matrix given by Recall that this precision matrix accounts for both within and between-disease dependencies: the Ω 1 , . . ., Ω J matrices control the within-diseases spatial variability and the matrix M captures the between-diseases variability.Note that if Ω 1 = . . .= Ω J = Ω w , the covariance structure is separable and can be expressed as w are the between-and within-disease covariance matrices, respectively.Note that in our case iCAR .This M-model based framework includes both separable and non-separable covariance structures, and can accommodate different spatial dependency structures with different withindisease covariance matrices.

Model fitting, identifiability issues and prior distributions
Traditionally, MCMC techniques have been used for Bayesian model fitting and inference.However, despite the advances in research, it is widely acknowledged that MCMC techniques can be computationally very demanding.The INLA approach (see Rue et al., 2009) has turned out to be very popular in recent years.It is designed for latent Gaussian fields and is based on integrated nested Laplace approximations and numerical integration.Many models used in practice are implemented in R-INLA (Lindgren and Rue, 2015), and others can be implemented by means of generic functions with some extra-programming work.The M-model based approach is not directly available in R-INLA, but it can be implemented using the 'rgeneric' construct (see for example Vicente et al., 2020b).In this paper, we use INLA for model fitting and inference.
Spatial models usually present identifiability issues which are generally overcome using sumto-zero constraints on the spatial random effects (see Eberly and Carlin, 2000;Goicoa et al., 2018, for details).In the multivariate setting, these constraints are considered for all the diseases in the model.Additionally, the M-models bring about new identifiability issues.As pointed out by Botella-Rocamora et al. (2015), any orthogonal transformation of the columns of Φ and of the rows of M in Equation (2), causes an alternative decomposition of Θ, and therefore neither Φ nor M are identifiable and inference on them should be precluded.However, Θ and the covariance matrix M M are perfectly identifiable, so inference is confined to those quantities.It is worth noting that the decomposition of the between-diseases covariance matrix as Ω −1 b = M M avoids dependence on the order in which the diseases are introduced into the model, but it leads to an overparameterization problem.In the M-model proposal, J × J parameters are used to estimate the covariance matrix even though only J × (J + 1)/2 parameters are required.In their paper, Botella-Rocamora et al. (2015) put independent normal priors N (0, σ 2 ) on each entry of the matrix M and they show that this is equivalent to assigning a Wishart prior to the covariance matrix, i.e., M M ∼ W ishart(J, σ 2 I J ).To avoid the overparameterization of the covariance matrix we propose to use the Barlett decomposition (see, for example, Peña and Irie, 2022).In more detail, if where L is the Cholesky factor of V, and whose diagonal elements are independently distributed as χ 2 random variables and the offdiagonal elements are independently distributed as normal random variables.More precisely, c 2 j ∼ χ 2 υ−j+1 and n jl ∼ N(0, 1) for j, l = 1, . . ., J with j > l.Using this decomposition, only J × (J + 1)/2 hyperparameters (cells of A) are needed to estimate the covariance matrix Ω −1 b .Note that if V = I J , then L = I J .Finally, to avoid order dependence with the diseases, we introduced M into Equation (4) as the eigen-decomposition of Ω −1 b .Chung et al. (2015) consider a family of Wishart densities for the prior of the covariance matrix and recommend the use of υ = J + 2 degrees of freedom to make the prior a little bit more informative.In this work we follow this recommendation.Details on how to implement this in R-INLA can be found in Appendix A.

Scalable Bayesian models for high-dimensional multivariate diseases mapping
The M-model approach can be computationally intensive or even unfeasible when the number of areas (I) is very large.This limitation highlights the need for new methods.Here, we propose to use a divide and conquer strategy partitioning the spatial domain (D) into D subregions, so that local multivariate spatial models can be simultaneously fitted in the different subregions.In each subregion, we consider the prior distribution with Gaussian kernel and precision matrix given in Equation ( 4) to deal with within-disease spatial variation and between-disease correlations.

Disjoint models
A natural way to think of partitions is to consider subregions based on administrative subdivisions of the area of interest, for example provinces, states or counties.Once we have a partition of the spatial domain D, each geographic unit must belong to a single subregion, i.e.
Then, the log-risks of the models in each subregion where for each subregion d, J ) and α , and each R . ., J} is the matrix of spatial effects in partition d including both within and between-disease dependence structure.In more detail, this model can be expressed as d)  . . .
where the precision matrix of the multivariate normal random effect vector vecΘ (1) , . . ., vecΘ (D)  is a block-diagonal matrix of dimension IJ × IJ whose blocks correspond to the precision matrices Ω vec Θ (d) , d = 1, . . ., D. Having considered a partition of the spatial domain D, the full domain log-risk is just the union of the posterior estimates of each subregion, i.e., log

Models with overlapping partitions
Disjoint partitions, such as the one considered in the previous subsection, might suffer from border effects as areas in the boundary of a given partition would not borrow information from neigbouring areas from a contiguous subdivision.Consequently, the risk estimates in those areas may not be correct.This inconvenience can be solved by considering an alternative modelling approach in which k-order neighbours are added to each subregion of the partition, so that border areas have neighbours from other subregion of the partition.In this case, the entire spatial region D is divided into a set of overlapping subregions and some small areas will belong to more than one of such subdivisions, i.e., D = ∪ D d=1 D d and D i ∩D j = ∅ for neighbouring subregions.Similar to the disjoint Model (6), D submodels will be simultaneously fitted.However, as D d=1 I d > I, the final risk R = (R 1 , . . ., R J ) with R j = (R 1j , . . ., R Ij ) , j = 1, . . ., J, is no longer the union of the posterior estimates obtained for each submodel as areas located in the borders of the spatial partition would have more than one estimated posterior distribution.
Two different strategies can be considered to obtain a unique posterior estimate of the relative risk for those areas in more than one subregion.Orozco-Acosta et al. (2021) propose to calculate the mixture distribution of the estimated posterior probability density functions of the relative risks in the different subdivisions, with weights proportional to the conditional predictive ordinate (CPO) values (Pettit, 1990).To compute the mixture, suppose that area i belongs to m(i) subregions of the spatial domain D and let f be the posterior estimates of the probability density functions of the j-th disease in the i-th area.Then the mixture distribution of R ij can be written as where CP O k ij is the conditional predictive ordinate of area i and disease j obtained in partition k, so that w k ≥ 0 and m(i) k=1 w k = 1 (see for example Lindsay, 1995;Frühwirth-Schnatter, 2006).More recently, Orozco-Acosta et al. ( 2022) consider using the posterior marginal distribution of the relative risk estimated from its original partition.Based on the results obtained from a simulation study, they show that this strategy outperforms the use of mixture distributions in terms of risk estimation accuracy and true positive/negative rates.In this paper, this is also the default strategy used to obtain unique posterior distributions for each relative risk R ij .

Between-disease correlations and variance parameters
In addition to enlarge the effective sample size and improving smoothing by borrowing information from different diseases, one of the main advantages of multivariate disease mapping models is that they take into account correlations between the spatial patterns of the different diseases, that is, they reveal connections between diseases.Fitting a single multivariate model to the region of interest provides correlations between the diseases in the whole study domain thus revealing overall relationships.In addition, it also provides the diagonal elements of the between-disease covariance matrix, hereafter referred to as variance parameters.In the case of separable covariance structures (the kronecker product of between and within disease covariance matrices) these parameters control the amount of smoothing within diseases.By dividing the spatial domain into subregions, we obtain the posterior distributions of these parameters in each of the subdivisions.In addition, we are able to retrieve the between disease correlations and variances for the whole region.Hence, partition models provide extra information as they give insight about local connections between the diseases in the subdivisions (which in general are administrative divisions) and the global connection in the whole study region.
To obtain global estimates of the parameters of interest in the overall study domain from the partition models, we adapt the consensus Monte Carlo (CMC) algorithm originally proposed by Scott et al. (2016).The idea behind consensus Monte Carlo is to divide the data into shards (in our case, the shards corresponds to different subdivisions of the spatial domain), give each shard to a worker machine which does a full Monte Carlo simulation from a posterior distribution given its own data, and then combine the posterior simulations from each worker (or submodel) to produce a set of global draws representing the consensus belief among all the workers.Here, we briefly describe how to adapt the ideas behind the CMC algorithm to our case.
Let ψ = (ρ, σ 2 ) denotes the vector with the parameters of interest where ρ = (ρ 12 , . . ., ρ J−1,J ) contains the between-disease correlations and σ 2 = (σ 2 1 , . . ., σ 2 J ) are the diagonal elements of the between-disease covariance matrix, and let ψ kd denote the local estimate of the k-th parameter of ψ in each subdomain D d , d = 1, . . ., D. We first extract samples of size S from the posterior marginal estimates of ψ kd denoted as ψ s kd for k = 1, . . ., J × (J + 1)/2, d = 1, . . ., D and s = 1, . . ., S.Then, we combine the draws using weighted averages where w d are normalized weights inversely proportional to the posterior marginal variances of ψ kd .Finally, we approximate the posterior marginal density function of the parameter ψ k from the combined draws ψs k .

Model selection criteria
Two of the most widely used criteria to compare Bayesian models are the deviance information criterion (DIC) (Spiegelhalter et al., 2002) and the Watanabe-Akaike information criterion (WAIC) (Watanabe, 2010).However, with partition models, it is not straightforward to get these quantities as we fit as many models as subdivisions.Hence, a procedure to estimate these quantities from the scalable models described in Sections 3.

Simulation study
We conduct a simulation study to compare the performance of the different M-models described in Section 2. Specifically, our interest relies on comparing the fit of a single model to the whole domain (hereafter referred to as the global model) and the partition models, in terms of parameter estimates and relative risk estimation accuracy.The I = 7907 municipalities of continental Spain and J = 3 diseases are used as the simulation template, as this imitates the case study presented in Section 5. Two different scenarios have been considered to recover the possible underlying generating process of spatially correlated disease risks.In the first scenario, samples are generated from a fixed covariance structure based on the spatial neighbourhood graph of the whole area under study, that is, the global model is used as the generating model.In contrast, in the second scenario, independent samples for each partition (Spanish Autonomous Regions, see Figure B.1 in Appendix B) are generated using the covariance structures of the partition, that is, the Disjoint model is used as the data generating mechanism.Further details are given below.

Data generation
As one of the main advantages of the joint modelling of several responses is to analyze the relationships between different diseases in terms of correlations between spatial patterns, we are interested in evaluating how well these parameters are estimated when using the multivariate spatial models described in this paper.To do this, we start by sampling from a multivariate normal distribution with precision matrix equal to Ω vec(Θ) = Ω b ⊗Ω iCAR , by fixing the elements of the between-disease covariance matrix as where σ 2 j are variance parameters, and ρ kl = ρ lk are between-disease correlation coefficients.Note that σ kl denotes the covariances between each pair of diseases.Then, for each sample of Θ r , r = 1, . . ., 100 we compute the relative risks R r ij following Equation (3).Finally, we generate O ij counts for area i and disease j using a Poisson distribution with mean µ , where E ij are the expected number of cases of our case study data (lung, colorectal and stomach cancer mortality in Spanish males).
In Scenario 1, the neighbourhood graph of all the 7907 municipalities is used to define the spatial precision matrix Ω iCAR .In addition, we fix the parameters of the between-disease covariance matrix as σ 2 1 = 0.25, σ 2 2 = 0.16, σ 2 3 = 0.09, ρ 12 = 0.7, ρ 13 = 0.5 and ρ 23 = 0.1.In Scenario 2, D = 15 independent samples are generated from multivariate Normal distributions with precision matrices equal to Ω vec(Θ d ) = Ω are considered en each subdivision.Here, the variance parameters are fixed to σ 2 1 = 0.5, σ 2 2 = 0.4 and σ 2 3 = 0.3, while similar values to the ones estimated with the partition models in the case study presented in the next section are used as correlation coefficients (see Table B.1 in Appendix B).Note that we increase the variance parameters in Scenario 2 to get stronger smoothing effects in each subdivision.

Results: Scenario 1
Table 1 compares the true values of model parameters in Scenario 1 (variance parameters and correlation coefficients) against average values of posterior mean estimates over the 100 simulated data sets.In addition, estimated standard errors, simulated standard errors (derived from the sample variance of the parameter estimates) and empirical coverages of the 95% credible intervals are also displayed.Note that for the partition models, these posterior marginal distributions are obtained by using the CMC algorithm described in Section 3.3.In term of model parameters, multivariatie models give very accurate estimates of the real values, both in terms of posterior mean and posterior standard deviation estimates (note that nearly identical values are obtained from estimated and simulated standard errors).As expected, slightly better results are obtained when fitting the global model, as this is the true generating model in Scenario 1. Regarding partition models, the higher the neighbourhood order, the more similar the CMC estimates of the correlation coefficients are to those of the global model.
Table 2 displays average values of model selection criteria (posterior mean deviance D(θ), effective number of parameters p D , DIC and WAIC) for the global and the scalable models, as well as the accuracy of the relative risk estimates quantified by the mean absolute relative bias (MARB), the mean relative root mean square errors (MRRMSE) and empirical coverages of the 95% credible intervals for the risks.Note that the MARB and MMRMSE are defined for each small area i and disease j as ij and R(r) ij denote the true value and the posterior median estimate of the relative risks for the s-th data set (s = 1, . . ., 100).Model selection criteria point towards partition models, though differences are mild.Regarding MARB, MMRMSE and 95% coverage values, differences between the global and the partition models are practically negligible.

Results: Scenario 2
In contrast to the previous scenario, it should be noted that in Scenario 2 we cannot compare the global estimates of the model parameters against the true values of the variance parameters and between-disease correlations, since different values have been used to generate the risk surfaces in each subdomain.However, we can compare the model's performance in terms of model selection criteria and risk estimation accuracy (see Table 3).As expected, the Disjoint model is the one showing the best performance according to these measures, as this is the true generating model assumed for Scenario 2. Although slightly worse MARB and MRRMSE values are obtained for 1st/2nd-order neighbourhood models, the partition models still outperform the Global model.
We are also interested in analyzing if the partition models are able to recover the local between-disease covariance structures of the true generating process.In Table B.1 (Appendix B) we compare these values against the average values of posterior mean estimates of local parameters in each subdivision over the 100 simulated data sets for the Disjoint model.For almost every subdivision, very accurate estimates are obtained for both variance parameters and correlation coefficients.For the latter, the median value of the empirical coverage of the 95% credible intervals is 0.95 (with Q 1 = 0.93 and Q 3 = 0.97).As expected, these estimates get worse as the neighbourhood order of the models increases, since the estimated local correlation structures are affected by the ones of the adjacent subdivisions.Even so, the median values of the empirical coverage of the 95% credible intervals for the between-disease correlations are 0.89 (with Q 1 = 0.84 and Q 3 = 0.92) and 0.86 (with Q 1 = 0.79 and Q 3 = 0.90) for 1st-order and 2nd-order neighbourhood models, respectively.All the results are shown in

Case study
In this section we jointly analyse mortality data for lung, colorectal, and stomach cancer in men in the 7907 municipalities of mainland Spain (excluding Baleares and Canary Islands and the autonomous cities of Ceuta and Melilla) during the period 2006-2015 using the new proposal.
During the ten years of the study, a total of 162,602 deaths from lung cancer (corresponding to codes C33-C34 of the International Classification of Diseases-10), 82,967 from colorectal cancer (C17-C21) and 33,170 from stomach cancer (C16) were registered for male population of mainland Spain, which correspond to global rates of 76.48, 39.02 and 15.60 deaths per 100,000 male inhabitants, respectively.

Model fitting and model selection
We fit the disjoint model (k = 0) and the k-order neighbourhood model for k = 1, 2, 3 in R-INLA using D = 15 subdivisions of the spatial domain.These subdivisions are also of interest as they correspond to Autonomous Regions of Spain (NUTS2 level from the European nomenclature of territorial units for statistics, shown in Figure B.1 in Appendix B).In these partitions, the highest value of I d (number of municipalities) is 2245 and corresponds to the Autonomous Region of Castilla y León, a rather vast territory from central to northwestern Spain with about 5% of the total Spanish population.Although this sub-region is large, we maintain this subdivision as it represents the administrative division of Spain into Autonomous Regions.We also fit the multivariate spatial M-models over the entire spatial domain (global model), and compare the results with those obtained with the new proposal.Previously, univariate models were also fitted to each disease using a BYM2 spatial prior.The covariance matrix of this prior cope with both spatial structured variability and unstructured variability.Results (not shown here to conserve space) show that most of the variability is spatially structured.Since the computational cost of this prior makes it difficult its use in a multivariate setting, and most of the variability is spatially structured, we fit the joint multivariate proposal given in Equation ( 6) by considering an iCAR prior for the spatial random effects.
For the partition models, we distribute the submodels over 2 machines with four proces-sors Intel Xeon Silver 4108 and 192GB RAM on each machine (Ubuntu 20.04.4 LTS operative system), using the simplified Laplace approximation strategy in R-INLA (Lindgren and Rue, 2015) (stable version INLA 22.05.07,R version R-4.1.2) and simultaneously running 3 models in parallel on each machine using the bigDM package (Adin et al., 2022).4 displays the posterior mean deviance D(θ), the effective number of parameters p D , the DIC, and the WAIC for the global and the scalable models together with the computing time.The total time for the scalable models is obtained by adding the running time and the merging time.The running time refers to the elapsed time for all the submodels fitted with R-INLA, and the merging time refers to the combination (when necessary) of the posterior distributions of the risks, the approximation of the DIC/WAIC values, and the computation of global estimates of the between-diseases correlation coefficients using the proposed CMC algorithm.As expected, the computational cost raises as the neighbourhood order (k) increases, though the scalable proposal is faster than the global model for all values of k.The greatest reduction in time in comparison with the global model is obtained for k = 0, being the global model about 5.5 times slower.When the neighbourhood order increases, the difference in computing time is less pronounced.The global model is about 4.3, 3.8, and 3.6 times slower than the scalable models with k =1, 2, and 3, respectively.Regarding model selection criteria, scalable Bayesian models outperform the global model.The greater reduction in DIC and WAIC is obtained for the 1st-order neighbourhood model.However, increasing the neighbourhood order may improve the between-disease correlation estimates.

Joint analysis of male mortality from three types of cancer in Spain
In this subsection, the spatial patterns of lung, colorectal, and stomach cancer mortality risks in men are examined in the municipalities of continental Spain using the scalable multivariate proposal presented in Section 3.
We begin with a comparison of the estimated risks obtained with the global model, the disjoint model (k = 0) and the k-order neighbourhood models (k = 1, 2 and 3). Figure 1 displays dispersion plots of the posterior median estimates of the relative risks obtained with the partitioned models versus those obtained with the global model.The left, central and right columns correspond to lung, colorectal and stomach cancer, respectively.The neighbourhood order in the partition models are represented in the different rows.The largest differences are observed between the global and the disjoint model.This is expected because areas in the border of a subdivision do not borrow strength from neighbouring areas located in a contiguous subdivision.As the neighbourhood order k increases the risk estimates are more similar to the global model.Figure 2 displays the spatial patterns of lung cancer mortality risks (top) and the posterior probabilities of risk exceedance (bottom), P (R ij > 1|O) , obtained with the global and the disjoint models.To save space, maps for colorectal and stomach cancer are provided in Figure B.2 and Figure B.3 (Appendix B).Though differences in risks estimates are observed in the dispersion plots, it is harder to appreciate them on the maps.
Multivariate models borrow information from nearby areas and the different diseases subject to analysis.In addition to this strength, multivariate models present additional advantages over univariate counterparts, such as the possibility of estimating correlations between the spatial patterns of the diseases.Moderate to high correlations may suggest the existence of underlying risk factors affecting the diseases under study, which in turns implies connection between them.This information may be crucial to better understand diseases such as cancer in which known risk factors only explain a small percentage of the cases.Spatial patterns may be associated to factors like access to treatment or life style that might have an impact on mortality.
Posterior distributions of the between-disease correlations obtained in the different partitions with k = 0, 1 and 2 are displayed in Figure 3 together with correlations for whole Spain obtained with the CMC algorithm and with the global model.Here, ρ 1.2 , ρ 1.3 , and ρ 2.3 denote the correlation parameters between lung and colorectal, lung and stomach, and colorectal and stomach cancer, respectively.Summary statistics (mean, median, mode, standard deviation, 2.5 and 97.5 percentiles) of the between-disease posterior correlations are also shown in Table 5.In general, the posterior distributions estimated with the CMC algorithm for the partition models are very similar to those obtained with the global model.Similar to the posterior estimates of the relative risks, closer values to the global model are observed as the neighbourhood order k increases.
Finally, Figure 4 displays a map with the posterior medians and standard deviations of the between-diseases correlations ρ 1,2 (left), ρ 1,3 (center), and ρ 2,3 (right), for the different subdivisions (Autonomous Regions) obtained with the 1st-order neighbourhood partition model.Not only does the partition model provide correlation for the complete spatial domain (whole Spain), but it also gives correlation for the different subdivisions.This is an advantage over the global models as we add information at different administrative divisions.Moreover, the variability in the posterior medians of the correlations across the subdivisions may indicate a lack of stationarity that the global model cannot cope with, and hence the advantages of the partition models.

Discussion
Spatial areal models have a long tradition in epidemiology to study the geographical pattern of a disease.While initially focused on modelling a single disease, spatial models have evolved into a multivariate framework with two notable objectives: to improve estimates by borrowing strength from other diseases, in addition to borrowing information from neighbouring areas, and to estimate latent correlations between the spatial patterns of the diseases under study to address the connections between them and to hypothesize common risk factors.Research on spatial multivariate models has received considerable attention in recent years, although their use is not yet widespread in epidemiology mainly because (i) the implementation of multivariate models in available software requires advanced computing skills and (ii) computational issues are accentuated when the number of small areas is large as computing time may become prohibitive.Vicente et al. (2020b) and Vicente et al. (2021) provide an implementation of multivariate CAR and P-splines in R-INLA that can be used by a wide audience without advanced computer skills.
In this paper, we present a new approach to analyse multivariate areal count data when the number of small areas is very large.In particular, we combine the methodology proposed by Figure 3: Posterior distributions of the estimated between-disease correlations with the global, and k = 0, 1, 2-order neighbourhood models, using an iCAR prior for spatial random effects.
Orozco-Acosta et al. ( 2021) for high-dimensional disease mapping with a modification of the multivariate approach given by Botella-Rocamora et al. (2015) to avoid overparametrization, obtaining a scalable Bayesian modelling approach to multivariate disease mapping.Our proposal begins with the partition of the spatial domain into subregions with a reduced number of small areas, so that spatial multivariate models can be fitted simultaneously (using both parallel or distributed computation strategies) in each of these regions, reducing computational time and avoiding memory and storage problems.Dividing the whole spatial domain into disjoint regions may induce border effects as the areas in the limits of a given subdivision do not borrow information from neighbouring areas located in a different subregion.To overcome this issue, Table 5: Descriptive statistics of the estimated between-disease correlations with the global, and k = 0, 1, 2-order neighbourhood models, using an iCAR prior for spatial random effects.
ρ Model mean sd q .025q .5 q .975mode  Figure 4: Maps of posterior medians of between-disease correlations and standard deviation (in brackets) for the different subdivisions obtained with the 1st-order neighbourhood partition model.Correlations between lung and colorectal cancer are displayed on the left (ρ 1,2 ), the central map displays the correlations between lung and stomach cancer (ρ 1,3 ), and the map on the right displays the correlation between colorectal and stomach cancer (ρ 2,3 ).
we consider k-order neighbourhood models that incorporate neighbouring areas to those regions located on the partition boundary.Finally, variance parameters and between-disease correlations for the whole area are obtained by means of an adaptation of a consensus Monte Carlo algorithm.
The correlation coefficients indicate potential geographic factors related (or not) to the different diseases.If the covariance structure is separable, the variance parameters measure the amount of smoothing for each disease.
In addition to the CMC algorithm, we have also considered the Weierstrass rejection sampler (WRS) proposed by Wang and Dunson (2014) to recover the parameters of interest for the whole study region (results not shown to save space).In this algorithm, the posterior of the target distribution in the whole area is approximated by combining posterior samples of the subdivisions using rejection sampling.Though it was originally proposed to combine posterior draws from independent MCMC subset chains, it can be adapted to other Bayesian estimation techniques such as INLA through the R package weierstrass (available at https://github.com/wwrechard/weierstrass).In general, very similar posterior marginal estimates are obtained with both algorithms.
One of the key issues with partition models is to choose the neighbourhood order.Here we use model selection criteria such as DIC and WAIC.Our conclusions are that, in general, the larger the neighbourhood order, the more similar the partition model is to the global model.However, increasing too much the neighbourhood order, the benefits of our proposal in terms of computational time vanish.Overall, first or second order neighbourhood models are appropriate.From the simulation study, we conclude that even when the underlying generating process is the Global model, the partition models are very competitive in terms of risk estimation accuracy.Moreover, the global between-disease correlation coefficients are well recovered with the partition models.If the geographical distribution and correlation structure of the underlying process varies across the whole map (which seems very realistic in practice), better results are obtained with our modelling proposals than with the usual global model.In conclusion, we could argue that partition models have several advantages over a global model.First, they speed up computations and alleviate memory and storage problems.Second, we kill two birds with one stone, as we can provide a global spatial pattern for the whole region and local patterns for the subdivisions, which in our case are of great interest.
In our case study, we use an administrative division of the municipalities of continental Spain corresponding to D = 15 Autonomous Regions.This partition is a natural choice as Autonomous Regions in Spain are responsible for developing and implementing health policies, and life style may change from region to region.Having estimates using these subdivision may discover associations between diseases that might be associated to specific plans in those regions, the different life styles or other geographical factors having a local influence.This could explain the differences observed in the between-disease correlations in the different subdivisions.On the other hand, this partition may have some inconveniences.For example, the Region of Castilla y León has 2245 municipalities, still a large number.To overcome this problem, we have also fitted the partition model using a finer subdivision based on 47 provinces rather than on Autonomous Regions.In general results are similar, though the global between-disease correlations are better recovered with the partition based on Autonomous Regions.
The M-models for multivariate disease mapping described in this paper are implemented in the R package bigDM, which also includes several scalable spatial and spatio-temporal Poisson mixed models for high-dimensional areal count data in a fully Bayesian setting using the integrated nested Laplace approximation (INLA) technique.The package also contains a vignette to replicate the data analysis described in Section 5 using simulated cancer mortality data for the Spanish municipalities, in order to preserve the confidentiality of the original data.

A Appendix
In this Appendix we briefly explain how to implement the Bartlett decomposition in R-INLA.This requires that the hyperparameters have support on R. So, we will reparameterise the elements c j described in Equation ( 5) as θ j = log(c j ), j = 1, . . ., J, and the log priors for c j are given as the corresponding log priors for θ j , ∀j = 1, . . ., J.
For each c 2 j , ∀j = 1, . . ., J, we assign a chi-square distribution with J + 2 − j + 1 degrees of freedom, so the log prior for θ j is log π( where f j (•) is the probability density function (pdf) of c 2 j .This expression is obtained as follows.

Code
The R-INLA code to assign log prior distributions to the hyperparameters of the M-models (elements of the A matrix) can be checked in the Mmodel icar() function of the bigDM package.

B Appendix
In this Appendix we include additional tables and figures regarding the simulation study (Section 4) and the results of the joint analysis of mortality data for lung, colorectal and stomach cancer (case study of Section 5).
d) Ij ) is the vector of relative risks corresponding to disease j within the subregion d.Finally, 1 I d is a column vector of ones of length I d (the number of areas within partition d), I = D d=1 I d , and 1 and 3.2 is required.Extending the ideas in Orozco-Acosta et al. (2021) to the multivariate framework, we compute approximate DIC values by drawing samples from the posterior marginal distribution of the Poisson means.Denoting by C s , s = 1, ..., S, to the posterior simulations of µ ij = E ij • R ij (the mean of the Poisson distribution), approximate values of the mean deviance (D(C)) and the deviance of the mean (D(C)) can be respectively calculated as D(C) = 1 S S s=1 − log (p(O|C s )) ; D(C) = −2 log p(O|C) , with C = 1 S S s=1 C s .where p(O|•) denotes the likelihood function of a Poisson distribution.Then, the DIC is obtained as DIC = 2 D(C) − D(C).Similary, approximate WAIC values are computed as (see Gelman et al.(p(O ij |C s ))] .
iCAR is the spatial precision matrix of the areas within subdomain d = 1, . . ., D, and different between-disease covariance matrices Ω (d) b

Figure 1 :
Figure 1: Dispersion plots of the posterior median estimates of relative risks for lung (left column), colorectal (central column) and stomach (right column) cancer mortality data obtained with the partitioned model (k = 0, 1, 2, 3 from top to bottom) versus the global model.

Figure 2 :
Figure 2: Maps of posterior median estimates of mortality relative risk for lung cancer (top) and posterior exceedance probabilities P (R ij > 1|O) (bottom) in continental Spain.
Figure B.1 displays the map of the administrative division of Spain into Autonomous Regions.

Figure
Figure B.1: Map of the administrative division of Spain into Autonomous Regions.

Figure B. 2 :
Figure B.2: Maps of posterior median estimates of mortality relative risk for colorectal cancer (top) and posterior exceedance probabilities P (R ij > 1|O) (bottom) in continental Spain.

Figure B. 3 :
Figure B.3: Maps of posterior median estimates of mortality relative risk for stomach cancer (top) and posterior exceedance probabilities P (R ij > 1|O) (bottom) in continental Spain.

Table 1 :
Table B.2 and Table B.3 in Appendix B. Average values of posterior mean, posterior standard deviation (SD), simulated standard errors (sim) and empirical coverage of the 95% credible intervals (EC) for model parameters based on 100 simulated data sets for Scenario 1.

Table 2 :
Average values of model selection criteria (mean deviance, effective number of parameters, DIC and WAIC) and risk estimation accuracy (MARB, MRRMSE and empirical coverage -EC-of the 95% credible intervals) based on 100 simulated data sets for Scenario 1.

Table 3 :
Average values of model selection criteria (mean deviance, effective number of parameters, DIC and WAIC) and risk estimation accuracy (MARB, MRRMSE and empirical coverage -EC-of the 95% credible intervals) based on 100 simulated data sets for Scenario 2.

Table 4 :
Model selection criteria and computational time, in minutes, for multivariate models with iCAR spatial prior using the simplified Laplace approximation strategy if INLA.
Table B.1, Table B.2 and Table B.3 compares the true values of model parameters (local correlation coefficients in each subdivision) against average values of posterior mean estimates over the 100 simulated data sets for Scenario 2.

Table B .
1: Disjoint model.Average values of posterior mean, posterior standard deviation (SD), simulated standard errors (sim) and empirical coverage of the 95% credible intervals (Cov) for local estimates model parameters based on 100 simulated data sets for Scenario 2.

Table B .
2: 1st-order neighbourhood model.Average values of posterior mean, posterior standard deviation (SD), simulated standard errors (sim) and empirical coverage of the 95% credible intervals (Cov) for local estimates model parameters based on 100 simulated data sets for Scenario 2. Table B.3: 2nd-order neighbourhood model.Average values of posterior mean, posterior standard deviation (SD), simulated standard errors (sim) and empirical coverage of the 95% credible intervals (Cov) for local estimates model parameters based on 100 simulated data sets for Scenario 2.