1 Introduction

Research on methodology for the spatial (and spatio-temporal) analysis of areal count data has grown tremendously in recent years, and statistical models have proven an essential tool for studying the geographic distribution of data in small areas. The main objective of these techniques is to smooth standardized mortality (incidence) ratios or crude rates to discover geographic patterns of the phenomenon under study. These models and methods have been mainly applied in epidemiology to analyse the incidence and mortality of chronic diseases such as cancer, but some recent research has demonstrated their applicability to the spatial and spatio-temporal analysis of crimes (see for example Li et al. 2014), and in particular crimes against women (see for example Vicente et al. 2018, 2020a). Although research on single disease analysis has been very fruitful and abundant since the seminal work of Besag et al. (1991), joint modelling of multiple responses provides several advantages. Firstly, it improves smoothing by borrowing strength between diseases. Secondly, and perhaps more importantly, it allows relationships between different diseases to be established, such as similar or completely different geographical distributions, i.e., correlations between spatial patterns. This is crucial, as these correlations may indicate associations with common underlying risk factors and certain (usually unknown) connections between the different diseases. The joint analysis employs multivariate spatial models that can handle both the spatial correlation within diseases and the correlation between diseases.

There is a considerable amount of research on Bayesian multivariate spatial models for count data, most of the proposals relying on Markov chain Monte Carlo (MCMC) algorithms for estimation and inference. However, their use in practice is still limited due to a lack of “easy-to-use” implementations of the models in statistical packages and the computational burden of most of the proposals, which preclude practitioners from exploiting their advantages over univariate counterparts. According to MacNab (2010), there are two approaches to multivariate modelling in disease mapping. The first one uses shared-component models (see Knorr-Held and Best 2001; Held et al. 2005), where pair-wise dependence between diseases is not a testable hypothesis, but it is assumed. In the second one, pair-wise correlation between diseases is a testable assumption and the interest is in estimating such correlations. Hereafter in the paper, multivariate models refer to this second approach. A comprehensive review of the subject can be found in the work of MacNab (2018), which discusses the three main lines in the construction of multivariate proposals based on Gaussian Markov random fields, namely a multivariate conditionals-based approach (Mardia 1988), a univariate conditionals-based approach (Sain et al. 2011), and a coregionalization framework (Jin et al. 2007). Regarding the latter, Martinez-Beneito (2013) derives a general theoretical setting for multivariate areal models that covers many of the existing proposals in the literature. However, this procedure is unaffordable for a moderate to large number of diseases due to the high computational cost of the MCMC algorithms. Botella-Rocamora et al. (2015) reformulate the Martínez-Beneito framework and present the so-called M-models as a simpler and more computationally efficient alternative. This approach makes it possible to increase the number of diseases in the model at the expense of the identifiability of certain parameters.
Recently, Vicente et al. (2020b) consider the M-models-based approach to analyse different crimes against women in India in space and time. These authors estimate the M-models using integrated nested Laplace approximations (INLA) and numerical integration for Bayesian inference (see Rue et al. 2009) and implement the procedure using the ‘rgeneric’ construction in R-INLA (Lindgren and Rue 2015). The result is a “ready-to-use” function for a wide audience with limited programming skills.

Several alternatives to Gaussian Markov random fields have also been proposed in the disease mapping literature. A very attractive modelling approach is the use of splines to smooth risks (Goicoa et al. 2012). Research on multivariate spline models for fitting spatio-temporal count data is not so abundant and focuses on multivariate structures to deal with the spatial and temporal dependence for one response measured in several time periods (see for example MacNab 2016; Ugarte et al. 2010, 2017). Very recently, Vicente et al. (2021) propose multivariate P-spline models to study the spatio-temporal evolution of four crimes against women. Unfortunately, inference for these multivariate proposals (and also for univariate approaches) becomes unfeasible when the number of areas is very large, and the scalability of the procedures is an issue.

New directions in disease mapping point towards developing new methods for Bayesian inference when the number of small areas is very large (MacNab 2022). Creating computationally efficient methods for large data sets is one of the greatest challenges in the field of univariate and multivariate spatial statistics. Several methods for massive geostatistical (point-referenced) data have already been proposed (see for example Cressie and Johannesson 2008; Lindgren et al. 2011; Nychka et al. 2015; Katzfuss 2017; Katzfuss and Guinness 2021, among others). However, in the case of areal (lattice) count data, research on the scalability of statistical models is not so abundant. Recently, Orozco-Acosta et al. (2021, 2023) propose a scalable Bayesian modelling approach for univariate high-dimensional spatial and spatio-temporal disease mapping data. They propose to divide the spatial domain into D subregions where independent models can be fitted simultaneously. To avoid the border effect in the risk estimates, k-order neighbours are added to each subregion so that some areal units will have several risk estimates. Finally, a unique posterior distribution for these risks is obtained by either computing the mixture distribution of the estimated posterior probability density functions or by selecting the posterior marginal risk estimate corresponding to the original domain to which the area belongs. This proposal reduces computational time and, in contrast to fitting a single model to the whole domain, it allows different degrees of spatial smoothness over the areas within the different subdomains.

The main objective of this paper is to present a new approach to fit order-free multivariate spatial disease mapping models in domains with a very large number of small areas, avoiding high RAM/CPU usage and making the methodology accessible to users with limited computing facilities. In particular, we combine the Orozco-Acosta et al. (2021, 2023) “divide-and-conquer” approach with a modification of the Botella-Rocamora et al. (2015) M-models to avoid overparametrization. Our approach allows for statistical inference in the subdivisions of the study domain using local homogeneous models, which seems more appropriate than a single global model when the number of small areas is large. Then, we are able to retrieve the posterior distributions of the correlations between the spatial patterns of each disease in the whole spatial domain, as well as in each of the subdivisions. We have implemented the methodology in INLA to reduce computational burden through our R package bigDM (Adin et al. 2023), which also implements recent high-dimensional univariate proposals.

The rest of the article has the following structure. Section 2 reviews the M-models to fit multivariate data. In Sect. 3 we present the new methodology to make the multivariate models scalable. In Sect. 4, we conduct a simulation study to compare the performance of this new modelling approach with a single multivariate spatial M-model fitted to the whole domain. Finally, in Sect. 5, we use the new proposal to jointly analyse lung, colorectal and stomach cancer male mortality in Spanish municipalities. The paper closes with a discussion.

2 M-models for multivariate disease mapping

Let us assume that the area of interest is divided into I contiguous small areas and data are available for J diseases. Let \(O_ {ij}\) and \(E_ {ij}\) denote the number of observed and expected cases respectively in the i-th small area (\(i=1, \ldots , I\)) and for the j-th disease (\(j=1, \ldots , J\)). Conditional on the relative risks \(R_{ij}\), the number of observed cases in the i-th area and the j-th disease is assumed to follow a Poisson distribution with mean \(\mu _{ij}=E_{ij} \cdot R_{ij}\), that is,

$$\begin{aligned} O_{ij}| R_{ij}\sim & {} Poisson(\mu _{ij}=E_{ij} \cdot R_{ij}), \\ \log \mu _{ij}= & {} \log E_{ij}+\log R_{ij}. \end{aligned}$$

Here \(E_{ij}\) is computed using indirect standardization as \(E_{ij}=\sum _{k}n_{ijk}\cdot m_{jk}\), where k is the age-group, \(n_{ijk}\) is the population at risk in area i and age-group k for the j-th disease, and \(m_{jk}\) is the overall mortality (or incidence) rate of the j-th disease in the total area of study for the k-th age group. The log-risk is modelled as

$$\begin{aligned} \log R_{ij}=\alpha _j + \theta _{ij}, \end{aligned}$$
(1)

where \(\alpha _j\) is a disease-specific intercept and \(\theta _{ij}\) is the spatial effect of the i-th area for the j-th disease. Following the work by Botella-Rocamora et al. (2015), we rearrange the spatial effects into the matrix \({\varvec{\Theta }}=\lbrace \theta _{ij}: i=1, \ldots , I; j=1, \ldots , J \rbrace \) to better comprehend the dependence structure. The main advantage of the multivariate modelling is that dependence between the spatial patterns of the different diseases can be included in the model, so that latent associations between diseases can help to discover potential risk factors related to the phenomena under study. These unknown connections can be crucial to a better understanding of complex diseases such as cancer.
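The indirect standardization used above to obtain \(E_{ij}\) can be illustrated with a minimal Python sketch for a single disease. The toy populations and death counts, and the names `expected_cases`, `pop` and `observed_by_age`, are hypothetical and only serve the example; the overall age-specific rates \(m_{jk}\) are assumed to be computed from the same data (internal standardization).

```python
def expected_cases(pop, rates):
    """Expected cases E_i = sum_k n_ik * m_k for one disease.

    pop[i][k]: population at risk in area i and age group k.
    rates[k]:  overall rate of the disease in age group k.
    """
    return [sum(n_ik * m_k for n_ik, m_k in zip(row, rates)) for row in pop]


# Hypothetical toy data: 3 areas, 2 age groups, one disease.
pop = [[1000, 500], [2000, 1000], [500, 1500]]
observed_by_age = [[2, 5], [3, 12], [1, 13]]

# Overall age-specific rates m_k = total cases / total population in group k.
n_groups = len(pop[0])
rates = [
    sum(row[k] for row in observed_by_age) / sum(row[k] for row in pop)
    for k in range(n_groups)
]

E = expected_cases(pop, rates)
O = [sum(row) for row in observed_by_age]

# Crude standardized ratios O_i / E_i, the quantities the model smooths.
smr = [o / e for o, e in zip(O, E)]
```

With rates computed from the study region itself, the expected cases add up to the observed total, which is a convenient sanity check on the inputs.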

The potential associations between the spatial patterns of the different diseases are included in the model by considering the decomposition of \({\varvec{\Theta }}\) as

$$\begin{aligned} {\varvec{\Theta }}= {\varvec{\Phi }}{\textbf{M}}, \end{aligned}$$
(2)

where \({\varvec{\Phi }}\) and \({\textbf{M}}\) deal with within- and between-disease dependencies, respectively. We refer to Eq. (2) as the M-model. In the following, we briefly describe the two components of the M-model.

The matrix \({\varvec{\Phi }}\) is of order \(I \times K\) and it is composed of stochastically independent columns that follow a spatially correlated distribution. Usually, \(K=J\), although J and K may be different (see Corpas-Burgos et al. 2019, for a discussion). To deal with spatial dependence, different spatial priors have been considered in the literature, most of them based on the well-known intrinsic conditional autoregressive (iCAR) prior (Besag 1974). These include the proper CAR (pCAR), a proper version of the iCAR; the Besag et al. (1991) prior (BYM), which combines iCAR and exchangeable random effects; the Leroux et al. (1999) prior (LCAR), which models spatially structured and spatially unstructured variability through a weighted sum of the iCAR precision matrix and the identity; and a modified version of the BYM model denoted as BYM2 (Dean et al. 2001; Riebler et al. 2016). In summary, the columns of \({\varvec{\Phi }}\) follow multivariate Normal distributions with mean \({\textbf{0}}\) and precision matrix \({\varvec{\Omega }}\) whose expression depends on the spatial prior. In this paper, we consider the iCAR prior for the columns of \({\varvec{\Phi }}\), and hence the precision matrix is \({\varvec{\Omega }}_{\textrm{iCAR}} = \tau {\textbf{Q}}\), where \({\textbf{Q}}\) is the usual spatial neighbourhood matrix defined as \(Q_{il}=-1\) if the i-th and the l-th areas are neighbours (share a common border) and 0 otherwise for \(i \ne l\), \(Q_{ii}=n_i\), where \(n_i\) is the number of neighbours of the i-th area, and \(\tau \) is the precision parameter. We choose the iCAR prior because in the real case study all the variability is spatially structured.
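As an illustration of the neighbourhood structure behind the iCAR prior, the following Python sketch builds \({\textbf{Q}}\) from an adjacency list. It assumes the usual convention in which off-diagonal entries for neighbouring areas are \(-1\) and the diagonal holds the number of neighbours, so that every row sums to zero; the function name `icar_structure` and the four-area toy map are hypothetical.

```python
def icar_structure(neighbours):
    """Build the iCAR structure matrix Q from a symmetric adjacency list.

    neighbours: dict mapping each area label to the set of its neighbours.
    Returns Q as a list of rows, with areas ordered by sorted label.
    """
    areas = sorted(neighbours)
    index = {area: pos for pos, area in enumerate(areas)}
    n = len(areas)
    Q = [[0] * n for _ in range(n)]
    for area in areas:
        i = index[area]
        Q[i][i] = len(neighbours[area])      # n_i on the diagonal
        for other in neighbours[area]:
            Q[i][index[other]] = -1          # -1 for each neighbouring pair
    return Q


# Hypothetical toy map: four areas in a row, 0 - 1 - 2 - 3.
toy_map = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
Q = icar_structure(toy_map)
```

Because the rows of \({\textbf{Q}}\) sum to zero, the matrix is singular, which is why sum-to-zero constraints on the spatial effects are needed for this prior.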

On the other hand, \({\textbf{M}}\) is a \(K \times J\) nonsingular but arbitrary matrix and it is responsible for inducing dependence between the different columns of \({\varvec{\Theta }}\), i.e., for inducing correlation between the spatial patterns of the diseases. In Eq. (2), the cells of \({\textbf{M}}\) act as regression coefficients of the log-relative risks on the underlying patterns captured in \({\varvec{\Phi }}\) and are treated as fixed effects with a Normal prior distribution with mean 0 and a large (and fixed) variance \(\sigma ^2\). Note that assigning this type of prior to the cells of \({\textbf{M}}\) is equivalent to placing a Wishart prior on \({\textbf{M}}'{\textbf{M}}\), i.e., \( {\textbf{M}}'{\textbf{M}} \sim Wishart(J, \sigma ^2 {\textbf{I}}_J)\).

The multivariate approach allows the estimation of the correlation between the spatial patterns of the diseases, an interesting and useful feature, as a high positive correlation would support the hypothesis of common risk factors, and hence connections between diseases. The covariance matrix between the spatial patterns is obtained as \({\textbf{M}}'{\textbf{M}}\). For further details see Botella-Rocamora et al. (2015).
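To make the step from \({\textbf{M}}\) to between-disease correlations concrete, the sketch below computes \({\textbf{M}}'{\textbf{M}}\) and rescales it to a correlation matrix. The numerical entries of `M` and the helper names are hypothetical; in practice these quantities would be derived from posterior draws rather than from a single fixed matrix.

```python
import math


def between_disease_cov(M):
    """Return M'M for a K x J matrix M given as a list of K rows."""
    K, J = len(M), len(M[0])
    return [
        [sum(M[k][a] * M[k][b] for k in range(K)) for b in range(J)]
        for a in range(J)
    ]


def cov_to_corr(S):
    """Rescale a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(S[j][j]) for j in range(len(S))]
    return [
        [S[a][b] / (sd[a] * sd[b]) for b in range(len(S))]
        for a in range(len(S))
    ]


# Hypothetical 3 x 3 matrix M (K = J = 3 diseases).
M = [[1.0, 0.5, 0.2],
     [0.0, 1.0, 0.3],
     [0.0, 0.0, 1.0]]

cov = between_disease_cov(M)   # between-disease covariance matrix
corr = cov_to_corr(cov)        # between-disease correlations
```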

For notation purposes and to incorporate the dependencies between different diseases in the model, we introduce the \(\textrm{vec}(\cdot )\) operator. Let \({\textbf{A}}=({\textbf{A}}_1,\ldots ,{\textbf{A}}_J)\) be an \(I \times J\) matrix with \(I\times 1\) columns \({\textbf{A}}_j\), for \(j=1,\ldots ,J\). The \(\textrm{vec}(\cdot )\) operator transforms \({\textbf{A}}\) into an \(IJ\times 1\) vector by stacking the columns one under the other, that is, \(\textrm{vec}( {\textbf{A}} )=({\textbf{A}}'_1,\ldots ,{\textbf{A}}'_J)'\). Using this notation, the multivariate Model (1) can be expressed in matrix form as

$$\begin{aligned} \log {\textbf{R}} = \left( {\textbf{I}}_J \otimes {\textbf{1}}_I \right) {\varvec{\alpha }}+ \textrm{vec} \left( {\varvec{\Theta }}\right) , \end{aligned}$$
(3)

where \({\varvec{\alpha }}= (\alpha _1,\ldots ,\alpha _J)'\), \({\textbf{R}}=({\textbf{R}}'_1,\ldots ,{\textbf{R}}'_J)'\), \({\textbf{R}}_j = (R_{1j},\ldots ,R_{Ij})'\), \(j=1,\ldots ,J\), and \({\textbf{I}}_J\) and \({\textbf{1}}_I\) are the \(J \times J\) identity matrix and a column vector of ones of length I, respectively. Once the between-disease dependencies are incorporated into the model, the resulting prior distribution for \(\textrm{vec} \left( {\varvec{\Theta }}\right) \), with Gaussian kernel, has a precision matrix given by

$$\begin{aligned} {\varvec{\Omega }}_{\textrm{vec}({\varvec{\Theta }})}=\left( {\textbf{M}}^{-1} \otimes {\textbf{I}}_I \right) \, \textrm{Blockdiag}({\varvec{\Omega }}_{1},\ldots ,{\varvec{\Omega }}_{J}) \, \left( {\textbf{M}}^{-1} \otimes {\textbf{I}}_I \right) '. \end{aligned}$$
(4)

Note that this precision matrix accounts for both within- and between-disease dependencies: the \({\varvec{\Omega }}_{1},\ldots ,{\varvec{\Omega }}_{J}\) matrices control the within-disease spatial structure and the matrix \({\textbf{M}}\) deals with the between-disease variability. If \({\varvec{\Omega }}_{1} = \ldots = {\varvec{\Omega }}_{J}= {\varvec{\Omega }}_{w}\), the covariance structure is separable and can be expressed as \({\varvec{\Omega }}_{\textrm{vec}({\varvec{\Theta }})}^{-1}={\varvec{\Omega }}_{b}^{-1} \otimes {\varvec{\Omega }}_{w}^{-1}\), where \({\varvec{\Omega }}_{b}^{-1}={\textbf{M}}'{\textbf{M}}\) and \({\varvec{\Omega }}_{w}^{-1}\) are the between- and within-disease covariance matrices, respectively. In our case \({\varvec{\Omega }}_{w}^{-1}={\varvec{\Omega }}_{\textrm{iCAR}}^{-1}\) and the precision parameter \(\tau \) is set to 1 for identifiability reasons. This M-model based framework includes both separable and non-separable covariance structures, and can accommodate different spatial dependency structures with different within-disease covariance matrices.
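The separable case can be checked directly from Eq. (4). As a brief sketch, assume that \(K=J\) so that \({\textbf{M}}\) is square and invertible, and read \({\varvec{\Omega }}_{w}^{-1}\) as a generalized inverse when \({\varvec{\Omega }}_{w}\) is singular (as happens with the iCAR prior). Since \(\textrm{Blockdiag}({\varvec{\Omega }}_{w},\ldots ,{\varvec{\Omega }}_{w})={\textbf{I}}_J \otimes {\varvec{\Omega }}_{w}\), the mixed-product property of the Kronecker product gives

$$\begin{aligned} {\varvec{\Omega }}_{\textrm{vec}({\varvec{\Theta }})}= & {} \left( {\textbf{M}}^{-1} \otimes {\textbf{I}}_I \right) \left( {\textbf{I}}_J \otimes {\varvec{\Omega }}_{w} \right) \left( {\textbf{M}}^{-1} \otimes {\textbf{I}}_I \right) ' \\= & {} \left( {\textbf{M}}^{-1} ({\textbf{M}}^{-1})' \right) \otimes {\varvec{\Omega }}_{w} = \left( {\textbf{M}}'{\textbf{M}} \right) ^{-1} \otimes {\varvec{\Omega }}_{w}, \end{aligned}$$

so that inverting both factors yields \({\varvec{\Omega }}_{\textrm{vec}({\varvec{\Theta }})}^{-1}={\textbf{M}}'{\textbf{M}} \otimes {\varvec{\Omega }}_{w}^{-1}\), in agreement with the expression above.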

2.1 Model fitting, identifiability issues and prior distributions

Traditionally, MCMC techniques have been used for Bayesian model fitting and inference. However, they can be computationally very demanding. On the other hand, the INLA method (see Rue et al. 2009) has become very popular in recent years. It is designed for latent Gaussian fields and is based on integrated nested Laplace approximations and numerical integration. Many models used in practice are implemented in R-INLA (Lindgren and Rue 2015), and others can be implemented by means of generic functions with some extra programming work. The M-model based approach is not directly available in R-INLA, but it can be implemented using the ‘rgeneric’ construct (see for example Vicente et al. 2020b). In this paper, we use INLA for model fitting and inference.

Spatial models usually present identifiability issues which are generally overcome using sum-to-zero constraints on the spatial random effects (see Eberly and Carlin 2000; Goicoa et al. 2018, for details). In the multivariate setting, these constraints are considered for all the diseases in the model. Additionally, the M-models bring about new identifiability issues. As pointed out by Botella-Rocamora et al. (2015), any orthogonal transformation of the columns of \({\varvec{\Phi }}\) and of the rows of \({\textbf{M}}\) in Eq. (2) causes an alternative decomposition of \({\varvec{\Theta }}\), and therefore neither \({\varvec{\Phi }}\) nor \({\textbf{M}}\) are identifiable and inference on them should be ruled out. However, \({\varvec{\Theta }}\) and the covariance matrix \({\textbf{M}}'{\textbf{M}}\) are perfectly identifiable, so inference is confined to those quantities. It is worth noting that the decomposition of the between-diseases covariance matrix as \({\varvec{\Omega }}_{b}^{-1}={\textbf{M}}'{\textbf{M}}\) avoids dependence on the order in which the diseases are introduced into the model, but it leads to an overparameterization problem. In the M-model proposal, \(J \times J\) parameters are used to estimate the covariance matrix even though only \(J\times (J+1)/2\) parameters are required. In their paper, Botella-Rocamora et al. (2015) put independent Normal priors with mean 0 and large and fixed variance \(\sigma ^2\) on each entry of the matrix \({\textbf{M}}\) and they show that this is equivalent to assigning a Wishart prior to the covariance matrix, i.e., \({\textbf{M}}'{\textbf{M}} \sim Wishart(J, \sigma ^{2}{\textbf{I}}_J)\).

To avoid the overparameterization of the covariance matrix we propose to use the Bartlett decomposition of Wishart matrices (see, for example, Peña and Irie 2022). In more detail, if \({\varvec{\Omega }}_{b}^{-1}\) is the \(J \times J\) between-disease covariance matrix with \({\varvec{\Omega }}_{b}^{-1}\sim \textrm{Wishart}(\upsilon , {\textbf{V}})\), then the Bartlett decomposition of \({\varvec{\Omega }}_{b}^{-1}\) is the factorization

$$\begin{aligned} {\varvec{\Omega }}_{b}^{-1}={\textbf{L}}{\textbf{A}}{\textbf{A}}^{'}{\textbf{L}}^{'} \end{aligned}$$

where \({\textbf{L}}\) is the Cholesky factor of \({\textbf{V}}\), and

$$\begin{aligned} {\textbf{A}}= \begin{bmatrix} c_1 &{}\quad 0 &{}\quad 0 &{}\quad \cdots &{}\quad 0\\ n_{21} &{}\quad c_2 &{}\quad 0 &{}\quad \cdots &{}\quad 0\\ n_{31} &{}\quad n_{32} &{}\quad c_3 &{}\quad \cdots &{}\quad 0\\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ n_{J1} &{}\quad n_{J2} &{}\quad n_{J3} &{}\quad \cdots &{}\quad c_J\\ \end{bmatrix}, \end{aligned}$$
(5)

whose squared diagonal elements are independently distributed as \(\chi ^2\) random variables and whose off-diagonal elements are independently distributed as Normal random variables. More precisely, \(c^2_j \sim \chi ^2_{\upsilon -j+1}\) and \(n_{jl} \sim \textrm{N}(0,1)\) for \(j,l=1,\ldots ,J\) with \(j>l\). Using this decomposition, only \(J\times (J+1)/2\) hyperparameters (cells of \({\textbf{A}}\)) are needed to estimate the covariance matrix \({\varvec{\Omega }}_{b}^{-1}\). Note that if \({\textbf{V}}={\textbf{I}}_J\), then \({\textbf{L}}={\textbf{I}}_J\). Finally, to avoid dependence on the order of the diseases, we introduce \({\textbf{M}}\) into Eq. (4) through the eigen-decomposition of \({\varvec{\Omega }}_{b}^{-1}\). Chung et al. (2015) consider a family of Wishart densities for the prior of the covariance matrix and recommend the use of \(\upsilon =J+2\) degrees of freedom to make the prior slightly more informative. In this work we follow this recommendation. Details on how to implement this in R-INLA can be found in Appendix A.
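The Bartlett construction can be illustrated by simulating from the prior. The Python sketch below draws the lower-triangular factor \({\textbf{A}}\) with \(c_j^2 \sim \chi ^2_{\upsilon -j+1}\) and standard Normal off-diagonal entries, and forms \({\textbf{A}}{\textbf{A}}'\). It assumes \({\textbf{V}}={\textbf{I}}_J\) (so \({\textbf{L}}={\textbf{I}}_J\)) and \(\upsilon =J+2\); the function names are hypothetical, and this is not the R-INLA implementation described in Appendix A.

```python
import math
import random


def bartlett_factor(J, df, rng):
    """Draw the lower-triangular Bartlett factor A for a Wishart(df, I_J)."""
    A = [[0.0] * J for _ in range(J)]
    for j in range(J):
        # 0-based j corresponds to disease j + 1, so the degrees of freedom
        # are df - (j + 1) + 1 = df - j; chi^2_k is Gamma(shape=k/2, scale=2).
        A[j][j] = math.sqrt(rng.gammavariate((df - j) / 2.0, 2.0))
        for l in range(j):
            A[j][l] = rng.gauss(0.0, 1.0)
    return A


def wishart_draw(J, df, rng):
    """Return one draw A A' of the between-disease covariance matrix."""
    A = bartlett_factor(J, df, rng)
    return [
        [sum(A[a][k] * A[b][k] for k in range(J)) for b in range(J)]
        for a in range(J)
    ]


J = 3
df = J + 2                      # degrees of freedom suggested in the text
rng = random.Random(2023)
draws = [wishart_draw(J, df, rng) for _ in range(2000)]

# Monte Carlo mean of each diagonal element; for V = I_J it should be near df.
mean_diag = [sum(W[j][j] for W in draws) / len(draws) for j in range(J)]
```

Only the \(J(J+1)/2\) non-zero cells of \({\textbf{A}}\) are drawn, which is the reduction in the number of hyperparameters that the decomposition is meant to achieve.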

3 Scalable Bayesian models for high-dimensional multivariate disease mapping

The M-model approach can be computationally intensive when the number of areas (I) is very large. Besides, a single homogeneous model may be questionable when the number of areas grows. These limitations highlight the need for new methods. Here, we propose to use a divide-and-conquer strategy that partitions the spatial domain (\({\mathfrak {D}}\)) into D subregions, so that local multivariate spatial models can be simultaneously fitted in the different subregions. In each subregion, we consider the prior distribution with Gaussian kernel and precision matrix given in Eq. (4) to deal with within-disease spatial variation and between-disease correlations.

3.1 Disjoint models

A natural way to think of partitions is to consider subregions based on administrative subdivisions of the area of interest, for example provinces, states or counties. Given a partition of the spatial domain \({\mathfrak {D}}\), each geographic unit belongs to a single subregion, i.e. \({\mathfrak {D}}=\cup _{d=1}^{D}{\mathfrak {D}}_{d}\) where \({\mathfrak {D}}_{i} \cap {\mathfrak {D}}_{j} = \varnothing \) for \(i\ne j\). Then, the log-risks of the models in each subregion d (\(d=1,\ldots ,D\)) are expressed in matrix form as

$$\begin{aligned} \begin{aligned} \log {{\textbf {R}}}^{(d)}=&{} \left( {{\textbf {I}}}_J \otimes {{\textbf {1}}}_{I_{d}} \right) {\varvec{\alpha }}^{(d)} + \text {vec} \left( {\varvec{\Theta }}^{(d)} \right) , \\ \text {vec} \left( {\varvec{\Theta }}^{(d)} \right) \sim&{} \text {N}\left( {{\textbf {0}}}, {\varvec{\Omega }}_{\text {vec}\left( {\varvec{\Theta }}^{(d)} \right) } \right) , \\ {\varvec{\Omega }}_{\text {vec}\left( {\varvec{\Theta }}^{(d)} \right) }=&{} \left[ \left( {{\textbf {M}}}^{(d)} \right) ^{-1} \otimes {{\textbf {I}}}_{I_{d}} \right] \, \text {Blockdiag} \left( {\varvec{\Omega }}^{(d)}_{1},\ldots ,{\varvec{\Omega }}^{(d)}_{J} \right) \, \left[ \left( {{\textbf {M}}}^{(d)} \right) ^{-1} \otimes {{\textbf {I}}}_{I_{d}} \right] ' \end{aligned} \end{aligned}$$
(6)

where for each subregion d, \({\varvec{\alpha }}^{(d)}= (\alpha ^{(d)}_1,\ldots ,\alpha ^{(d)}_J)'\) and \(\alpha ^{(d)}_j\) is a disease-specific intercept, \({\textbf{R}}^{(d)}= \left( {\textbf{R}}^{(d)'}_{1}, \ldots , {\textbf{R}}^{(d)'}_{J} \right) '\), and each \({\textbf{R}}^{(d)}_{j}=(R_{1j}^{(d)},\ldots , R_{I_{d}j}^{(d)})'\) is the vector of relative risks corresponding to disease j within the subregion d. Finally, \({\textbf{I}}_{I_d}\) is the identity matrix of order \(I_d\) and \({\textbf{1}}_{I_{d}}\) is a column vector of ones of length \(I_{d}\) (the number of areas within partition d), \(I=\sum _{d=1}^{D}I_d\), and \({\varvec{\Theta }}^{(d)}=\lbrace \theta ^{(d)}_{ij}: i=1, \ldots , I_d; j=1, \ldots , J \rbrace \) is the matrix of spatial effects in partition d including both within- and between-disease dependence structure. In more detail, this model can be expressed as

$$\begin{aligned} \begin{aligned}{}&{} \begin{pmatrix} \log {{\textbf {R}}}^{(1)} \\ \vdots \\ \log {{\textbf {R}}}^{(d)} \\ \vdots \\ \log {{\textbf {R}}}^{(D)} \\ \end{pmatrix} = {{\textbf {I}}}_J \otimes \begin{pmatrix} {{\textbf {1}}}_{I_{1}} \\ {} &{}{} \ddots \\ {} &{}{} &{}{} {{\textbf {1}}}_{I_{d}} \\ {} &{}{} &{}{} &{}{}\ddots \\ {} &{}{} &{}{} &{}{} &{}{} {{\textbf {1}}}_{I_{D}} \\ \end{pmatrix} \begin{pmatrix} {\varvec{\alpha }}^{(1)} \\ \vdots \\ {\varvec{\alpha }}^{(d)} \\ \vdots \\ {\varvec{\alpha }}^{(D)} \end{pmatrix} \\ {}&+ \begin{pmatrix} \text {vec} \left( {\varvec{\Theta }}^{(1)} \right) \\ \vdots \\ \text {vec} \left( {\varvec{\Theta }}^{(d)} \right) \\ \vdots \\ \text {vec} \left( {\varvec{\Theta }}^{(D)} \right) \\ \end{pmatrix} \end{aligned} \end{aligned}$$

where the precision matrix of the multivariate Normal random effect vector \(\left( \textrm{vec} {\varvec{\Theta }}^{(1)'}, \ldots , \textrm{vec} {\varvec{\Theta }}^{(D)'} \right) '\) is a block-diagonal matrix of dimension \(IJ \times IJ\) whose blocks correspond to the precision matrices \({\varvec{\Omega }}_{\textrm{vec}\left( {\varvec{\Theta }}^{(d)} \right) }\), \(d=1,\ldots ,D\). The full domain log-risk is just the union of the posterior estimates of each subregion, i.e., \(\log {\textbf{R}} =\left( \log {\textbf{R}}^{(1)'},\cdots ,\log {\textbf{R}}^{(D)'} \right) '\).

3.2 Models with overlapping partitions

Disjoint partitions might suffer from border effects, as areas on the boundary of a given partition would not borrow information from neighbouring areas in a contiguous subdivision. Consequently, the risk estimates in those areas may not be correct. This inconvenience can be solved by considering an alternative modelling approach in which k-order neighbours are added to each subregion of the partition, so that border areas have neighbours from other subregions of the partition. In this case, the entire spatial region \({\mathfrak {D}}\) is divided into a set of overlapping subregions and some small areas belong to more than one subdivision, i.e., \({\mathfrak {D}}=\cup _{d=1}^{D}{\mathfrak {D}}_{d}\) and \({\mathfrak {D}}_{i} \cap {\mathfrak {D}}_{j} \ne \varnothing \) for neighbouring subregions. Similar to the disjoint Model (6), D submodels will be simultaneously fitted. However, as \(\sum _{d=1}^{D} I_{d}>I\), the final risk \({\textbf{R}}=({\textbf{R}}'_1,\ldots ,{\textbf{R}}'_J)'\) with \({\textbf{R}}_j = (R_{1j},\ldots ,R_{Ij})'\), \(j=1,\ldots ,J\), is no longer the union of the posterior estimates obtained for each submodel, as areas located on the borders of the spatial partition would have more than one estimated posterior distribution.
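The construction of overlapping subregions can be sketched as a breadth-first expansion on the neighbourhood graph. The Python example below is a minimal illustration under stated assumptions: the helper `add_k_order_neighbours` and the six-area toy map are hypothetical and do not reproduce the bigDM implementation.

```python
def add_k_order_neighbours(subregion, neighbours, k):
    """Expand a subregion with all areas within k neighbourhood steps of it.

    subregion:  iterable of area labels in the original (disjoint) subregion.
    neighbours: dict mapping each area label to the set of its neighbours.
    k:          neighbourhood order (k = 0 returns the disjoint subregion).
    """
    expanded = set(subregion)
    frontier = set(subregion)
    for _ in range(k):
        # Areas adjacent to the current frontier that are not yet included.
        frontier = {b for a in frontier for b in neighbours[a]} - expanded
        expanded |= frontier
    return expanded


# Hypothetical toy map: six areas in a row, split into two disjoint subregions.
toy_map = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
partition = [{0, 1, 2}, {3, 4, 5}]

overlapping = [add_k_order_neighbours(sub, toy_map, k=1) for sub in partition]
# Areas 2 and 3 now belong to both subregions and get two risk estimates each.
shared = overlapping[0] & overlapping[1]
```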

Two different strategies can be considered to obtain a unique posterior estimate of the relative risk for those areas that belong to more than one subregion. Orozco-Acosta et al. (2021) propose to calculate the mixture distribution of the estimated posterior probability density functions of the relative risks in the different subdivisions, with weights proportional to the conditional predictive ordinate (CPO) values (Pettit 1990). To compute the mixture, suppose that area i belongs to m(i) subregions of the spatial domain \({\mathfrak {D}}\) and let \(f^{(1)}_{ij}(x),\ldots ,f^{(m(i))}_{ij}(x)\) be the estimated posterior probability density functions of the relative risk of the j-th disease in the i-th area obtained in each of those subregions. Then the mixture distribution of \(R_{ij}\) can be written as

$$\begin{aligned} f_{ij}(x)=\sum _{k=1}^{m(i)} w_{k} f^{(k)}_{ij}(x), \quad \text{ with } \quad w_k = \dfrac{CPO_{ij}^k}{\sum _{l=1}^{m(i)} CPO_{ij}^l}, \end{aligned}$$

where \(CPO_{ij}^k\) is the conditional predictive ordinate of area i and disease j obtained in partition k, so that \(w_{k} \ge 0\) and \(\sum _{k=1}^{m(i)} w_{k}=1\) (see for example Lindsay 1995; Frühwirth-Schnatter 2006).
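The CPO-weighted mixture can be sketched in a few lines of Python. The two Normal densities used as stand-ins for the estimated posterior densities, the CPO values, and the helper names are all hypothetical; in practice the densities and CPO values would come from the fitted submodels.

```python
import math


def normal_pdf(mean, sd):
    """Return a Normal density function (a stand-in for a posterior density)."""
    def pdf(x):
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return pdf


def mixture_density(densities, cpos):
    """Mixture of densities with weights proportional to the CPO values."""
    total = sum(cpos)
    weights = [c / total for c in cpos]

    def f(x):
        return sum(w * d(x) for w, d in zip(weights, densities))

    return f, weights


# Hypothetical area belonging to m(i) = 2 overlapping subregions.
densities = [normal_pdf(1.10, 0.10), normal_pdf(1.30, 0.15)]
cpos = [0.2, 0.6]

f_mix, weights = mixture_density(densities, cpos)
```

Because the weights are non-negative and add up to one, the result is again a proper density, with more influence given to the submodel that predicts the observed count better according to its CPO.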

More recently, Orozco-Acosta et al. (2023) consider using the posterior marginal distribution of the relative risk estimated from its original partition. Based on the results obtained from a simulation study, they show that this strategy outperforms the use of mixture distributions in terms of risk estimation accuracy and true positive/negative rates. In this paper, this is also the default strategy used to obtain unique posterior distributions for each relative risk \(R_{ij}\).

3.3 Between-disease correlations and variance parameters

Besides increasing the effective sample size and improving risk smoothing, one of the main advantages of multivariate disease mapping models is that they take into account correlations between the spatial patterns of the different diseases, that is, they reveal connections between them. Fitting a single multivariate model to the region of interest provides correlations between the diseases in the whole study domain, thus revealing overall relationships. In addition, it also provides the diagonal elements of the between-disease covariance matrix, hereafter referred to as variance parameters. In the case of separable covariance structures (the Kronecker product of between- and within-disease covariance matrices) these parameters control the amount of smoothing within diseases. By dividing the spatial domain into subregions, we obtain the posterior distributions of these parameters in each of the subdivisions and we retrieve the between-disease correlations and variances for the entire region. Hence, partition models provide additional information by revealing local connections between diseases in the subdivisions, which are usually based on administrative divisions.

To obtain global estimates of the parameters of interest in the overall study domain from the partition models, we adapt the consensus Monte Carlo (CMC) algorithm originally proposed by Scott et al. (2016). The idea behind consensus Monte Carlo is to divide the data into shards (in our case, the shards correspond to different subdivisions of the spatial domain), give each shard to a worker machine which does a full Monte Carlo simulation from a posterior distribution given its own data, and then combine the posterior simulations from each worker (or submodel) to produce a set of global draws representing the consensus belief among all the workers. Here, we briefly describe how to adapt the ideas behind the CMC algorithm to our case.

Let \({{\varvec{\psi }}}=({{\varvec{\rho }}},{{\varvec{\sigma }}}^2)^{'}\) denote the vector with the parameters of interest, where \({{\varvec{\rho }}}=(\rho _{12},\ldots ,\rho _{J-1,J})^{'}\) contains the between-disease correlations and \({{\varvec{\sigma }}}^2=(\sigma ^2_1,\ldots ,\sigma ^2_J)^{'}\) are the diagonal elements of the between-disease covariance matrix, and let \({\psi }_{kd}\) denote the local estimate of the k-th parameter of \({{\varvec{\psi }}}\) in each subdomain \({\mathfrak {D}}_{d}\), \(d=1,\ldots ,D\). We first extract samples of size S from the posterior marginal estimates of \(\psi _{kd}\), denoted as \(\psi _{kd}^s\) for \(k=1,\ldots ,J \times (J+1)/2\), \(d=1,\ldots ,D\) and \(s=1,\ldots ,S\). Then, we combine the draws using weighted averages

$$\begin{aligned} {\tilde{\psi }}_k^s=\sum \limits _{d=1}^D w_d \psi _{kd}^s, \quad \text{ for } s=1,\ldots ,S \end{aligned}$$

where \(w_d\) are normalized weights inversely proportional to the posterior marginal variances of \(\psi _{kd}\). Finally, we approximate the posterior marginal density function of the parameter \(\psi _k\) from the combined draws \({\tilde{\psi }}_k^s\).
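The combination step can be sketched as follows for a single parameter \(\psi _k\). This is a minimal illustration and not the implementation in bigDM: the posterior marginal variances are assumed to be estimated by the sample variances of the local draws, and the simulated Normal draws stand in for samples from the fitted submodels.

```python
import random
import statistics


def consensus_draws(local_draws):
    """Combine local posterior draws of one parameter into consensus draws.

    local_draws[d][s] is the s-th draw of the parameter in subdomain d.
    Weights are normalized and inversely proportional to the variance of
    the draws in each subdomain (an estimate of the marginal variance).
    """
    precisions = [1.0 / statistics.variance(draws) for draws in local_draws]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    S = len(local_draws[0])
    combined = [
        sum(w * draws[s] for w, draws in zip(weights, local_draws))
        for s in range(S)
    ]
    return combined, weights


# Hypothetical draws of a between-disease correlation in D = 2 subdomains.
rng = random.Random(1)
S = 1000
local_draws = [
    [rng.gauss(0.50, 0.05) for _ in range(S)],   # precisely estimated
    [rng.gauss(0.60, 0.20) for _ in range(S)],   # poorly estimated
]

combined, weights = consensus_draws(local_draws)
```

Subdomains in which the parameter is estimated more precisely receive larger weights, so the consensus draws are pulled towards them.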

3.4 Model selection criteria

Two of the most widely used criteria to compare Bayesian models are the deviance information criterion (DIC) (Spiegelhalter et al. 2002) and the Watanabe-Akaike information criterion (WAIC) (Watanabe 2010). However, with partition models, it is not straightforward to get these quantities as we fit as many models as subdivisions. Hence, we need a procedure to estimate these quantities from the scalable models described in Sects. 3.1 and 3.2.

Extending the ideas in Orozco-Acosta et al. (2021) to the multivariate framework, we compute approximate DIC values by drawing samples from the posterior marginal distribution of the Poisson means. Denoting by \({\textbf{C}}^{s}\), \(s=1,\ldots ,S\), the posterior simulations of \(\mu _{ij}=E_{ij}\cdot R_{ij}\) (the mean of the Poisson distribution), approximate values of the mean deviance \(\overline{D({\textbf{C}})}\) and the deviance of the mean \(D(\overline{{\textbf{C}}})\) can be respectively calculated as

$$\begin{aligned} \overline{D({\textbf{C}})}= & {} \dfrac{1}{S}\sum _{s=1}^{S} - 2\log \left( p({\textbf{O}} \vert {\textbf{C}}^{s}) \right) ; \\ D(\overline{{\textbf{C}}})= & {} -2\log \left( p({\textbf{O}} \vert \overline{{\textbf{C}}}) \right) , \, \\ \textrm{with} \, \overline{{\textbf{C}}}= & {} \dfrac{1}{S}\sum _{s=1}^{S} {\textbf{C}}^{s}, \end{aligned}$$

where \(p({\textbf{O}}|\cdot )\) denotes the likelihood function of a Poisson distribution. Then, the DIC is obtained as

$$\begin{aligned} \textrm{DIC} = 2 \, \overline{D({\textbf{C}})} - D(\overline{{\textbf{C}}}). \end{aligned}$$
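As an illustration, the approximation can be sketched in Python under the standard deviance definition \(D=-2\log p\). The function names and the toy inputs are hypothetical, and the observed counts and Poisson means are assumed to be flattened over areas and diseases.

```python
import math


def poisson_logpmf(o, mu):
    """Log-probability of count o under a Poisson distribution with mean mu."""
    return o * math.log(mu) - mu - math.lgamma(o + 1)


def log_lik(obs, means):
    """Joint Poisson log-likelihood of all (area, disease) counts."""
    return sum(poisson_logpmf(o, m) for o, m in zip(obs, means))


def approximate_dic(obs, mean_draws):
    """obs: flattened counts O_ij; mean_draws[s]: flattened means C^s."""
    n_draws = len(mean_draws)
    # Mean deviance: average of -2 log-likelihood over the draws
    mean_deviance = sum(-2.0 * log_lik(obs, c) for c in mean_draws) / n_draws
    # Deviance evaluated at the average of the simulated means
    c_bar = [sum(c[n] for c in mean_draws) / n_draws for n in range(len(obs))]
    deviance_of_mean = -2.0 * log_lik(obs, c_bar)
    p_d = mean_deviance - deviance_of_mean  # effective number of parameters
    dic = 2.0 * mean_deviance - deviance_of_mean
    return mean_deviance, p_d, dic


# Hypothetical toy data: three counts and S = 2 posterior draws
obs = [3, 0, 5]
mean_draws = [[2.5, 0.5, 4.0], [3.5, 0.7, 6.0]]
mean_deviance, p_d, dic = approximate_dic(obs, mean_draws)
```

With this definition the DIC can equivalently be written as the mean deviance plus \(p_D\), which is how the two quantities are usually reported side by side.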

Similarly, approximate WAIC values are computed as (see Gelman et al. 2014)

$$\begin{aligned} \textrm{WAIC}= & {} -2 \sum _{i=1}^{I}\sum _{j=1}^{J} \log \left( \dfrac{1}{S}\sum _{s=1}^{S} p(O_{ij} \vert {\textbf{C}}^{s}) \right) \\{} & {} \quad + 2\sum _{i=1}^{I}\sum _{j=1}^{J} \textrm{var} \left[ \log \left( p(O_{ij} \vert {\textbf{C}}^{s}) \right) \right] . \end{aligned}$$
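A minimal Python sketch of this computation follows. The names are hypothetical, the variance is taken over the S draws using the sample variance, and a log-sum-exp step is added for numerical stability (an implementation choice that is not part of the formula).

```python
import math
from statistics import variance


def poisson_logpmf(o, mu):
    """Log-probability of count o under a Poisson distribution with mean mu."""
    return o * math.log(mu) - mu - math.lgamma(o + 1)


def approximate_waic(obs, mean_draws):
    """obs: flattened counts O_ij; mean_draws[s]: flattened means C^s (S >= 2)."""
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # sum of pointwise variances of the log-likelihood
    for n, o in enumerate(obs):
        log_p = [poisson_logpmf(o, c[n]) for c in mean_draws]
        # log of the average likelihood over draws, computed stably
        m = max(log_p)
        lppd += m + math.log(sum(math.exp(lp - m) for lp in log_p) / len(log_p))
        p_waic += variance(log_p)
    return -2.0 * lppd + 2.0 * p_waic


# Hypothetical toy data: three counts and S = 2 posterior draws
obs = [3, 0, 5]
waic = approximate_waic(obs, [[2.5, 0.5, 4.0], [3.5, 0.7, 6.0]])
```

When all draws coincide, the variance term vanishes and the value reduces to minus twice the log-likelihood at that common mean.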

4 Simulation study

We conduct a simulation study to compare the performance of the different M-models described in Sect. 2. Specifically, our interest lies in comparing the fit of a single model to the whole domain (hereafter referred to as the global model) with that of the partition models, in terms of parameter estimates and relative risk estimation accuracy. The \(I=7907\) municipalities of continental Spain and \(J=3\) diseases are used as the simulation template because this imitates the case study presented in Sect. 5.

Two different scenarios have been considered to recover the possible underlying generating process of spatially correlated disease risks. In the first scenario, samples are generated from a fixed covariance structure based on the spatial neighbourhood graph of the whole area under study, that is, the global model is used as the generating model. In contrast, in the second scenario, independent samples for each partition (Spanish Autonomous Regions, see Fig. 5 in Appendix B) are generated using the covariance structures of the partition, that is, the Disjoint model is used as the data generating mechanism. Further details are given below.

4.1 Data generation

One advantage of multivariate models is their ability to reveal relationships between different diseases in terms of correlations between their underlying spatial patterns. To evaluate how well these correlation parameters are estimated, we start by sampling from a multivariate Normal distribution with precision matrix \({\varvec{\Omega }}_{\textrm{vec}({\varvec{\Theta }})}={\varvec{\Omega }}_b \otimes {\varvec{\Omega }}_{iCAR}\). Here, the elements of the between-disease covariance matrix are fixed, that is,

$$\begin{aligned} \Omega ^{-1}_b= & {} \begin{pmatrix} \sigma _1 &{} &{} \\ {} &{} \sigma _2 &{} \\ {} &{} &{} \sigma _3 \end{pmatrix} \begin{pmatrix} 1 &{}\quad \rho _{12} &{}\quad \rho _{13} \\ \rho _{21} &{}\quad 1 &{}\quad \rho _{23} \\ \rho _{31} &{}\quad \rho _{32} &{}\quad 1 \\ \end{pmatrix} \begin{pmatrix} \sigma _1 &{} &{} \\ {} &{} \sigma _2 &{} \\ {} &{} &{} \sigma _3 \end{pmatrix}\\= & {} \begin{pmatrix} \sigma _1^2 &{}\quad \sigma _{12} &{}\quad \sigma _{13} \\ \sigma _{21} &{}\quad \sigma _2^2 &{}\quad \sigma _{23} \\ \sigma _{31} &{}\quad \sigma _{32} &{}\quad \sigma _3^2 \\ \end{pmatrix} \end{aligned}$$

where \(\sigma ^2_j\) are variance parameters, and \(\rho _{kl}=\rho _{lk}\) are between-disease correlation coefficients. Note that \(\sigma _{kl}=\rho _{kl}\sigma _k\sigma _l\) denotes the covariance between diseases k and l. Then, for each sample of \(\textrm{vec}({\varvec{\Theta }}^{r})\), \(r=1,\ldots ,100\), we compute the relative risks \(R_{ij}^r\) following Eq. (3). Finally, we generate \(O_{ij}\) counts for area i and disease j using a Poisson distribution with mean \(\mu _{ij}^r = E_{ij} \cdot R_{ij}^r\), where \(E_{ij}\) is the expected number of cases in our case study data (lung, colorectal and stomach cancer mortality in Spanish males).

In Scenario 1, the neighbourhood graph of all the 7907 municipalities is used to define the spatial precision matrix \({\varvec{\Omega }}_{iCAR}\) (the global model is used to generate the data). In addition, we fix the parameters of the between-disease covariance matrix as \(\sigma _1^2=0.25\), \(\sigma _2^2=0.16\), \(\sigma _3^2=0.09\), \(\rho _{12}=0.7\), \(\rho _{13}=0.5\) and \(\rho _{23}=0.1\). In Scenario 2, \(D=15\) independent samples are generated from multivariate Normal distributions with precision matrices equal to \({\varvec{\Omega }}_{\textrm{vec}({\varvec{\Theta }}^{d})}= {\varvec{\Omega }}_b^{(d)} \otimes {\varvec{\Omega }}_{iCAR}^{(d)}\), where \({\varvec{\Omega }}_{iCAR}^{(d)}\) is the spatial precision matrix of the areas within subdomain \(d=1,\ldots ,D\), and different between-disease covariance matrices \({\varvec{\Omega }}_b^{(d)}\) are considered in each subdivision (the disjoint model, k = 0, is used to generate the data). Here, the variance parameters are fixed to \(\sigma ^2_1=0.5\), \(\sigma ^2_2=0.4\) and \(\sigma ^2_3=0.3\), while the correlation coefficients are set to values similar to those estimated with the partition models in the case study presented in the next section (see Table 6 in Appendix B). We increase the variance parameters in Scenario 2 to get stronger smoothing effects in each subdivision. Note that the variance parameters are the same in all the subdivisions, but they cannot be considered as global variance parameters because the covariance structures, based on the neighbourhood matrices, are different. Hence, in this scenario we do not have true parameter values for the global model.
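As a quick check of the Scenario 1 settings, the between-disease covariance matrix implied by the fixed variances and correlations can be assembled as in the following Python sketch. The function name and the dictionary layout for the correlations are illustrative assumptions, with diseases indexed from zero.

```python
import math


def between_disease_cov(variances, rho):
    """Build the covariance matrix diag(sd) * R * diag(sd).

    variances: list of sigma_j^2; rho: dict mapping index pairs (k, l),
    k < l, to the correlation rho_kl (zero-based disease indices).
    """
    n = len(variances)
    sd = [math.sqrt(v) for v in variances]
    cov = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            r = 1.0 if k == l else rho[(min(k, l), max(k, l))]
            cov[k][l] = sd[k] * sd[l] * r
    return cov


# Scenario 1 values from the text
cov = between_disease_cov(
    [0.25, 0.16, 0.09],
    {(0, 1): 0.7, (0, 2): 0.5, (1, 2): 0.1},
)
```

For these values the off-diagonal covariances are approximately 0.14, 0.075 and 0.012 for the pairs (1,2), (1,3) and (2,3), respectively.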

4.2 Results: Scenario 1

Table 1 compares the true values of model parameters in Scenario 1 (variance parameters and correlation coefficients) against average values of posterior mean estimates over the 100 simulated data sets. In addition, estimated standard errors, simulated standard errors (derived from the sample variance of the parameter estimates) and empirical coverages of the 95% credible intervals are also displayed. Note that for the partition models, these posterior marginal distributions are obtained by using the CMC algorithm described in Sect. 3.3. In terms of model parameters, multivariate models give very accurate estimates of the real values, both in terms of posterior mean and posterior standard deviation estimates (note that nearly identical values are obtained from estimated and simulated standard errors). As expected, slightly better results are obtained when fitting the global model, as this is the true generating model in Scenario 1. Regarding partition models, the higher the neighbourhood order, the more similar the CMC estimates of the correlation coefficients are to those of the global model.

Table 2 displays average values of model selection criteria (posterior mean deviance \(\overline{D({\varvec{\theta }})}\), effective number of parameters \(p_D\), DIC and WAIC) for the global and the partition models, as well as the accuracy of the relative risk estimates quantified by the mean absolute relative bias (MARB), the mean relative root mean square error (MRRMSE) and empirical coverages of the 95% credible intervals for the risks. Note that the MARB and MRRMSE are defined for each small area i and disease j as

$$\begin{aligned} \text{ MARB}_{ij}= & {} \left| \frac{1}{100}\sum _{r=1}^{100} \dfrac{{\hat{R}}_{ij}^{(r)}-R_{ij}^{(r)}}{R_{ij}^{(r)}} \right| \quad \text{ and } \\ \text{ MRRMSE}_{ij}= & {} \sqrt{\frac{1}{100} \sum _{r=1}^{100} \left( \dfrac{{\hat{R}}_{ij}^{(r)}-R_{ij}^{(r)}}{R_{ij}^{(r)}} \right) ^2} \end{aligned}$$

where \(R_{ij}^{(r)}\) and \({\hat{R}}_{ij}^{(r)}\) denote the true value and the posterior median estimate of the relative risks for the r-th data set (\(r=1,\ldots ,100\)). Model selection criteria point towards partition models, though differences are mild. Regarding MARB, MRRMSE and 95% coverage values, differences between the global and the partition models are practically negligible.
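For a single area and disease, the two accuracy measures can be computed as in the following Python sketch. The function name and the toy risk values are hypothetical, and the number of replicates is taken from the length of the input rather than fixed at 100.

```python
import math


def marb_mrrmse(true_risks, est_risks):
    """Accuracy measures for one (area, disease) pair.

    true_risks[r] and est_risks[r] are the true relative risk and its
    posterior median estimate in the r-th simulated data set.
    """
    rel_err = [(est - true) / true for true, est in zip(true_risks, est_risks)]
    n = len(rel_err)
    marb = abs(sum(rel_err) / n)                       # absolute relative bias
    mrrmse = math.sqrt(sum(e * e for e in rel_err) / n)  # relative RMSE
    return marb, mrrmse


# Hypothetical example with two replicates: errors of +10% and -10%
marb, mrrmse = marb_mrrmse([1.0, 2.0], [1.1, 1.8])
```

In this toy case the relative errors cancel, so the bias measure is essentially zero while the relative root mean square error is about 0.1, which illustrates why both measures are reported.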

Table 1 Average values of posterior mean, posterior standard deviation (SD), simulated standard errors (sim) and empirical coverage of the 95% credible intervals (EC) for model parameters based on 100 simulated data sets for Scenario 1
Table 2 Average values of model selection criteria (mean deviance, effective number of parameters, DIC and WAIC) and risk estimation accuracy (MARB, MRRMSE and empirical coverage -EC- of the 95% credible intervals) based on 100 simulated data sets for Scenario 1

4.3 Results: Scenario 2

In contrast to the previous scenario, it should be noted that in Scenario 2 we cannot compare the global estimates of the model parameters against the true values of the variance parameters and between-disease correlations, since different values have been used to generate the risk surfaces in each subdomain and we do not have true global values. However, we can compare the models' performance in terms of model selection criteria and risk estimation accuracy (see Table 3). As expected, the Disjoint model (\(k=0\)) shows the best performance according to these measures, as this is the true generating model in Scenario 2. In terms of MARB and MRRMSE, partition models also outperform the Global model.

We are also interested in analyzing whether the partition models are able to recover the local between-disease covariance structures of the true generating process. In Table 6 (Appendix B) we compare these values against the average values of posterior mean estimates of local parameters in each subdivision over the 100 simulated data sets for the Disjoint model. For almost every subdivision, very accurate estimates are obtained for both variance parameters and correlation coefficients. For the latter, the median value of the empirical coverage of the 95% credible intervals is 0.95 (with \(Q_1=0.93\) and \(Q_3=0.97\)). As expected, these estimates get worse as the neighbourhood order of the models increases, since the estimated local correlations correspond to enlarged subdivisions rather than the subdivisions themselves. Even so, the median values of the empirical coverage of the 95% credible intervals for the between-disease correlations are 0.89 (with \(Q_1=0.84\) and \(Q_3=0.92\)) and 0.86 (with \(Q_1=0.79\) and \(Q_3=0.90\)) for 1st-order and 2nd-order neighbourhood models, respectively. All the results are shown in Tables 7 and 8 in Appendix B.

Table 3 Average values of model selection criteria (mean deviance, effective number of parameters, DIC and WAIC) and risk estimation accuracy (MARB, MRRMSE and empirical coverage -EC- of the 95% credible intervals) based on 100 simulated data sets for Scenario 2
Table 4 Model selection criteria and computational time, in minutes, for multivariate models with iCAR spatial prior using the simplified Laplace approximation strategy in INLA

5 Case study

In this section we use the new proposal to jointly analyse mortality data for lung, colorectal, and stomach cancer in men in the 7907 municipalities of mainland Spain (excluding the Balearic and Canary Islands and the autonomous cities of Ceuta and Melilla) during the period 2006-2015. During the ten years of the study, a total of 162,602 deaths from lung cancer (corresponding to codes C33-C34 of the International Classification of Diseases-10), 82,967 from colorectal cancer (C17-C21) and 33,170 from stomach cancer (C16) were registered for the male population of mainland Spain, which correspond to global rates of 76.48, 39.02 and 15.60 deaths per 100,000 male inhabitants, respectively.

5.1 Model fitting and model selection

We fit the disjoint model (\(k=0\)) and the k-order neighbourhood model for \(k=1, 2, 3\) in R-INLA using \(D=15\) subdivisions of the spatial domain. These subdivisions are also of interest as they correspond to Autonomous Regions of Spain (NUTS2 level from the European nomenclature of territorial units for statistics, shown in Fig. 5 in Appendix B). In these partitions, the highest value of \(I_{d}\) (number of municipalities) is 2245 and corresponds to the Autonomous Region of Castilla y León, a rather vast territory from central to northwestern Spain with about 5% of the total Spanish population. Although this subregion is large, we maintain this subdivision as it represents the administrative division of Spain into Autonomous Regions. We also fit the multivariate spatial M-models over the entire spatial domain (global model), and compare the results with those obtained with the new proposal.

Previously, univariate models were also fitted to each disease using a BYM2 spatial prior. The covariance matrix of this prior copes with both spatially structured and unstructured variability. Results (not shown here to conserve space) show that most of the variability is spatially structured. Since the computational cost of this prior makes its use difficult in a multivariate setting, and most of the variability is spatially structured, we fit the joint multivariate proposal given in Eq. (6) by considering an iCAR prior for the spatial random effects.

For the partition models, we distribute the submodels over two machines with four Intel Xeon Silver 4108 processors and 192GB of RAM on each machine (Ubuntu 20.04.4 LTS operating system), using the simplified Laplace approximation strategy in R-INLA (Lindgren and Rue 2015) (stable version INLA_22.05.07, R version R-4.1.2) and simultaneously running 3 models in parallel on each machine using the bigDM package (Adin et al. 2023).

Table 4 displays the posterior mean deviance \(\overline{D({\varvec{\theta }})}\), the effective number of parameters \(p_D\), the DIC, and the WAIC for the global and the scalable models together with the computing time (in minutes). The total time for the scalable models is obtained by adding the running time and the merging time. The running time refers to the elapsed time for all the submodels fitted with R-INLA, and the merging time refers to the combination (when necessary) of the posterior distributions of the risks, the approximation of the DIC/WAIC values, and the computation of global estimates of the between-disease correlation coefficients using the proposed CMC algorithm. As expected, the computational cost rises as the neighbourhood order (k) increases, though the scalable proposal is faster than the global model for all values of k. The greatest reduction in time in comparison with the global model is obtained for \(k=0\), with the global model being about 5.5 times slower. When the neighbourhood order increases, the difference in computing time is less pronounced. The global model is about 4.3, 3.8, and 3.6 times slower than the scalable models with \(k=\)1, 2, and 3, respectively. Regarding model selection criteria, scalable Bayesian models outperform the global model. The greatest reduction in DIC and WAIC is obtained for the 1st-order neighbourhood model. However, increasing the neighbourhood order may improve the between-disease correlation estimates.

5.2 Joint analysis of male mortality from three types of cancer in Spain

In this subsection, the spatial patterns of lung, colorectal, and stomach cancer mortality risks in men are examined in the municipalities of continental Spain using the scalable multivariate proposal presented in Sect. 3.

We begin with a comparison of the estimated risks obtained with the global model, the disjoint model (\(k=0\)) and the k-order neighbourhood models (\(k=1\), 2 and 3). Figure 1 displays dispersion plots of the posterior median estimates of the relative risks obtained with the partitioned models versus those obtained with the global model. The left, central and right columns correspond to lung, colorectal and stomach cancer, respectively. The rows correspond to the different neighbourhood orders of the partition models. The largest differences are observed between the global and the disjoint models. This is expected because areas on the border of a subdivision do not borrow strength from neighbouring areas located in a contiguous subdivision. As the neighbourhood order k increases, the risk estimates become more similar to those of the global model. Figure 2 displays the spatial patterns of lung cancer mortality risks (top) and the posterior probabilities of risk exceedance (bottom), \(P(R_{ij}>1 \vert {\textbf{O}})\), obtained with the global, the disjoint (k = 0) and the partition models (k = 1, 2, 3). To save space, maps for colorectal and stomach cancer are provided in Figs. 6 and 7 (Appendix B). Though differences in risk estimates are observed in the dispersion plots, it is harder to appreciate them on the maps.
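To make the exceedance probability concrete, the following Python sketch approximates \(P(R_{ij}>1 \vert {\textbf{O}})\) by the proportion of posterior draws of the relative risk above the threshold. This Monte Carlo approach, the function name and the toy draws are illustrative assumptions; the quantity can also be obtained directly from the estimated posterior marginal distribution.

```python
def exceedance_probability(risk_draws, threshold=1.0):
    """Proportion of posterior draws of a relative risk above the threshold."""
    above = sum(1 for r in risk_draws if r > threshold)
    return above / len(risk_draws)


# Hypothetical posterior draws of the relative risk in one area
prob = exceedance_probability([0.8, 1.2, 1.5, 0.9])
```

Values close to one point to areas with excess risk, while values close to zero point to areas with lower risk than the reference.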

Fig. 1 Dispersion plots of the posterior median estimates of relative risks for lung (left column), colorectal (central column) and stomach (right column) cancer mortality data obtained with the partitioned model (\(k=0,1,2,3\) from top to bottom) versus the global model

Fig. 2 Maps of posterior median estimates of mortality relative risk for lung cancer (top) and posterior exceedance probabilities \(P(R_{ij}>1 \vert {\textbf{O}})\) (bottom) in continental Spain

Multivariate models borrow information from nearby areas and from the different diseases. Additionally, they present other advantages over univariate counterparts, such as the possibility of estimating correlations between the spatial patterns of the diseases. Moderate to high correlations may suggest the existence of underlying risk factors affecting the diseases under study, which in turn implies connections between them. This information may be crucial to better understand diseases such as cancer, in which known risk factors only explain a small percentage of the cases. Spatial patterns may be associated with factors such as access to treatment or lifestyle that might have an impact on mortality.

Fig. 3 Posterior distributions of the estimated between-disease correlations with the global, and \(k = 0,1,2\)-order neighbourhood models, using an iCAR prior for spatial random effects

Table 5 Descriptive statistics of the estimated between-disease correlations with the global, and \(k=0,1,2\)-order neighbourhood models, using an iCAR prior for spatial random effects

Posterior distributions of the between-disease correlations obtained with the disjoint (k = 0) and the partition models (k = 1, 2) are displayed in Fig. 3, together with the correlations for the whole of Spain obtained with the CMC algorithm and with the global model. Here, \(\rho _{1,2}\), \(\rho _{1,3}\), and \(\rho _{2,3}\) denote the correlation parameters between lung and colorectal, lung and stomach, and colorectal and stomach cancer, respectively. Summary statistics (mean, median, mode, standard deviation, 2.5 and 97.5 percentiles) of the between-disease posterior correlations are also shown in Table 5. In general, the posterior distributions estimated with the CMC algorithm for the partition models are very similar to those obtained with the global model. As with the posterior estimates of the relative risks, values closer to those of the global model are observed as the neighbourhood order k increases.

Fig. 4 Maps of posterior medians of between-disease correlations and standard deviation (in brackets) for the different subdivisions obtained with the 1st-order neighbourhood partition model. Correlations between lung and colorectal cancer are displayed on the left (\(\rho _{1,2}\)), the central map displays the correlations between lung and stomach cancer (\(\rho _{1,3}\)), and the map on the right displays the correlation between colorectal and stomach cancer (\(\rho _{2,3}\))

Finally, Fig. 4 displays maps with the posterior medians and standard deviations of the between-disease correlations \(\rho _{1,2}\) (left), \(\rho _{1,3}\) (center), and \(\rho _{2,3}\) (right), for the different subdivisions (Autonomous Regions) obtained with the 1st-order neighbourhood partition model. Partition models can provide the correlations over the whole study domain, but also the correlations for the different subdivisions. This is an advantage over the global models as we add information at different administrative divisions. Moreover, the variability in the posterior medians of the correlations across the subdivisions may indicate a lack of stationarity that the global model cannot cope with, which highlights the advantages of the partition models. When the number of small areas is large, the use of a global model with one single precision (smoothing) parameter may be questionable, while local models add more flexibility to deal with the spatial heterogeneity across the map.

6 Discussion

Spatial areal models have a long tradition in epidemiology to study the geographical pattern of a disease. While initially focused on modelling a single disease, spatial models have evolved into a multivariate framework with two notable objectives: to improve estimates by borrowing strength from other diseases and neighbouring areas, and to estimate latent correlations between the spatial patterns of the diseases under study to address the connections between them and to hypothesize common risk factors. Research on spatial multivariate models has received considerable attention in recent years, although their use is not yet widespread in epidemiology mainly because (i) the implementation of multivariate models in available software requires advanced computing skills and (ii) computational issues are accentuated when the number of small areas is large as computing time may become prohibitive. Vicente et al. (2020b, 2021) provide an implementation of multivariate CAR and P-splines in R-INLA that can be used by a wide audience without advanced computer skills.

In this paper, we present a new approach to analyse multivariate areal count data when the number of small areas is very large. In particular, we combine the methodology proposed by Orozco-Acosta et al. (2021) for high-dimensional disease mapping with a modification of the multivariate approach given by Botella-Rocamora et al. (2015) to avoid overparameterization, obtaining a scalable Bayesian modelling approach to multivariate disease mapping. Our proposal begins with the partitioning of the spatial domain into subregions with substantially fewer small areas. The multivariate models can then be fitted simultaneously (using either parallel or distributed computation strategies) in each of these regions, reducing computational time and avoiding memory and storage problems. Dividing the whole spatial domain into disjoint regions may induce border effects, as the areas on the boundary of a given subdivision do not borrow information from neighbouring areas located in a different subregion. To overcome this issue, we consider k-order neighbourhood models that add neighbouring areas to the regions located on the partition boundary. Finally, variance parameters and between-disease correlations for the whole area are obtained by means of an adaptation of a consensus Monte Carlo algorithm. The correlation coefficients indicate potential geographic factors related (or not) to the different diseases. If the covariance structure is separable, the variance parameters measure the amount of smoothing for each disease. In addition to the CMC algorithm, we have also considered the Weierstrass rejection sampler (WRS) proposed by Wang and Dunson (2013) to recover the parameters of interest for the whole study region (results not shown to save space). In this algorithm, the posterior of the target distribution in the whole area is approximated by combining posterior samples of the subdivisions using rejection sampling. Though it was originally proposed to combine posterior draws from independent MCMC subset chains, it can be adapted to other Bayesian estimation techniques such as INLA through the R package weierstrass (available at https://github.com/wwrechard/weierstrass). In general, very similar posterior marginal estimates are obtained with both algorithms.
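The k-order neighbourhood construction summarized above can be illustrated with a short Python sketch that enlarges a subdomain by repeatedly adding the areas adjacent to its current boundary. This is only a schematic version written for this illustration, not the code of the bigDM package; the function name, the dictionary-based graph representation and the toy graph are assumptions.

```python
def expand_subdomain(adjacency, partition, label, k):
    """Return the areas of subdomain `label` plus its neighbours up to order k.

    adjacency: dict mapping each area to the set of its neighbouring areas.
    partition: dict mapping each area to its subdomain label.
    k = 0 gives the disjoint subdomain itself.
    """
    region = {area for area, lab in partition.items() if lab == label}
    frontier = set(region)
    for _ in range(k):
        # Areas adjacent to the current frontier that are not yet included
        new_areas = set()
        for area in frontier:
            new_areas |= adjacency[area]
        new_areas -= region
        if not new_areas:
            break
        region |= new_areas
        frontier = new_areas
    return region


# Hypothetical toy map: six areas on a line, split into two subdomains
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
first_order = expand_subdomain(adjacency, partition, "A", 1)
```

Areas added in this way belong to more than one enlarged subdomain, which is why the posterior distributions of the risks must be combined when merging the submodels.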

One of the key issues with partition models is the choice of the neighbourhood order. Here we use model selection criteria such as DIC and WAIC. Our conclusion is that, in general, the larger the neighbourhood order, the more similar the partition model is to the global model. However, if the neighbourhood order is increased too much, the benefits of our proposal in terms of computational time vanish. Overall, first- or second-order neighbourhood models are appropriate. From the simulation study, we conclude that even when the underlying generating process is the Global model, the partition models are very competitive in terms of risk estimation accuracy. Moreover, the global between-disease correlation coefficients are well recovered with the partition models. If the geographical distribution and correlation structure of the underlying process vary across the whole map (which seems very realistic in practice), better results are obtained with our modelling proposals than with the usual global model.

Very recently, a new hybrid approximate procedure that uses the Laplace method with a low-rank variational Bayes correction has been proposed as part of the R-INLA project (Van Niekerk and Rue 2021; Van Niekerk et al. 2023). The latest versions of the R-INLA package allow the models to be run using this new approximation strategy (named “compact” mode), resulting in a substantial reduction in computational time. This new approximation method appears to be very promising. However, further research is necessary to explore its accuracy in estimating hyperparameters, such as between-disease correlations.

Moreover, when there is a large number of areas, the suitability of a global homogeneous model (with a single precision/smoothing parameter) for the entire study region may be doubtful. Instead, implementing various local homogeneous models can provide increased flexibility in capturing the spatial heterogeneity present across the map.

In conclusion, it can be argued that partition models offer several advantages over a global model. Firstly, they accelerate computations through the classical integrated nested Laplace approximations and alleviate storage and memory problems. Secondly, they offer a dual benefit. Even if the global model is appropriate, we can provide both a global spatial pattern for the entire region and local patterns for the subdivisions, which is particularly beneficial for our case. Lastly, it is worth noting that as the number of diseases grows, so does the number of hyperparameters in the covariance matrix, resulting in a greater computational burden. This issue warrants further research.

In our case study, we use an administrative division of the municipalities of continental Spain corresponding to \(D=15\) Autonomous Regions. This partition is a natural choice as Autonomous Regions in Spain are responsible for developing and implementing health policies, and lifestyle may change from region to region. By utilizing subdivisions, we can obtain estimates that reveal associations between diseases which may be linked to specific policies, different lifestyles, or other geographical factors that have a local impact. This could potentially explain the observed differences in between-disease correlations across subdivisions. However, this partition may have some disadvantages. For instance, the Region of Castilla y León comprises 2245 municipalities, which is still a large number. To address this issue, we have also employed a finer partition based on 47 provinces rather than Autonomous Regions. Although the overall results are similar, the partition based on Autonomous Regions yields better recovery of the global between-disease correlations.

Fig. 5 Map of the administrative division of Spain into Autonomous Regions

The M-models for multivariate disease mapping described in this paper are implemented in the R package bigDM, which also includes several scalable spatial and spatio-temporal Poisson mixed models for areal count data in a fully Bayesian setting using INLA. The package also contains a vignette to replicate the data analysis described in Sect. 5 using simulated data to preserve the confidentiality of the original data.