1 Introduction

In Italy, the Census has been carried out every 10 years (Footnote 1) since 1871 (Unification of Italy), creating a unique information patrimony of microdata, such as the family unit and the house, surveyed universally and directly. Until 2011 the Census was carried out with the classic method (Footnote 2); in 2018 Istat started the permanent census, which relies on administrative archives and continuous sample surveys and therefore loses its universal nature (Cervellera et al. 2021). For this reason, we use the Territorial Bases of the Istat 2011 census as an open-source (Footnote 3) geographic and spatial data warehouse of the entire census survey, since Istat has not yet updated the bases to the data of the permanent censuses carried out so far. Eurostat bases the territorial administrative classification on NUTS (Nomenclature of Territorial Units for Statistics) (Footnote 4), according to the European legislation governing official EU statistics; three territorial levels are regulated: NUTS 1 (State), NUTS 2 (Region) and NUTS 3 (Sub-region, the Province in Italy). Therefore, the National Institutes of the European Statistical System, including Istat, are obliged to produce statistics down to at least the NUTS 3 level. Life expectancy is a very important indicator (e.g., of the quality of health and life) that Istat, like many other European institutes, provides on a NUTS 3 provincial basis (Cervellera and Cusatelli 2022). Knowing these data at the sub-provincial level, by municipality and sub-municipal area (e.g., ACE, census areas), would be very important for citizens and public institutions (Gennaro et al. 2022); in the USA, the Census Bureau routinely performs small-area disaggregation analyses down to very granular sub-municipal levels (5000–7000 inhabitants) (Footnote 5). In the next section, we present the data and the methodology; Sect. 3 describes the linear modelling of spatial dependence, Sect. 4 explains the results, and the article concludes with some proposals for future developments.

2 Materials and methods

The spatial and geographical structure of the Territorial Bases runs from the smallest, basic unit (Down), the Census Sections (about 403,000), through the Census Areas (CEA), the Sub-Municipal Areas (SMA: municipalities, districts, etc.) and the Localities, up to the administrative boundaries of Municipalities, Provinces, and Regions (Top).

The 152 variables surveyed carry rich information on 5 levels (Footnote 6), in addition to the area descriptors: Buildings, Families, Foreigners, Households and Population. Many of these variables are potentially excellent predictors of biometric functions, particularly life expectancy at birth e°.

| Type | Count |
|---|---|
| Descriptive areas | 12 |
| Building structure (B) | 31 |
| Family structure (FS) | 9 |
| Foreigners (FO) | 15 |
| House Structure (H) | 9 |
| Population structure (P) | 76 |
| Total | 152 |

At the territorial level of spatial granularity, however, the structure is as follows:

 

| Coverage area | Census 1991 | Census 2001 | Census 2011 |
|---|---|---|---|
| Regions | 20 | 20 | 20 |
| Provinces | 95 | 103 | 110 |
| Municipalities | 8100 | 8101 | 8092 |
| Sub Municipal Areas (1) | 70,742 | 60,482 | 60,447 |
| Census sections | 323,616 | 382,534 | 402,677 |
| Census population | 56,778,031 | 56,995,744 | 59,433,744 |

(1) In the municipalities of Bari, Bologna, Brescia, Cagliari, Catania, Ferrara, Firenze, Foggia, Genova, Livorno, Messina, Milano, Modena, Monza, Napoli, Padova, Palermo, Parma, Perugia, Pescara, Prato, Ravenna, Reggio Calabria, Reggio Emilia, Rimini, Roma, Salerno, Sassari, Siracusa, Taranto, Torino, Trieste, Venezia, and Verona.

The data on biometric functions and life expectancy at birth are taken from the Istat elaborations of the mortality tables for 2011, which are available down to the provincial level at most; this level is used to pre-test the model and its structure homogeneously. Indicators are derived with a Top–Down approach, from the highest territorial units (Top) to the bases (Down), using the information of the latter to structure a spatial autoregressive dependence among territorially adjacent base nodes. The derivation of the disaggregated indicator uses the method proposed by Chow and Lin (1971), a technique designed for temporal disaggregation, also known as temporal distribution. Temporal disaggregation is the process of deriving high-frequency data (Down, e.g., quarters and months) from low-frequency data (Top, e.g., years). Since the results of the Chow–Lin method depend on information from a different variable, the method can be considered an indirect approach. The autoregressive dependence over time can be transposed into a spatial and areal dependence, on cross-sectional and panel data, expressed in autoregressive form on the nodes linking the disaggregation areas through spatial weights matrices (W matrix) and linking matrices between levels and indicators (C matrix); we use the CRAN R package for spatial dependence (Footnote 7) with the Spatial Autoregressive (SAR) model. Polasek and Sellner (2008) adapted this method and extended it to cross-sectional data based on a spatial autoregressive model, to panel data and to spatial flow models. An implicit assumption of the Chow–Lin approach is the summability of disaggregated variables to aggregate variables, a property that holds for so-called extensive variables. This paper shows how to extend the spatial Chow–Lin approach for cross-sectional data to non-extensive or intensive variables, such as growth rates. The approach, widely used in applied econometrics, consists of introducing a spatial autocorrelation term into a classical multivariate regression model (Giacalone 2021).

To derive the indicators at a territorially disaggregated level from the corresponding aggregate level indicators, three basic assumptions are defined that the model must respect:

  1. Structural similarity: the aggregate and disaggregate models are structurally similar, which implies that the relationships between the variables considered are the same at the aggregate and disaggregate level, with the consequence that the regression parameters are the same in the two models;

  2. Error similarity: spatially correlated errors have the same structure at both aggregate and disaggregate levels. This is equivalent to saying that the spatial correlations are not significantly different at the two levels;

  3. Reliable indicators: the variables used as regressors have good predictive power at both aggregate and disaggregate levels, i.e., the goodness-of-fit measures of the regression are significantly different from zero.

The model proposed by Chow and Lin was built as a temporal disaggregation model (Footnote 8) of time series components; here it is modified and adapted to cross-sectional and spot analyses, as already done in the econometric field by Bollino and Polinori (2007) and by Mazziotta and Vidoli (2009a, b).

The model is characterized, on the one hand, by a functional relationship between synthetic indicators at the provincial level and a series of explicative variables observable at the disaggregated level (and, obviously, also at the aggregated level), a relationship which has been subjected to a first verification in previous work, again with reference to the infrastructural endowment, and, on the other hand, by a methodology of inference of the unknown parameters at higher (Top), Provincial and Regional, levels. The model assumes that at the disaggregated level the simple econometric relation of a linear model of the following type is valid:

$$y_{Down} = X_{Down} \beta_{Down} + \epsilon_{Down}$$

where \(y_{Down}\) is the vector of indicators at the disaggregated level and \(X_{Down}\) is the (n × k) matrix of observable predictor variables at the disaggregated level. The number of rows n of \(X_{Down}\) is the number of Down areas: p if they are provinces with respect to the Top Region, m if they are municipalities with respect to the Top Province and l if they are Census Areas with respect to the Top Municipality, while the number of explanatory variables k remains the same at every level.

For structural similarity, the indicators shall be:

$$I_a = \frac{{\sum_{i = 1}^n I_d }}{n} \to \left\{ {I_{Reg} = \frac{{\sum_{i = 1}^p I_{prov} }}{p};I_{Prov} = \frac{{\sum_{i = 1}^m I_{Mun} }}{m};I_{Mun} = \frac{{\sum_{i = 1}^l I_{CEA} }}{l}} \right\}$$

For each spatial disaggregation of the indicators, it is also assumed that C is a matrix of dimension \(N_{Top} \times n_{Down}\), where n is the number of disaggregated units (e.g., the Italian provinces) and N the number of higher-level units (e.g., the regions), capable of transforming the disaggregated observations into those of the higher level, whatever the aggregation operator used. In particular, if the sum operator is adopted, regional estimates are obtained by summing the corresponding provincial levels (\(y_a = \sum y_d\)) and the generic element of C is constructed as:

$$C_{j,i} = \left\{ {\begin{array}{ll} 1 & {\text{if province }} i \in {\text{region }} j \\ 0 & {\text{otherwise}} \\ \end{array} } \right.$$

If instead the mean operator is adopted, each non-zero element is the reciprocal of the number of Down units in the corresponding Top area:

$$C_{Reg,Prov} = \left[ {\begin{array}{ccc} \frac{1}{p_{Reg}} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \frac{1}{p_{Reg}} \\ \end{array} } \right], \quad C_{Prov,Mun} = \left[ {\begin{array}{ccc} \frac{1}{m_{Prov}} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \frac{1}{m_{Prov}} \\ \end{array} } \right], \quad C_{Mun,CEA} = \left[ {\begin{array}{ccc} \frac{1}{l_{Mun}} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \frac{1}{l_{Mun}} \\ \end{array} } \right]$$

and the regional estimates are obtained by averaging the provincial estimates (\(y_a = E(y_d)\)).

Under the following aggregate constraints ya = Cyd, Xa = CXd and εa = Cεd.

$$Prov \Rightarrow Reg:\left\{ {y_{Reg} = C_{Reg,Prov} y_{Prov} ,X_{Reg} = C_{Reg,Prov} X_{Prov} \wedge \varepsilon_{Reg} = C_{Reg,Prov} \varepsilon_{Prov} } \right\}$$
$$Mun \Rightarrow Prov:\left\{ {y_{Prov} = C_{Prov,Mun} y_{Mun} ,X_{Prov} = C_{Prov,Mun} X_{Mun} \wedge \varepsilon_{Prov} = C_{Prov,Mun} \varepsilon_{Mun} } \right\}$$
$$SMA \Rightarrow Mun:\left\{ {y_{Mun} = C_{Mun,SMA} y_{SMA} ,X_{Mun} = C_{Mun,SMA} X_{SMA} \wedge \varepsilon_{Mun} = C_{Mun,SMA} \varepsilon_{SMA} } \right\}$$
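As an illustration of the aggregation constraint \(y_a = C y_d\), the following Python sketch builds C for a hypothetical grouping of five provinces into two regions, under either the sum or the mean operator. The function names, the membership vector and the values are assumptions made for this example only; they are not taken from the original R script. C is stored with one row per Top area so that the product \(C y_d\) is defined.

```python
def aggregation_matrix(membership, n_top, operator="mean"):
    """Return C as a list of n_top rows, one column per Down unit.

    membership[i] is the index of the Top area containing Down unit i.
    operator="sum" puts 1 in each non-zero cell; "mean" puts
    1 / (number of Down units in that Top area).
    """
    counts = [membership.count(j) for j in range(n_top)]
    C = [[0.0] * len(membership) for _ in range(n_top)]
    for i, j in enumerate(membership):
        C[j][i] = 1.0 if operator == "sum" else 1.0 / counts[j]
    return C


def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical data: province -> region membership and made-up values
membership = [0, 0, 0, 1, 1]
y_down = [79.0, 80.0, 81.0, 78.0, 80.0]

C_mean = aggregation_matrix(membership, n_top=2, operator="mean")
C_sum = aggregation_matrix(membership, n_top=2, operator="sum")

y_top_mean = mat_vec(C_mean, y_down)  # regional values as provincial means
y_top_sum = mat_vec(C_sum, y_down)    # regional values as provincial sums
```

For an intensive indicator such as life expectancy, the mean operator is the natural choice, since summing provincial values would have no meaning at the regional level.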

In the past literature, Polasek and Sellner (2008) presented an advance, or rather an interesting generalization of the model, consisting of the introduction of a spatial autocorrelation term in a classical multivariate regression model. From an application point of view, this means that the level of the dependent variable Y in each area depends not only on the independent variables considered but also on the level of the same variable Y in the surrounding areas.

In fact, if one assumes the existence of spatial correlation effects not only in the levels of competitiveness between provinces but also and especially within very similar provinces, one can hypothesize (Anselin 1988) that, given a matrix of spatial weights WN and a spatial lag parameter ρ ∊ [0,1], a "mixed autoregressive and spatial regressive relationship" is verifiable at the disaggregated level. This model is called SER, Spatial Estimated Regression:

$$y_d = \rho_d W_N y_d + X_d \beta_d + \varepsilon_d \;{\text{with}}\; \varepsilon_d \sim N\left[ {0,\sigma_d^2 I_N } \right]$$

Expanding \((I_N - \rho_d W_N )^{ - 1}\) in series gives

$$E\left( {y_d | X_d } \right) = (I_N + \rho_d W_N + \rho_d^2 W_N^2 + \cdots ) X_d \beta_d$$

With \(R_N = (I_N - \rho_d W_N )\), the reduced form is

$$y_d = R_N^{ - 1} X_d \beta_d + R_N^{ - 1} \varepsilon_d \;{\text{with}}\; R_N^{ - 1} \varepsilon_d \sim N\left[ {0,\sigma_d^2 (R_N^T R_N )^{ - 1} } \right]$$
$${\Sigma }_d = \sigma_d^2 (R_N^T R_N )^{ - 1}$$

Estimating the corresponding model at the aggregate level, where \(W_a\) denotes the spatial weights matrix of the aggregate units and \(X_a = CX_d\),

$$y_a = \rho_a W_a y_a + X_a \beta_a + \varepsilon_a \;{\text{with}}\; \varepsilon_a \sim N\left[ {0,\sigma_a^2 I} \right]$$

we get \(\hat{\rho }_a\) and \(\hat{\sigma }_a^2\), which, consistently with the assumptions of structural similarity, are carried over to the disaggregated level:

$$\rho_d = \hat{\rho }_a , \beta_d = \hat{\beta }_a \;{\text{and}}\; \sigma_d^2 = \hat{\sigma }_a^2 .$$
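The series expansion of \((I_N - \rho_d W_N)^{-1}\) used in this derivation can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the 3 × 3 row-standardised weights matrix, the value of ρ and the helper names are hypothetical, and the truncation at 200 terms is an arbitrary choice that is ample for \(|\rho| < 1\).

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(W, rho, terms=200):
    """Approximate (I - rho W)^-1 by I + rho W + rho^2 W^2 + ..."""
    n = len(W)
    total = identity(n)
    power = identity(n)  # holds rho^k W^k at step k
    for _ in range(terms):
        power = [[rho * x for x in row] for row in mat_mul(power, W)]
        total = [[t + p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    return total


# Hypothetical row-standardised weights for three areas on a line: 1 - 2 - 3
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
rho = 0.4

R_inv = neumann_inverse(W, rho)

# (I - rho W) times the approximation should be close to the identity
R = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(3)]
     for i in range(3)]
check = mat_mul(R, R_inv)
row_sums = [sum(row) for row in R_inv]
```

With row-standardised weights each row of the inverse sums to \(1/(1-\rho)\), which is the usual spatial multiplier: a unit change that is common to all areas is amplified by that factor through the neighbourhood feedback.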

Regarding the estimation of \(\beta_a\) according to the classical Chow–Lin approach, we obtain

$$\hat{\beta }_a = \left( {X_a^T \left( {C{\hat{\Sigma }}_d C^T } \right)^{ - 1} X_a } \right)^{ - 1} X_a^T (C{\hat{\Sigma }}_d C^T )^{ - 1} y_a$$

and we can finally estimate the disaggregated dependent variable y at the administrative Down level, first for the Municipalities and then for the SMA, with the 2011 Census and territorial bases:

$$\hat{y}_d = \hat{R}_N^{ - 1} X_d \hat{\beta }_a + {\hat{\Sigma }}_d C^T \left( {C{\hat{\Sigma }}_d C^T } \right)^{ - 1} (y_a - C\hat{R}_N^{ - 1} X_d \hat{\beta }_a ).$$
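A short check, assuming that the predictor takes the standard Chow–Lin form with gain matrix \({\hat{\Sigma }}_d C^T (C{\hat{\Sigma }}_d C^T )^{-1}\), shows that the disaggregated estimates reproduce the observed aggregate values once they are aggregated again with C:

$$C\hat{y}_d = C\hat{R}_N^{ - 1} X_d \hat{\beta }_a + C{\hat{\Sigma }}_d C^T \left( {C{\hat{\Sigma }}_d C^T } \right)^{ - 1} (y_a - C\hat{R}_N^{ - 1} X_d \hat{\beta }_a ) = C\hat{R}_N^{ - 1} X_d \hat{\beta }_a + y_a - C\hat{R}_N^{ - 1} X_d \hat{\beta }_a = y_a$$

In other words, the second term distributes the aggregate residual over the Down units according to the spatial covariance structure, so that the Top values are preserved exactly.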

Istat calculates mortality tables, biometric functions, and life expectancy at birth (e°) and at all ages or age classes each year, by provincial aggregates, available online (Footnote 9) from 1974 to 2021. Life expectancy is a particularly important indicator of an area's quality of life. Istat publishes it only at the provincial aggregate level (i.e., ISTAT 2006); this lack of granularity limits its power, since it would be a most useful tool for welfare and public health policies if made available at municipal and sub-municipal administrative levels. Life expectancy varies strongly both across territories and by sex. Women have a higher e° than men, on average by almost 4 years in Italy, owing to biological and, secondarily, social differences.

Since the 2011 census variables are generally available as totals (Males and Females) and as Males-only figures (the Females figure being derived by difference), we chose to evaluate the spatial autoregressive model on e° for Males only, selecting the variables referring to that sex (Figs. 1, 2, 3, 4, 5, 6).

Fig. 1 Regional and provincial nodes

Fig. 2 Municipal nodes in Province of Taranto and Apulia Region

Fig. 3 Boxplot of regional residuals (1 = GLM model, 2 = SER Model)

Fig. 4 Boxplot of provincial residuals (1 = GLM model, 2 = SER Model)

Fig. 5 GLM Model residual distribution

Fig. 6 SER Model residual distribution

The analysis data were the Istat territorial bases (Footnote 10), which contain all the geographical reference shapefiles based on the administrative units of work, from the Regions (Top) down to the Municipalities and, for the largest municipalities, the SMAs. The granular basis of all Istat data was the census section, which through linked operations on mixed queries (Footnote 11) produced the reference shapefiles for analysis in R. The e° values of the reference administrative levels, down to the province, were also linked.

To choose the best explanatory variables as predictors of the dependent variable (e°), a series of simple, concatenated linear regressions of life expectancy at birth in 2011 was run on the provincial data, starting with all 152 explanatory variables and gradually eliminating the less representative ones, whose coefficient was zero or very close to zero, since a model with a very high number of variables does not allow R to compute the standard errors, t tests and p-values. In the first step, the variables referring to Males were selected from the 152 and then reduced by dropping those whose coefficient was not significantly different from zero at the 95% confidence level.

$$\hat{e}_{Mun}^o = \hat{\beta }_1 P_1 + \hat{\beta }_2 P_2 + \cdots + \hat{\beta }_{31} E_{31} + \varepsilon_{Prov}$$

The remaining 67, 2 of code H, 2 of code P, 2 of code FS and 2 of code FO, in the next step were reduced to 20, more significant: 16 of code P and 4 of code FO, up to 99.9% confidence level.

Using an autoregressive spatial model (SAR) is useful both for analyzing the spatial relationships between observations of different areal levels, and for scaling knowledge to lower levels, with extrapolation of sub-area indicators. The SAR model considers observations at a specific location as influenced by observations in the spatial vicinity. This allows you to capture the spatial dynamics within the data.

The relationship between the spatial model and the temporal model, in terms of frequency conversion, makes the SAR model easier to implement and apply: spatial and temporal data are synchronized so that each spatial observation is associated with a specific moment in time, which allows time series to be defined from the spatial distribution, within the limits of the qualitative validity of the data and provided that variability between geographical units is not high and is homogeneous.

With a semi-parametric P-Spline model for spatial data, such as space–time ANOVA, one can include a smooth space–time trend, a spatial lag of the dependent and independent variables, a time lag of the dependent variable and of its spatial lag, and an autoregressive noise of the time series. Specifically, we consider a spatio-temporal ANOVA model, disaggregating the trend into spatial and temporal main effects, as well as second and third-order interactions between them.

The fit of the SAR model to the data is assessed with goodness-of-fit measures and statistical tests, to determine whether the model can satisfactorily explain the spatial and temporal variations in the data.

The SAR model coefficients are then interpreted to understand how spatial relationships influence temporal dynamics. For example, a positive SAR coefficient suggests that observations in the spatial vicinity have a positive effect on the value of the variable over time.

3 A spatial dependence linear modelling

Spatial autocorrelation measures the degree to which a phenomenon of interest is related to itself in space (Ayuga-Téllez et al. 2011). In other words, similar values appear close to each other, or clusters, in space (positive spatial autocorrelation) or close values are dissimilar (negative spatial autocorrelation). Zero spatial autocorrelation indicates that the spatial pattern is random (Drago and Hoxhalli 2020). We can express the existence of spatial autocorrelation with the following moment condition:

$${\text{Cov}}\left( {{\text{y}}_{\text{i}} ,{\text{y}}_{\text{j}} } \right) \, \ne \, 0{\text{ for i}} \ne {\text{j}}$$

where \(y_i\) and \(y_j\) are observations of a spatially located random variable at positions i and j. Estimating all the covariances directly from the N observations themselves is not feasible without heavy iterative computational methods. Alternatively, spatial econometric methods can be applied: the theory is extensively elaborated by Anselin and Bera (1998) and Arbia (2014), and the practical aspects are updated in Anselin (2003). We introduce some restrictions by defining for each data point a relevant "neighbourhood set", which in spatial econometrics is operationalized through the matrix of spatial weights. The matrix, usually denoted by W, is of size N × N, non-negative and symmetric, and marks in the row of each observation the locations that belong to its neighbourhood set as non-zero elements (Anselin and Bera 1998; Arbia 2014), with characterization:

$$W_{i,j} = \left\{ {\begin{array}{ll} 1 & {{\text{if}}\;j \in N\left( i \right)} \\ 0 & {\text{otherwise}} \\ \end{array} } \right.$$

N(i) is the set of (spatial) neighbours of unit i, and the diagonal values of W are equal to 0. There are various criteria for specifying spatial proximity, but the main ones (Footnote 12) are essentially two: the "Rook" criterion, where two units are neighbours if they share a side, and the "Queen" criterion, where two units are neighbours if they share a side or a vertex.

The "queen" model, compared to the "rook" model, allows more links to be established between adjacent areal nodes, especially in territorial situations where the geometries vary widely in area, dimension and shape. This is very noticeable at the lower levels, such as the provinces and municipalities, which are our main analysis objectives. The "queen" model guarantees better quality of determination and validity of the model and results.
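The difference between the two contiguity criteria can be seen on a regular grid. The following Python sketch is a toy illustration, not the procedure applied to the Istat shapefiles: the grid, the cell numbering and the function name are assumptions made for the example.

```python
def grid_neighbours(n_rows, n_cols, queen=True):
    """Return {cell: sorted list of neighbouring cells} for a regular grid.

    Cells are numbered row by row starting at 0. Rook contiguity uses the
    four side-sharing cells; queen contiguity adds the four cells that
    share only a corner (vertex).
    """
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            neighbours[cell] = sorted(
                (r + dr) * n_cols + (c + dc)
                for dr, dc in steps
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            )
    return neighbours


rook_nb = grid_neighbours(3, 3, queen=False)
queen_nb = grid_neighbours(3, 3, queen=True)

# Number of directed links (each adjacency counted once per direction)
rook_links = sum(len(v) for v in rook_nb.values())
queen_links = sum(len(v) for v in queen_nb.values())
```

On this 3 × 3 grid the central cell has four rook neighbours and eight queen neighbours, which shows why the queen criterion produces denser link structures.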

In addition to the contiguity criterion, there is the distance criterion, where neighbours lie within a range defined by \(j \in N\left( i \right)\) if \(d_{i,j} < d_{max}\), with \(d_{max}\) determined a priori. In our study we use the queen criterion with the R packages spdep, spatialreg, rgdal, maptools, leaflet and RColorBrewer. To get the weights matrix we use two functions, poly2nb and nb2listw: the first builds a list of neighbours (using the queen criterion if the option queen = TRUE is specified), and the second computes the spatial weights W. In this article, the only node proximity method used was the "Queen" one, at the Regional, Provincial, Municipal and SMA levels.
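The second step, turning a neighbour list into spatial weights, can be sketched in Python as follows. This is a minimal illustration that assumes row standardisation (each unit's weights sum to 1), a common choice in spatial econometrics; the text does not state which weighting style was used in the study, and the neighbour list and function name below are hypothetical.

```python
def row_standardised_weights(neighbours):
    """Turn a neighbour list into a dense weights matrix whose rows sum to 1.

    Units with no neighbours (e.g. islands) keep an all-zero row.
    Rows and columns follow the sorted order of the unit identifiers.
    """
    ids = sorted(neighbours)
    index = {unit: k for k, unit in enumerate(ids)}
    W = [[0.0] * len(ids) for _ in ids]
    for unit, nbrs in neighbours.items():
        for other in nbrs:
            W[index[unit]][index[other]] = 1.0 / len(nbrs)
    return W


# Hypothetical neighbour list: A - B - C form a chain, D is an island
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
W = row_standardised_weights(neighbours)
```

Note that row standardisation makes W asymmetric whenever neighbouring units have different numbers of links, and that isolated units need explicit handling because their row contains only zeros.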

Processing was done with R code in RStudio version 1.4.1717. The principal package is spdep (spatial dependence: weighting schemes, statistics and models), a collection of functions to create spatial weights matrix objects from polygon contiguities, from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree, together with a collection of tests for spatial autocorrelation, including the global Moran's test. The package spData is used to import and use territorial bases, census files and ESRI files, and sp for generic spatial data analysis. The script is given in the appendix.
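To make the global Moran statistic concrete, the Python sketch below computes Moran's I from its textbook definition, \(I = \frac{n}{S_0}\frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}\), where \(S_0\) is the sum of all weights. It covers only the statistic itself, not the significance test, and the four-area weights matrix and the two value vectors are made-up examples.

```python
def morans_i(values, W):
    """Global Moran's I for a list of values and a dense weights matrix W."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)  # sum of all weights
    cross = sum(W[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical binary contiguity weights for four areas on a line
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

smooth = [1.0, 2.0, 3.0, 4.0]       # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbours are as different as possible

i_smooth = morans_i(smooth, W)
i_alternating = morans_i(alternating, W)
```

The smooth gradient gives a positive value and the alternating pattern a negative one, matching the interpretation of positive and negative spatial autocorrelation.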

4 Results

The application of SER at the regional level shows only a slight improvement, with lower variability between the first and third quartiles. This is because there are only 20 regions and few links between nodes: 2 regions (Sardinia and Sicily) are isolated, and only 62 links are non-zero. Much more significant is the analysis of nodes and weights at the provincial level, which generates a relevant number of links and a good matrix of provincial weights \(W_p\), with as many as 490 valid, non-zero links. The difference in the nodes and weights generated makes the provincial level much more efficient, with a very strong attenuation of the variability and covariability of the predictors in the SER spatial autoregressive model. This supports the quality of the estimates produced, which at the provincial level can be evaluated through the estimation error with respect to the actual value of e° observed at that level.

The elaboration on all Regions (20) produced 62 non-zero links between neighbouring nodes, 15.5% non-zero weights, an average of 3.1 links per region and 2 regions with no links: the islands of Sicily and Sardinia. The elaboration on all Provinces (110) produced 490 non-zero links, 4.05% non-zero weights, an average of 4.45 links and no provinces without links. The elaboration on all Municipalities (8092) produced 47,638 non-zero links, 0.07% non-zero weights, an average of 5.88 links and 14 municipalities with no links (small island municipalities). The graphical representation of all the municipal nodes is very dense, so we prefer to report an example for a single region (Apulia, with 258 municipal nodes, 1364 links and 1 municipality with no links, the Tremiti Islands). Applying the Chow–Lin type Top–Down territorial disaggregation method to life expectancy at birth e°, which Istat derives from the life tables down to the administrative level of the provinces, makes it possible to assess robustness and efficiency by comparing the variable elaborated by Istat at the provincial level with the estimate of the indicator and with the residuals of both the linear regression (lm function in R) and the spatial autoregressive regression (lagsarlm function in R).
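The summary figures reported for each weights matrix can be reproduced with simple arithmetic, assuming that the percentage of non-zero weights is the number of non-zero links divided by \(n^2\) and that the average number of links is the number of links divided by n. The Python sketch below applies these two assumed definitions to the unit and link counts given in the text; the function name is hypothetical.

```python
def weight_summary(n_units, n_links):
    """Return (percentage of non-zero weights, average links per unit)."""
    pct_nonzero = 100.0 * n_links / (n_units ** 2)
    avg_links = n_links / n_units
    return pct_nonzero, avg_links


# (number of units, number of non-zero links) as reported in the text
levels = {
    "Regions": (20, 62),
    "Provinces": (110, 490),
    "Municipalities": (8092, 47638),
}

summary = {name: weight_summary(n, links)
           for name, (n, links) in levels.items()}

for name, (pct, avg) in summary.items():
    print(f"{name}: {pct:.2f}% non-zero weights, {avg:.2f} links per unit")
```

The sparsity of W grows quickly with the number of units: the average number of links stays between 3 and 6, while the share of non-zero weights falls from about 15% for regions to well under 1% for municipalities.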

Among the best predictors in the census and income variables, 8 were selected that showed, compared to the others, a good interaction in the regression on the estimated variable, with no constant term. Figure 7 shows the spatial structure of life expectancy at birth on a municipal basis: it follows a very linear and continuous trend, with a very detailed level of information; the low variability and the Moran test indicate strong robustness of the disaggregated indicator.

Fig. 7 Estimated e° with linear SAR model, in municipalities

The SAR model used was developed on 2011 census data and variables, with integration of tax data on disposable income, and defined with a two-step analysis.

All census variables were preliminarily tested to determine the main components based on the p-value and the model determination index. The census variables that presented the best relationships were P2, P18, P32, P44, P54, P64 and P66.

The proxy for economic conditions, the disposable income (Remx) available at various levels, has shown great reliability. The model has no intercept:

$$\hat{e}_{Mun}^o = \hat{\beta }_1 disp_{income} + \hat{\beta }_2 P_2 + \hat{\beta }_3 P_{18} + \hat{\beta }_4 P_{32} + \hat{\beta }_5 P_{44} + \hat{\beta }_6 P_{54} + \hat{\beta }_7 P_{64} + \hat{\beta }_8 P_{66} + \varepsilon_{Prov}$$

Coefficients:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| Remp | 5.917e-03 | 1.591e-04 | 37.196 | < 2e-16 | *** |
| P18 | 3.730e-03 | 9.600e-04 | 3.885 | 0.000188 | *** |
| P2 | -1.179e-03 | 3.984e-04 | -2.959 | 0.003889 | ** |
| P32 | -1.286e-03 | 2.408e-03 | -0.534 | 0.594462 | |
| P44 | 6.720e-03 | 2.255e-03 | 2.981 | 0.003644 | ** |
| P54 | 5.885e-04 | 2.988e-04 | 1.969 | 0.051792 | |
| P64 | 1.157e-03 | 3.684e-04 | 3.141 | 0.002242 | ** |
| P66 | 7.105e-04 | 6.773e-04 | 1.049 | 0.296770 | |

5 Managerial implications

The main managerial implication of this paper is that a Top–Down disaggregation method based on linear spatial autoregressive models can be used to estimate life expectancy at the municipal level from the 2011 census data in Italy. The method uses census variables and income data as predictors and rests on the assumptions of structural similarity, error similarity, and reliable indicators. It allows a more detailed understanding of life expectancy at the local level and can inform policy decisions related to public health and well-being in specific areas. Additionally, the use of census data and the ability to estimate life expectancy at sub-municipal levels may provide valuable information for businesses and organizations making decisions about investments or operations in specific regions. Moreover, life expectancy at sub-municipal levels can have a variety of impacts on businesses and organizations considering environmental energy efficiency (Li et al. 2022) and excessive consumption (Yang et al. 2022). For example, areas with higher life expectancies may have more older residents, who are more likely to be at home during the day and therefore more likely to use energy, while areas with lower life expectancies may have a higher proportion of working-age individuals, who are away from home during the day and therefore less likely to use energy. Additionally, areas with lower life expectancies may have fewer economic resources, making it more difficult for residents to invest in energy-efficient technologies or make other changes to reduce energy consumption. These factors could influence decisions made by businesses and organizations regarding energy efficiency and consumption.

6 Limitations and further research

There are several limitations to research applying a methodology of "Top–Down spatial disaggregation, using census data from 2011 in Italy."

Data availability: The study relies on census data from 2011, which may not be representative of the current population or may not include all relevant information. Additionally, the ISTAT elaborations of annual mortality tables may not include all the data necessary for the analysis. Further research could use more recent data and explore other data sources to improve the robustness of the results.

Assumptions: The study relies on several assumptions, such as structural similarity and error similarity, that may not always hold in reality. Further research could explore alternative methods or models that do not rely on these assumptions.

Spatial correlation: The study assumes that errors are spatially correlated and that this correlation structure is the same at both aggregate and disaggregate levels. Further research could use more advanced models to capture spatial correlation, such as spatial econometric models.

Model limitations: The study uses a linear spatial autoregressive model, which may not be appropriate for all cases and could lead to oversimplification of the data. A limitation of the model is that, under both the "queen" and the "rook" criteria, it always assigns equal weight to all adjacent nodes, hence the same \(W_{i,j}\). Introducing a discriminant based on the distance between nodes in \(W_{i,j}\) could improve the validity of the SAR model. However, since all relationships between nodes generate interactions and dependencies both between the nodes and in the border sub-nodal units, the weight of each node cannot be eliminated or significantly reduced. Considering the barycentric distances of the nodes \(d_{i,j} > 0\):

$$W_{i,j} = \left\{ {\begin{array}{ll} {\frac{{2 - \frac{{\max d_{i,j} }}{{d_{i,j} }} }}{2}} & {{\text{if}}\; j \in N\left( i \right)} \\ 0 & {\text{otherwise}} \\ \end{array} } \right.$$

Future research could also consider alternative models, such as non-linear models.

Scale of analysis: The study focuses on the municipal level, which may not be granular enough to capture all the relevant variations in life expectancy. Further research could consider the sub-municipal level (Census Area), where variations are also present.

These limitations should be considered when interpreting the results and conclusions of the study. Future research could address them by using more recent data, incorporating more variables, and using more sophisticated models. Moreover, a useful next step would be to analyse country-specific contributions to the increase of the best-practice life expectancy (Nigri et al. 2022). For example, the method could be useful in multi-country clustering-based forecasting of healthy life expectancy (HLE). According to Levantesi et al. (2023), the HLE is an indicator that measures the number of years individuals at a given age are expected to live free of disease or disability.

7 Conclusions

The determination of granular spatial data, particularly public health data such as life expectancy, is essential for decision makers, public institutions and researchers. Istat, like many other institutes in Europe, has all the information needed to provide life expectancy data directly on a municipal basis, without estimating any breakdown, but unfortunately it does not make them public. At the processing level, it could use this method to spatially disaggregate data from the municipal level to the sub-municipal level, with much more precision than we could achieve in this application, where we moved from the provincial level (NUTS 3) to the municipal level. Integrating small-area life expectancy data into the official public statistics of the National Statistical Institutes would result in much deeper and more precise levels of health knowledge. In the present paper, we address an innovative topic and application framework for measuring performance at a highly disaggregated level through territorial indicators such as the biometric function of citizens' life expectancy. Moreover, we apply a methodology of Top–Down spatial disaggregation, offering a perspective that has been pursued only to a limited extent. Taking a decentralized country such as Italy as a case study, together with micro-territorial information, an indicator of citizens' life expectancy at the municipal and sub-municipal level was first proposed. Then, using a spatial disaggregation methodology derived from the well-known Chow–Lin techniques, a municipality-level indicator of citizens' life expectancy was estimated. This method avoids artificial assumptions and thus provides objective results from autoregressive linear models in the spatial context (Tang et al. 2021).

The potentially very high number of explanatory variables in spatial regression needs to be addressed by some dimension reduction method (Xia and An 1999). At present, projection pursuit (PP) appears to be the most studied dimension reduction method. Reviews of the topic can be found in Sun (2006), Jee (2009) and Loperfido (2018, 2019). To the best of our knowledge, however, there are no papers applying PP to spatial data. We are currently investigating an extension of the works of Galeano et al. (2006) and Loperfido (2020), who applied PP to multivariate time series.