1 Introduction

Throughout the twentieth century, mortality in most industrialized countries became concentrated at older ages and noncommunicable diseases became the prevailing cause of death. A female survival advantage emerged (Goldin and Lleras-Muney 2019; Hollingshaus et al. 2019) and the male–female gap widened from the beginning of the 1920s until the 1970s and then it narrowed (Austad 2006; Zarulli et al. 2020, 2021). The reduction in the gender gap was primarily due to narrowing male–female differences in cardiovascular disease-related mortality, smoking-related mortality, and mortality due to accidents, violence, and suicide, especially in European countries (Meslé 2004; Trovato & Heyen 2006; Nusselder et al. 2010; Horiuchi et al. 2013; Klenk et al. 2016). Existing literature has also shown that, with convergence taking place, men have experienced more rapid gains in survival than women (Oksuzyan et al. 2010; Meslé and Vallin 2011; Thorslund et al. 2013); furthermore, although decreasing gender differences in life expectancy persist over time, in the last decades, the contribution of age older than 80 years to the gender difference in life expectancy is increasing (Beltrán-Sánchez et al. 2015; Zarulli et al. 2020, 2021). However, whether the shift toward older age translates into a more compressed or more dispersed age- and cause-specific gender gap in life expectancy is still not clear. On the one hand, some studies argue that old-age deaths are decreasing at advanced ages; on the other hand, others argue that old-age deaths are becoming more dispersed with age (Bongaarts 2005; Canudas-Romo 2008; Zuo et al. 2018; Janssen and de Beer 2019). The overall scenario is even more complicated because different causes of death may have different age-specific distributions in their contribution to the gender gap in survival.

This study aimed to investigate the presence of common underlying patterns across multiple countries and over time. However, this type of analysis faces the aforementioned problems of complexity and multidimensionality. Therefore, we adopted a Functional Data Analysis (FDA) approach (Ramsay and Silverman 2005), a method that only recently appeared in demographic analyses and is very promising with respect to the analysis of highly multidimensional problems. The basic idea behind FDA is that discrete observations arising from data can be expressed in the form of a function (functional data) that represents the entire measured function as a single observation, and then modeling and prediction information can be drawn from a collection of functional data by applying statistical concepts from multivariate data analysis (Ramsay and Silverman 2005; Ullah and Finch 2013). Such data are typically interpreted to reflect the influence of certain smooth functions that are assumed to underlie and generate observations. In contrast to other statistical techniques, FDA takes advantage of additional information that could be implied by the smoothness of underlying functions (Ramsay and Silverman 2005; Ullah and Finch 2013). FDA methods for modelling and forecasting data across a range of health and demographic issues have significant advantages for better understanding trends, risk factor relationships, and the effectiveness of preventive measures (Erbas et al. 2007; Hyndman and Ullah 2007; Ullah and Finch 2013). This approach has become popular in demographic applications, especially in forecasting mortality, fertility, and migration (Erbas et al. 2007; Hyndman and Ullah 2007; Hyndman and Booth 2008; Hyndman and Shang 2009; Ullah and Finch 2010). Recent studies have used FDA to explore changes in age-specific mortality over time in low-mortality countries (Léger and Mazzuco 2021) and developing countries (Shair et al. 2019) as well as to analyze cause-specific mortality trends across countries (Stefanucci and Mazzuco 2022), successfully enhancing the accuracy of the analyses and providing innovative insights.

In this study, FDA was applied to the relative age-cause contributions to the gender gap in life expectancy to study the changes in age- and cause-specific mortality of several countries over time. More specifically, we applied a Functional Principal Component Analysis (FPCA) to the age profiles of cause-specific contributions to identify the main components of the distribution of age-specific contributions according to causes of death and thereby summarize them with a few components that will be easier to interpret.

2 Causes-of-death contributions to the gender gap in life expectancy: review of the literature

Examining gender differences in cause-specific mortality is important because it provides additional insights into the mechanisms that create a gender gap in overall mortality (Rogers et al. 2010). Compared with women, men experience a higher risk of death from almost all causes (Rogers et al. 2010). Men are more likely than women to engage in risky and aggressive behaviors, such as tobacco and alcohol consumption, as well as risky driving and eating habits; these differences generally decrease with age (Gjonça et al. 2005; Rogers et al. 2010). For instance, men’s greater and longer exposure to smoking contributes to various smoking-related diseases, such as lung cancer, emphysema, and cardiovascular diseases, which increase the risk of overall mortality.

The three most acknowledged causes of death that determine the overall gender gap in mortality are cancer, cardiovascular diseases, and external causes of death.

Lung cancer is one of the most common cancers, and smoking has played a crucial role in its prevalence (Ferlay et al. 2018). Smoking patterns vary across Europe, with men generally starting smoking earlier and having higher prevalence rates than women (Janssen 2020). This trend has shown variations in countries such as the United States and Sweden (Beltrán-Sánchez et al. 2015). The effect of smoking on gender differences in mortality has been significant, with smoking contributing to a large proportion of the gender gap in European mortality. However, the contribution of smoking to gender gap has declined over time (Janssen 2020; Janssen and van Poppel 2015; Luy and Wegner-Siegmundt 2015; Meslé 2004). The prevalence of lung cancer among men has either stabilized or declined, whereas it has continued to increase among women in recent years (Stefanucci and Mazzuco 2022). In addition to lung cancer, breast, colorectal, and prostate cancers have contributed to the overall cancer burden in Europe (Ferlay et al. 2018). Trends in cancer incidence and mortality vary by gender, age group, period, and country (Zatoński and Didkowska 2008; Gjonça et al. 2005).

In addition to cancer-related mortality, cardiovascular diseases are a major contributor to the gender gap in survival, with excessive male mortality rates observed in several countries (Klenk et al. 2016; Beltrán-Sánchez et al. 2015). However, recent studies have reported a decline in cardiovascular diseases-related mortality in both men and women across Europe (Di Girolamo et al. 2020; Sanders 2018; Conti et al. 2003). Alcohol consumption has been closely linked to cardiovascular disease mortality, particularly in Eastern European countries, and has contributed to gender differences in life expectancy (Pinkhasov et al. 2010; Trias-Llimós et al. 2018; Tarkiainen et al. 2012; Meslé et al. 2002).

External causes of death, such as accidents and unhealthy lifestyles, also play a substantial role in the mortality differences between women and men. Studies examining the external causes of death in Eastern European countries have found that unhealthy lifestyles and accidents account for more than one third of the overall gender gap in life expectancy (Luy and Wegner-Siegmundt 2015; Trias-Llimós and Janssen 2018). Finally, women have a higher lifetime prevalence of mental health disorders, such as depression, whereas men are more likely to suffer from substance use and impulsive disorders (Ho and Hendi 2018; Ramsay et al. 2020; Kuehner 2017; Boyd et al. 2015).

Several studies have focused on the contributions of the causes of death to the gender gap in mortality rates to explain the recent narrowing of the gender gap in life expectancy (Glei et al. 2010; Yang et al. 2010; Nusselder et al. 2010; Désesquelles et al. 2012; Zatonski and Bhala 2012; Rasul 2013; Klenk et al. 2016; Chisumpa and Odimegwu 2018). Numerous studies have decomposed the gender gap in life expectancy according to age and cause of death (Meslé 2004; Canudas-Romo 2008; Luy and Wegner-Siegmundt 2015; Trias-Llimós and Janssen 2018; Hollingshaus et al. 2019; Aburto et al. 2020; Sagna et al. 2020). A few studies have analyzed the evolution of the distributions of the contributions of both ages and causes, and how these would determine the overall gender gap in life expectancy over time (Gómez-Redondo and Boe 2005; Tarkiainen et al. 2012; Sundberg et al. 2018; Sagna et al. 2020; Feraldi and Zarulli 2022). Nevertheless, the gender gap in life expectancy has undergone complex and poorly understood evolution.

Arriaga’s age- and cause-specific decomposition (Arriaga 1984) is one of the most common decomposition techniques and represents a useful tool for estimating how mortality differences for specific causes of death and age groups contribute to the total difference in life expectancy (Preston et al. 2001, pages 84–86). However, traditional demographic methods of decomposition, such as Arriaga’s method, when simultaneously applied to a broad array of countries over many years, produce a very large set of data points that are difficult to summarize in a compact but complete picture. Hence, it is customary either to focus only on a subset of the data or to produce repeated similar analyses and plots by time and/or by country for the age-cause combinations. This makes interpretation and investigation of the underlying general patterns cumbersome.

Even the Truncated Cross-sectional Average Length of life (TCAL) indicator (Canudas-Romo and Guillot 2015; Nepomuceno and Canudas-Romo 2019; Canudas-Romo et al. 2020; Sauerberg and Canudas-Romo 2022), a common measure that summarizes the evolution of cohort- and period-specific information on cause-specific mortality, does not explain time trends (i.e., reducing dimensionality by collapsing one component).

3 Aim and contribution

If we consider the time trend of the relative age-cause contributions, we have not only points but also functions. Therefore, in this study, we analyzed the changes in the contributions of age and causes of death (which were estimated using Arriaga’s age- and cause-specific decomposition technique) from a functional perspective, which has been shown to be a more informative approach and has significant advantages (Ramsay and Silverman 2002; Léger and Mazzuco 2021).

First, FDA has the advantage of generating models that can be described by continuous smooth dynamics, representing a useful tool for expressing curves in a desirable reduced-dimensional space, which allows for accurate estimates of parameters for use in the analysis phase (Ullah and Finch 2013; Ramsay 1988). FDA was used here to model age-cause contributions to the gender gap because it is less affected by outliers than traditional statistical methods. FDA uses all the information in the data to generate a smooth function, which helps reduce the impact of period shocks and other random fluctuations in the data. This makes it easier to identify trends and patterns in the data that may not be apparent using other methods.

Second, FDA is useful not only for data smoothing: by transforming the observed discrete data into functional data, it becomes possible to apply FPCA and synthesize the temporal dynamics of the GGLE by cause and age with a few components, which is not as simple with traditional methods such as Arriaga’s decomposition.

Therefore, FPCA allows the synthesis of large datasets, thereby increasing the interpretability of the data. We propose to apply FDA to the relative age-cause contributions to the Gender Gap in Life Expectancy (GGLE) to describe the global trend of age- and cause-specific mortality in several countries over time. We defined the proportional variable as the age profiles of the cause-specific contributions to the GGLE for each country and year. FDA was applied to the age profiles of each cause of death. Using relative rather than absolute contributions makes it possible to appreciate the changes (increase/decrease) in the age-cause contributions regardless of the overall trend in the GGLE. Because the GGLE varies over time (increase/decrease), the absolute age-cause contributions (i.e., years) mostly follow the trend of the overall gender gap. However, the contributions of certain causes and age groups may show the opposite trend in relative terms with respect to the trend in the overall GGLE. Additionally, relative contributions make it possible to appreciate differences between countries and over time in a standardized way, thus abstracting from the potential differences in extent related to absolute measurements.
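The distinction between absolute and relative contributions can be illustrated with a small numerical sketch. The figures below are hypothetical and are not taken from the study; the function name is likewise only illustrative. A relative contribution is taken here as the absolute contribution (in years) divided by the overall GGLE (in years).

```python
def relative_contributions(absolute, ggle):
    """Divide absolute contributions (years) by the overall gap (years)."""
    return [a / ggle for a in absolute]

# Hypothetical contributions of three causes in two periods (not study data).
absolute_early = [2.5, 1.0, 1.5]   # sums to an overall gap of 5.0 years
absolute_late = [1.2, 0.9, 0.9]    # sums to an overall gap of 3.0 years

early = relative_contributions(absolute_early, ggle=5.0)
late = relative_contributions(absolute_late, ggle=3.0)

# The second cause loses ground in years (1.0 -> 0.9) but gains in
# relative terms (about 20% -> about 30%) because the overall gap shrank.
for name, shares in (("early", early), ("late", late)):
    print(name, [round(100 * s, 1) for s in shares])
```

In this made-up case the absolute contribution of the second cause follows the overall decline of the gap, while its relative contribution moves in the opposite direction, which is the situation described above.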

More specifically, we proposed an FPCA of the age profiles of the relative contributions for 20 countries to identify the main components of the distribution of age-specific contributions according to the causes of death.

Notably, the cause-of-death data suffer from several irregularities. Small numbers that can be observed for less frequent causes of death raise statistical issues concerning accuracy and thereby question the usefulness of these data (Ullah and Finch 2013). Data on the causes of death also suffer from variations in coding protocols. The underlying causes of death were identified from the death certificates and coded according to the International Classification of Diseases (ICD). The coding of an underlying cause of death depends on (i) the information available on the death certificate, (ii) the interpretation of explicit coding or selection rules, and (iii) the individual deliberations of the coders (Treurniet et al. 2004; Harteloh et al. 2010). Additionally, the ICD varies with time and place; therefore, coding could hinder the interpretation of trends in time or differences between populations. Through curve smoothing, FDA facilitates effective and appropriate data noise reduction (Ramsay 1988; Ullah and Finch 2013; Léger and Mazzuco 2021). Although methods to harmonize different ICDs already exist (see Meslé and Vallin 1996), some discontinuities (see Mazzuco 2022, Figure 5) and missing data (see Stefanucci and Mazzuco 2022) persist in the causes-of-death data, even within the same ICD. Dealing with the causes of death likely leads to the handling of sparse data, and consequently, irregularities in estimates. In such cases, the smoothing properties of FDA become useful. Finally, by applying FPCA, the age pattern of cause-specific contributions to GGLE can be summarized with a few components that are easier to interpret. This also allows for an analysis of how these components evolve over time for the considered countries, similar to what Léger and Mazzuco (2021) performed with overall age-specific mortality. 
They showed that between 1960 and 2018, age-specific mortality rates in 32 developed countries followed the same evolution pattern (i.e., reduction in infant mortality, increase in premature mortality, and shift and compression of deaths), but with different timings. FPCA is useful for dimension reduction of the curves. In particular, the eigenfunctions allowed the identification of the main directions of variability of the age-cause contribution profiles with respect to the mean curve. The corresponding scores of the functional components were used to characterize the age-cause contribution profiles in a reduced dimensional space.

Since the most recent period is characterized by low mortality for both men and women, as well as greater gender differences in old age, we can expect that the overall GGLE would be mainly determined by a different distribution of mortality across ages rather than by overall differences between women and men in specific causes of death. Additionally, in light of the ongoing reduction in the female–male mortality gap in the most recent period, the distinctive GGLEs observed across countries could be determined by different compositions of causes of death.

The results of this study contribute to research related to the measurement of population changes, providing important insights into the main contributing factors of gender differences in mortality, specifically highlighting the changes in the role of age and causes of death in determining the overall GGLE across European and extra-European countries over a long period. This study illustrates the advantages derived from the demographic application of FDA. To the best of our knowledge, this is the first attempt to analyze age- and cause-specific contributions to the gender gap in life expectancy using an FDA approach.

4 Data and methods

4.1 Data

Cause-specific mortality data were retrieved according to gender, 5-year age interval, and year from the WHO Mortality Database (WHO 2022), and life tables were obtained from the Human Mortality Database (HMD 2022). The HMD and WHO Mortality Databases are open-source projects: the former offers harmonized data on a constructed series of mortality rates, life tables, death counts, and population exposures, whereas the latter is an archive of the causes of death information for several countries. Although the longest time series for many countries started in 1950 and ended in 2019, information is available only from 1959 to 2015. To consider the same time window for each country, we focused on the years 1959–2015 and only considered countries for which data were available in this time frame. Our final analysis focused on 20 countries: Denmark, Finland, Norway, Iceland, Sweden, Austria, Belgium, Switzerland, France, Ireland, the Netherlands, the United Kingdom (UK), Italy, Spain, Hungary, Poland, Australia, Canada, Japan, and the United States (US). These countries differ in several respects. However, the aim of this study was to find a common underlying pattern in the age-related contribution to GGLE, even though these countries may have experienced changes in the composition of GGLE at different times.

To deal with the different classifications of causes of death over time (from the ICD 7th to the ICD 10th revision), we followed Canudas-Romo et al. (2020) and used broad classes of causes which are consistent across the revisions (Stefanucci and Mazzuco 2022): (1) lung cancer, (2) other neoplasms, (3) diseases of circulatory system, (4) external causes of death, (5) respiratory diseases, (6) diseases of digestive system, (7) endocrine, nutritional and metabolic diseases, and (8) infectious and parasitic diseases (see Table 2 in the Appendix for the details). Given that older ages are the most important in determining the difference in life expectancy (Zarulli et al. 2020; Zarulli et al. 2021; Feraldi and Zarulli 2022) and to avoid problems related to the cause of death data quality (Ullah and Finch 2013), which are particularly common at very old ages, the analysis was restricted to age groups from 50 to 85 years.

4.2 Functional principal component analysis (FPCA)

Absolute age- and cause-specific contributions to the GGLE (female–male) were obtained for each country and over time by applying Arriaga’s age- and cause-specific decomposition technique (Arriaga 1984), combining life tables from the HMD and cause-specific mortality data from the WHO mortality database. First, the difference in life expectancy was decomposed according to the contribution of each age group. In the second step, the contributions from each age group were divided into contributions from each specific cause of death. The total contribution from any given cause was obtained by summing the contributions of that cause across all age groups. Accordingly, the total gender gap in life expectancy was obtained by summing the contributions across all age groups and causes of death.
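The two decomposition steps can be sketched in a few lines of Python. This is a minimal illustration of the standard Arriaga formulas, not the code used in the study: the death rates are hypothetical, the helper `life_table` assumes a constant death rate within each 5-year group and an open-ended last group, and all function names are illustrative.

```python
import math

def life_table(rates, width=5.0, radix=1.0):
    """Build l_x, L_x, T_x from death rates (constant rate per age group,
    last group open-ended; both are simplifying assumptions)."""
    l, L = [radix], []
    for m in rates[:-1]:
        l_next = l[-1] * math.exp(-width * m)
        L.append((l[-1] - l_next) / m)   # person-years lived in the group
        l.append(l_next)
    L.append(l[-1] / rates[-1])          # open-ended last group
    T = [sum(L[i:]) for i in range(len(L))]
    return l, L, T

def arriaga_by_age(men, women):
    """Step 1: age-specific contributions to e(women) - e(men) at the first age."""
    l1, L1, T1 = men
    l2, L2, T2 = women
    last = len(l1) - 1
    contributions = []
    for x in range(len(l1)):
        direct = (l1[x] / l1[0]) * (L2[x] / l2[x] - L1[x] / l1[x])
        indirect = 0.0
        if x < last:  # no later ages after the open-ended group
            indirect = (T2[x + 1] / l1[0]) * (l1[x] / l2[x] - l1[x + 1] / l2[x + 1])
        contributions.append(direct + indirect)
    return contributions

def split_by_cause(delta_x, cause_rates_men, cause_rates_women):
    """Step 2: split one age group's contribution in proportion to the
    gender difference in cause-specific death rates."""
    total_diff = sum(cause_rates_women) - sum(cause_rates_men)
    return [delta_x * (w - m) / total_diff
            for m, w in zip(cause_rates_men, cause_rates_women)]

# Hypothetical all-cause death rates for ages 50-54, ..., 80-84, 85+.
men_rates = [0.010, 0.016, 0.025, 0.040, 0.065, 0.100, 0.150, 0.220]
women_rates = [0.005, 0.008, 0.013, 0.021, 0.036, 0.060, 0.100, 0.170]
men, women = life_table(men_rates), life_table(women_rates)

by_age = arriaga_by_age(men, women)
gap = women[2][0] / women[0][0] - men[2][0] / men[0][0]  # e50(women) - e50(men)
# Two hypothetical causes in the first age group; their rates add up to
# the all-cause rates 0.010 (men) and 0.005 (women).
by_cause = split_by_cause(by_age[0], [0.004, 0.006], [0.002, 0.003])
```

By construction, the age-specific contributions add up to the gap in life expectancy at age 50, and the cause-specific pieces of an age group add up to that group's contribution, which mirrors the summation properties stated above.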

FDA was applied to the age-specific relative contributions to the GGLE separately for each cause of death. Relative contributions were computed by dividing the absolute age- and cause-specific contributions by the overall GGLE (years). Accordingly, the sum of the relative contributions across all age groups and causes of death is equal to one, whereas the sum of the absolute contributions across all age groups and causes of death is equal to the overall GGLE (in years). Discrete age-specific relative contribution data x(t1),…, x(tN) were assumed to be independent realizations drawn from the same continuous stochastic process X(t) (Ramsay and Silverman 2002). To obtain the functional representation, each X(t) was approximated using a basis expansion of cubic B-spline functions (1), and the B-spline basis coefficients were estimated using the ordinary least squares method, minimizing the sum of squared residuals (Ramsay and Silverman 2002). B-spline functions are commonly used for nonperiodic functional data because they divide the interval over which the function is approximated into subintervals separated by knots. Each subinterval has a polynomial of a specified order and adjacent polynomials join smoothly at knots. The B-splines ensure the continuity of the curve at the knots. The underlying functions are usually smoothed to capture the structural components of the data and reduce noise. In regression splines, the number of knots acts as a regularization parameter, with fewer knots resulting in smoother curves.
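To make the piecewise-polynomial construction concrete, the following sketch evaluates a cubic B-spline basis with the Cox-de Boor recursion. It is only an illustration under stated assumptions: breakpoints are placed at ages 50, 55, ..., 85, the boundary knots are repeated ("clamped"), and the names `bspline_basis` and `evaluate_curve` are illustrative rather than taken from the fda package.

```python
def bspline_basis(j, order, knots, t):
    """Cox-de Boor recursion: value at t of the j-th B-spline of a given
    order (order 4 corresponds to cubic pieces)."""
    if order == 1:
        if knots[j] <= t < knots[j + 1]:
            return 1.0
        # Close the last non-empty interval on the right so t = 85 is covered.
        if t == knots[-1] and knots[j] < knots[j + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    left_span = knots[j + order - 1] - knots[j]
    if left_span > 0:
        value += (t - knots[j]) / left_span * bspline_basis(j, order - 1, knots, t)
    right_span = knots[j + order] - knots[j + 1]
    if right_span > 0:
        value += (knots[j + order] - t) / right_span * bspline_basis(j + 1, order - 1, knots, t)
    return value

ORDER = 4                                    # cubic B-splines
breaks = [50, 55, 60, 65, 70, 75, 80, 85]    # assumed breakpoints (ages)
# Assumption: clamped knot vector, i.e. boundary knots repeated ORDER times.
knots = [breaks[0]] * (ORDER - 1) + breaks + [breaks[-1]] * (ORDER - 1)
n_basis = len(knots) - ORDER                 # number of basis functions

def evaluate_curve(coefs, t):
    """Value at age t of the curve sum_k c_k * phi_k(t)."""
    return sum(c * bspline_basis(k, ORDER, knots, t) for k, c in enumerate(coefs))
```

At any age in [50, 85] the basis functions are non-negative and sum to one, so a curve with all coefficients equal to a constant is that constant; varying the coefficients bends the curve locally around the corresponding knots.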

A relatively large number of knots and a roughness penalty acting on the associated coefficients give the B-spline functions great flexibility in approximating complex curves with a relatively small number of parameters (see Léger and Mazzuco (2021) and Ramsay et al. (2009) for more details on B-spline functions). B-spline functions capture the underlying structure of the data while minimizing the effect of noise more accurately than conventional splines using a monomial basis; consequently, they are generally preferred over other bases (Ramsay et al. 2009). Therefore,

$$X\left( t \right) = \mathop \sum \limits_{k = 1}^{p} c_{k} \phi_{k} \left( t \right)$$
(1)

where ϕk are p known basis functions and ck are the corresponding coefficients to be estimated. In order to maintain the data structure, we used a sequence of p = 8 equally spaced knots (i.e., one for each 5-year age interval between 50 and 85). Afterward, we performed FPCA separately for each cause of death to synthesize the variability of the curves and to identify the main components of the distributions of age-specific contributions according to causes of death, across countries. FPCA is the extension of the more classical multivariate PCA to functional data: for a generic curve xi(t) we can obtain the approximation (2)

$$x_{i} \left( t \right) = \mathop \sum \limits_{l = 1}^{\infty } c_{i,l} \phi_{l} \left( t \right)$$
(2)

where ci,l are the principal component scores and ϕl(t) are the eigenfunctions or harmonics. The information on the curve xi(t) was then summarized by the first q terms of this expansion.
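The truncated expansion can be illustrated with a discretized stand-in for FPCA: curves are evaluated on a common, equally spaced age grid and an ordinary PCA is carried out on those values by power iteration. This is a sketch under those assumptions, not the basis-expansion algorithm implemented in the fda package, and the function names are illustrative.

```python
import math

def discretized_fpca(curves, n_components=2, n_iter=500):
    """PCA of curves sampled on a common grid (a discrete stand-in for FPCA).

    Returns the mean curve, the harmonics (unit-norm eigenvectors), the
    component variances (eigenvalues) and the scores of each curve.
    """
    n, p = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(p)]
    centered = [[c[j] - mean[j] for j in range(p)] for c in curves]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    harmonics, variances = [], []
    for _ in range(n_components):
        v = [1.0 + 0.1 * j for j in range(p)]          # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):                        # power iteration
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
        harmonics.append(v)
        variances.append(lam)
        # Deflation: remove the component just found before the next search.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(r[j] * h[j] for j in range(p)) for h in harmonics]
              for r in centered]
    return mean, harmonics, variances, scores

def reconstruct(mean, harmonics, score_row):
    """Approximate one curve from the mean and its first q scores."""
    return [m + sum(s * h[j] for s, h in zip(score_row, harmonics))
            for j, m in enumerate(mean)]
```

Each curve is thus replaced by q scores (two in the analysis described here), and `reconstruct` shows how much of the original age profile those few numbers retain.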

In addition, the selected FPCs were included as covariates in a multiple regression analysis of GGLE to determine how different cause-of-death compositions of GGLE affect the overall gender differences in life expectancy (LE). All analyses were conducted using the R package fda (Ramsay et al. 2011).

5 Results

5.1 Changes in the overall GGLE over time

Figure 1 shows the absolute GGLE (years) over time for 20 countries and highlights the different patterns in the GGLE across countries. As expected, in all countries the GGLE followed an inverted U-shaped trend, suggesting that it went through three phases: growth, peak and stability, and decline (Liu et al. 2013).

Fig. 1 Absolute (years) GGLE at age 50 over time, by country (dashed lines represent the years 1975 and 1995). (Color figure online)

However, the timing of these phases varies by country. The largest changes in the GGLE were observed around 1975 and 1995, which delimit three time intervals: 1959–1975, 1976–1995, and 1996–2015. Based on when GGLE started to decrease, we identified three patterns across countries: (a) “Early decreasing GGLE,” (b) “Late decreasing GGLE,” and (c) “Very late decreasing GGLE” (see Footnote 1).

While the GGLE in the first group of countries (Fig. 1 panel a) started to decrease shortly after 1975, the GGLE in the second group of countries (Fig. 1 panel b) was quite constant between 1975 and 1995 and started to decrease thereafter. Owing to the small population size, the GGLE in Iceland showed large variability. In this case, the smoothing properties of FDA proved to be useful. In Spain, Japan, Hungary, and Poland (Fig. 1 panel c), the GGLE increased until 1995 and started to decrease much later than in the other countries. In a separate analysis, the results of which are not reported for the sake of brevity, the different trends in the absolute GGLE over time across countries were confirmed by computing correlation coefficients between GGLE and time, within each time interval separately.
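The interval-wise check mentioned above can be sketched as follows. The series used in the usage note is synthetic, the choice of the Pearson coefficient is an assumption (the text does not name the coefficient), and the function names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_by_interval(years, ggle, intervals):
    """Correlation between GGLE and calendar year within each interval."""
    result = {}
    for start, end in intervals:
        pairs = [(t, g) for t, g in zip(years, ggle) if start <= t <= end]
        result[(start, end)] = pearson([t for t, _ in pairs],
                                       [g for _, g in pairs])
    return result

INTERVALS = [(1959, 1975), (1976, 1995), (1996, 2015)]

# Synthetic inverted-U series (not study data), peaking in 1985.
years = list(range(1959, 2016))
ggle = [6.0 - (t - 1985) ** 2 / 100 for t in years]
by_interval = correlation_by_interval(years, ggle, INTERVALS)
```

For a series of this shape, the coefficient is strongly positive in the first interval, close to zero in the middle one, and strongly negative in the last one, which is the signature of the growth, stability, and decline phases.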

The results of this study highlight the narrowing gender gap in life expectancy in recent decades; gains in life expectancy have been larger for males than for females. The reduction in the gender gap occurred at different times and at a different pace across countries.

5.2 Contributions of causes of death to the GGLE

Lung cancer, other neoplasms, diseases of the circulatory system, and, to a lesser extent, external causes of death made the largest contributions to the GGLE in all countries, accounting for more than 75% of the overall gap. Figure 2 shows the average relative cause-specific contribution to the GGLE over time for the years in which data were available for all countries. Between 1959 and 2015, diseases of the circulatory system made the largest contribution (approximately 43.0% on average), followed by lung cancer and other neoplasms (12.2–12.9%), and external causes of death and respiratory diseases (6.9–9.8%). Each remaining group of causes contributed less than 5.0% of the overall GGLE over the entire study period (see Table 3 in the Appendix for the list of causes included in each group).

Fig. 2 Relative cause-specific contribution to the GGLE over time: average of 20 countries (red lines represent the years 1975 and 1995). Note: Bars in 1997, 1998, 2000, and 2005 are missing because data were not available for all countries

The contribution of diseases of the circulatory system slightly increased within the first interval (1959–1975) but strongly decreased afterward, from 48.9% on average between 1959 and 1995 to 36.7% in 2015. The average contribution of lung cancer to the GGLE increased until 1975 and decreased within the last interval (1996–2015), whereas it remained constant at approximately 14.3% from 1976 to 1995 (intermediate time interval). Starting around 1975, the average contribution of other neoplasms to the GGLE significantly increased from 6.7% (on average in 1959–1975) to 21.8% in 2015. The contribution of external causes of death slightly decreased within the first two intervals (from approximately 9.0% in 1959 to less than 6.0% in 1985), but slightly increased afterward (approximately 8.0% in 2015). The contributions of respiratory and digestive diseases remained constant over time, whereas the differences owing to infectious diseases decreased. Finally, the contribution of endocrine, nutritional, and metabolic diseases to the GGLE appeared very late, with a small contribution of 2.9% in 2015. These trends were confirmed by computing the correlation coefficients between the relative cause-specific contributions across countries and time for each time interval. We do not present these results here for brevity.

This analysis suggests that the narrowing of the gender gap in LE is mainly driven by decreasing differences in circulatory system diseases. Gender differences in mortality due to external causes and infectious diseases also contributed to the narrowing of the gap, although their contribution to GGLE was lower. Conversely, gender differences in mortality due to other neoplasms contributed to the widening of the gap throughout the analyzed period. Finally, lung cancer widened the gap until 1975, whereas its contribution to GGLE decreased within the last period (1996–2015).

5.3 Functional principal component analysis

Almost the entire variability in age-specific contributions to the GGLE (more than 90%) was explained by the first two functional principal components for all causes of death. The first component describes the level (greater/smaller contribution); that is, the extent of the cause-specific contribution to the overall gender gap in mortality. The second component captures the age pattern (location of the curves across ages) and describes the shift in age-specific contributions to older age. A classical way to interpret FPCs is to plot the mean function (solid curve in Fig. 3) together with the functions obtained by adding to it (+ curve) and subtracting from it (- curve) a suitable multiple of the eigenfunction, here twice the square root of the principal component variance (Ramsay and Silverman 2002).
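The (+) and (-) curves used for this interpretation can be computed as in the sketch below, which assumes that the mean function and the eigenfunction have been evaluated on the same age grid. The function name and the numbers in the usage note are hypothetical.

```python
import math

def perturbation_curves(mean, harmonic, variance, multiple=2.0):
    """Return mean + k*sqrt(variance)*harmonic and mean - k*sqrt(variance)*harmonic,
    with k = 2 by default, evaluated point by point on the age grid."""
    shift = multiple * math.sqrt(variance)
    plus = [m + shift * h for m, h in zip(mean, harmonic)]
    minus = [m - shift * h for m, h in zip(mean, harmonic)]
    return plus, minus

# Hypothetical values on a four-point age grid (not study results).
mean_curve = [0.010, 0.020, 0.025, 0.015]
level_harmonic = [0.5, 0.5, 0.5, 0.5]        # same sign at every age
age_harmonic = [-0.5, -0.5, 0.5, 0.5]        # changes sign across ages

level_plus, level_minus = perturbation_curves(mean_curve, level_harmonic, 1e-4)
age_plus, age_minus = perturbation_curves(mean_curve, age_harmonic, 1e-5)
```

With a harmonic of constant sign, the (+) curve lies above the mean at every age (a level effect); with a harmonic that changes sign, the (+) curve is lower at younger ages and higher at older ages (a shift in the age pattern).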

Fig. 3 Effect of the first two FPCs on age-specific contributions to the GGLE for lung cancer: overall mean and mean ± a suitable multiple of the principal component weight function

Regarding lung cancer (Fig. 3), the first FPC captured the extent of the contribution of lung cancer mortality to the overall gender differences. With respect to the overall mean contribution (solid line), the (+) curve shows a higher cause-specific contribution, whereas the (-) curve shows a lower cause-specific contribution. The largest contribution (i.e., modal age at contribution) was observed around the age of 65 years, and variations in FPC1 involved all ages. For a generic country, a high score for this component suggests a high contribution of lung cancer to the GGLE. The second FPC describes the locations of the curves across ages. With respect to the overall contribution mean (solid line), the (+) curve denotes higher contributions of older age and lower contributions from younger age. Conversely, the (-) curve denotes lower contributions of older age and higher contributions of younger age. Indeed, the modal age at contribution was young in the (-) curve and older in the (+) curve. Similarly, Fig. 4 shows that larger (smaller) values of the first FPC denote a higher (lower) contribution of external causes of death to the GGLE. Compared with lung cancer (Fig. 3), the contributions of external causes of death were mainly concentrated at younger ages. Variations in FPC2 mainly involve ages older than 60 years; therefore, the larger the score of the second component, the larger is the contribution of external causes of death at older ages. Contributions at younger ages (below 60 years) were similar regardless of the score of the second component. As the contributions of external causes of death are mainly located at younger ages, a high score of the second component indicates that the contribution of external causes of death spread over a wider age range (Fig. 4), that is, an expansion of the differences.

Fig. 4 Effect of the first two FPCs on age-specific contributions to the GGLE for external causes of death: overall mean and mean ± a suitable multiple of the principal component weight function

With regard to lung cancer, for which the largest age contributions are located at older ages, the shifts toward older ages might be accompanied by a concentration of contributions over a narrower age range (Fig. 3), that is, compression of the differences. For the other causes of death, FPCA revealed similar results; the first and second components captured the entire variability of the contributions and described two distinctive dimensions (level and age pattern). The FPCA results related to all other causes of death are reported in the Appendix.

5.4 Multiple regression analyses

To analyze the effects of the FPCs on the GGLE simultaneously, we estimated multiple linear regression models with the GGLE as the dependent variable and the first and second FPCs for each cause of death as independent variables. Regression models were estimated separately for each of the three time intervals (1959–1975, 1976–1995, and 1996–2015), and time (year) was included as a covariate to control for overall GGLE trends. Accordingly, these models allowed us to interpret the effect of FPC1 (extent of cause-specific contribution) and FPC2 (shift of age-specific contributions toward older ages) regardless of the overall trend over time in the GGLE in each period. Table 1 presents the estimated parameters and p-values (significance level = 0.01) of the three models. All models captured a large portion of the variability in the GGLE, especially in the last study period (adjusted R² = 65.0%, 69.0%, and 86.0% for the first, second, and last periods, respectively).
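The structure of such a model can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the estimation code used in this study: the data are simulated for a single period, only two hypothetical causes are included, the coefficients used to generate the data are arbitrary, and p-values are not computed. The helper `fit_ols` and all variable names are hypothetical.

```python
import numpy as np

def fit_ols(y, X):
    """Ordinary least squares with an intercept; returns coefficients and adjusted R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    n, p = X1.shape  # p counts the intercept plus all covariates
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return beta, adj_r2

# Simulated country-year observations for one period (assumption: 10 countries, 1996-2015).
rng = np.random.default_rng(1)
year = np.tile(np.arange(1996, 2016), 10).astype(float)
n_obs = year.size
# Hypothetical FPC1 and FPC2 scores for two causes of death, labeled a and b.
fpc1_a, fpc2_a, fpc1_b, fpc2_b = rng.normal(size=(4, n_obs))
# Arbitrary generating coefficients, chosen only so that the fit can be checked.
ggle = (5.0 - 0.04 * (year - 1996) + 1.5 * fpc1_a - 2.0 * fpc2_a
        + 0.5 * fpc1_b + rng.normal(scale=0.05, size=n_obs))

# Year enters as a covariate so that the FPC effects are net of the time trend.
X = np.column_stack([year - 1996, fpc1_a, fpc2_a, fpc1_b, fpc2_b])
beta, adj_r2 = fit_ols(ggle, X)
print("year effect:", beta[1], "FPC1 (cause a):", beta[2], "adjusted R^2:", adj_r2)
```

Repeating the fit on the observations of each period separately mirrors the period-specific design described above. A positive coefficient on an FPC score corresponds to a widening effect on the GGLE, and a negative coefficient to a narrowing effect.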

Table 1 Multiple linear regression between GGLE and both first and second FPC, for each cause of death, in three time intervals

In the first period (between 1959 and 1975), GGLE significantly increased over time (0.06, p < 0.01), whereas no significant variation was observed during the second time interval between 1976 and 1995 (p = 0.22). The GGLE significantly decreased over the last period, that is, 1996–2015 (− 0.04, p-value ≤ 0.01).

In the first period, positive contributions to the GGLE were observed for circulatory and endocrine diseases (positive parameters for FPC1: 1.67 and 31.38, respectively), whereas negative contributions were observed for other neoplasms, external causes of death, and infectious diseases (− 9.54, − 5.53, and − 13.29, respectively).

Compared with the external causes of death, the contributions of other neoplasms, circulatory diseases, and infectious diseases were mainly concentrated in older age groups (see Fig. 4 and figures in the Appendix). Negative parameters for FPC2 (− 21.32, − 4.74, and − 19.03 for other neoplasms, circulatory diseases, and infectious diseases, respectively) suggest that the shift toward older ages in the distributions of contributions of these causes involved a compression of the differences over a narrower age range, which in turn corresponded to a reduction in the overall difference. Conversely, the distribution of the contributions of external causes of death was mainly located at younger ages, and the shift in the distribution involved an expansion of the differences over a larger age range. Indeed, the FPC2 scores for external causes of death significantly contributed to a widening of the overall gender gap (17.17, p-value ≤ 0.01). Finally, changes in the distribution of lung cancer contributions, as well as respiratory and digestive diseases, did not significantly affect GGLE in 1959–1975 (p-values > 0.01 for both FPCs).

In the intermediate and last periods, although the effects of other neoplasms and diseases of the circulatory system were no longer statistically significant, the age pattern of the contributions of these causes still contributed to narrowing the overall GGLE. The contributions of external causes of death switched from having a negative effect to having a positive effect on the GGLE.

Between 1976 and 1995, respiratory and digestive diseases showed negative effects. Since the contributions of other respiratory diseases were already mainly concentrated at older ages, the shift in the distributions of age-specific contributions toward older ages involved a slight compression of the distributions over a narrower age interval (Fig. 7 in the Appendix), thereby lowering the overall GGLE. However, in the last period (1996–2015), the shift in the distribution of age-specific contributions toward older ages widened the GGLE (FPC2: 6.22). During this period, while the effects of lung cancer were not statistically significant, the shift toward older ages and the slight compression of the age-specific contributions of lung cancer contributed to the narrowing of the overall GGLE.

Notably, the evolution of GGLE was affected not only by the first component (the overall cause-specific contribution) but also by the second component (the age-specific location of the overall contribution) for other neoplasms, circulatory system diseases, external causes of death, and respiratory diseases. Additionally, the role of age-specific contributions varied over time; for some causes, the first or second component had a negative impact (e.g., infectious diseases) or a positive one (e.g., external causes).

6 Discussion and conclusion

Our results provide new insights into the ongoing dynamics of the considered populations in terms of the contribution of age and cause of death to the overall GGLE. While some of these came as no surprise, we found evidence that has not been highlighted in the literature.

The existing literature on the evolution of the gender gap in LE demonstrates that from the beginning of the 1920s, in most industrialized countries, the gap widened until the 1970s and then started to narrow (Glei and Horiuchi 2007; Oksuzyan et al. 2008, 2010). The increase and decrease in the gender gap in mortality were mainly due to the smoking epidemic (Preston and Wang 2006; Pampel 2010). With some exceptions (e.g., Sweden and the US), men began smoking considerably earlier than women (Meslé and Vallin 2011; Beltrán-Sánchez et al. 2015). Differences in life expectancy between women and men have begun to narrow in many countries in Northern Europe, North America, and Oceania (Meslé 2004). Compared to other European countries, in most Eastern European countries, GGLE generally declined later (around the mid-2000s) (Meslé 2004; Trias-Llimòs and Janssen 2018). Japan constituted a special case, since differences in LE between women and men increased gradually and started to decline only in the most recent period (Meslé 2004; Glei and Horiuchi 2007; Liu et al. 2013; Katanoda et al. 2015; Feraldi and Zarulli 2022). In accordance with previous studies, our findings highlight that, in the most recent period, gains in LE have been larger for males than for females, narrowing the gender gap (Oksuzyan et al. 2009; Meslé and Vallin 2011; Thorslund et al. 2013). In Spain, Japan, and most Eastern European countries, the life expectancy of men has improved at a slower pace than that of women (García González and Grande 2018; Gjonça et al. 2005). Additionally, our findings showed that the narrowing of the gender gap in LE is mainly driven by decreasing differences in cardiovascular diseases. Progress in reducing mortality due to cardiovascular diseases is faster in men than in women, although it is slower at older ages than at younger ones (Meslé 2004; Gjonça et al. 2005; Feraldi and Zarulli 2022). This reduction in cardiovascular mortality may be attributed to medical advances and improved lifestyle habits. Medical and technical advances related to cardiovascular health and mortality may have favored survival among men to a greater extent than among women, but less so at the oldest ages (Thorslund et al. 2013). However, trends in gender ratios of cardiovascular mortality are worse in men than in women (Gjonça et al. 2005).

The average contribution of lung cancer to the GGLE increased until 1975 and started to decrease within the last time interval (1996–2015). Some authors have suggested that mortality from lung cancer (mainly caused by smoking) follows a pattern with three phases: a period of diffusion, a peak, and then a period of decline (Pampel 2003a, 2003b). Traditionally, men adopted smoking behaviors earlier than women; therefore, the consequences of smoking first appeared among men. When the GGLE was at its maximum, men were more affected than women; thus, the gender differences in mortality were larger. Lung cancer mortality rates among women increased one or two decades later, resulting in a reduction in the gap. Although countries experienced smoking epidemics at different times (e.g., earlier in England and Wales and later in France), in the 1990s, there was a larger overall reduction in lung cancer mortality among men than among women (Pampel 2003a, 2003b; Gjonça et al. 2005).

Observational comparative analyses of worldwide mortality trends have shown that, although cardiovascular disease mortality rates were twice as high as cancer mortality rates in 1970, cardiovascular disease mortality rates have declined significantly in the past three decades, whereas cancer mortality rates have been on the rise (Weir et al. 2016; Taksler and Rothberg 2017). Accordingly, our study also showed the increasing contribution of other neoplasms to the overall GGLE over time. This is commonly associated with improved early detection, prevention, and treatment measures for cardiovascular diseases coupled with the increasing longevity of the general population, which has led to a high prevalence of cancer (a disease of aging). In particular, it has been shown that males are at higher risk of cancer incidence and mortality for the vast majority of sites (ReFaey et al. 2021). Neoplasms (including lung cancer) are the second leading cause of death worldwide (Ferlay et al. 2013; Sitki Copur 2019). Excess male deaths are evident for colorectal, esophageal, liver, and skin cancers, which suggests a disproportionate increase among men in relevant risk factors such as high-fat diet, alcohol consumption, hepatitis B virus and/or herpes simplex virus infections, and UV exposure (Yang et al. 2012). Consequently, the contribution of cancer mortality to the gender gap in life expectancy has become more critical over time, especially in recent decades (Beltrán-Sánchez et al. 2015; Weir et al. 2016; Taksler and Rothberg 2017).

As found in other studies on OECD countries (Waldron et al. 2005; Backhans et al. 2012), our results highlight a reduction in the contribution of external causes to the GGLE over time. The combination of the convergence of gender roles and women’s role expansion led to increased risk exposure and more risk-prone behaviors (e.g., regarding drug use and improvements in medical care or public health measures), which in turn led to an increase in external-cause mortality among women. However, we found slightly increasing, although very small, contributions in the last period.

The contributions of endocrine, nutritional, and metabolic diseases to the gender gap in life expectancy began in the late 1990s. This was also found in a recent study based on 22 European countries from 1959 to 2015 (Stefanucci and Mazzuco 2022), which showed that endocrine, nutritional, and metabolic disease-related deaths have increased in recent times for men (suggesting that poor diet and alcohol consumption are increasingly impacting men’s health). Finally, the contribution of infectious diseases to the overall gender gap may be due to the widely reported decline in mortality from infectious diseases (independent of increases in national income) in both women and men over the last century (Mackenbach 2013; Mackenbach and Looman 2013).

This study reveals that age- and cause-specific contributions to GGLE act almost entirely on two dimensions: level (greater/smaller contribution) and age pattern (location of the curves across ages), even considering the heterogeneity of space and time contained in the dataset. Remarkably, these two components explain almost all the variability in these contributions. Notably, in the last period, it is not the “quantum” of the cause-specific contributions that matters, but the “timing,” that is, location across the age spectrum. Overall, in the last period, GGLE was affected more by the composition of the causes of death than in previous periods. This is plausible if one considers how far the mortality reduction from the main causes of death has gone for both genders, and that, until the next revolutionary medical breakthrough, further changes in the GGLE are more likely to occur from a redistribution over the age range of mortality differences.

This study demonstrated the advantages of applying FDA in demography. One of the key advantages of FDA is its ability to smooth complex data, such as the relative contributions of age and causes of death to the GGLE in several countries over time, helping reduce the impact of random fluctuations and other irregularities that can arise when dealing with causes of death and variations in coding protocols. Changes in the ICD can make it difficult to analyze mortality data by cause of death over time. Through statistical smoothing, the FDA approach can help mitigate the impact of these changes by reducing the noise in the data, which makes it easier to identify trends and patterns. Additionally, by transforming discrete data into functional data, it is possible to apply FPCA to synthesize the temporal dynamics of the GGLE by cause and age for several countries with few components, which is not possible with traditional methods (e.g., Arriaga).

When analyzing a functional dataset, it is favorable to have a large number of observations for each sample curve (Aguilera and Aguilera-Morillo 2013). Because the utility of FPCA lies in synthesizing complex patterns, it may be beneficial to include multiple countries to obtain a sufficiently large amount of information. However, applying FDA simultaneously to countries with different characteristics could lead to the pooling of countries with different patterns, although separating the analyses into three periods may mitigate this issue. Nevertheless, by applying FDA to demographic data, researchers can gain a more accurate and nuanced understanding of trends and patterns in population dynamics over time.

This study has a number of limitations. Comparisons of cause-specific mortality between countries over time may be affected by differences in coding practices. We attempted to overcome this by choosing broad classes of causes that were consistent across ICD revisions (Canudas-Romo et al. 2020). Additionally, due to the lack of reliable cause-of-death data for age groups older than 85 years, the analysis was restricted to age groups from 50 to 85 years. Thus, shifts in the distributions toward very old ages might not be completely captured. Since these very old ages are becoming increasingly important in determining the GGLE (Zarulli et al. 2020), disaggregated mortality data at ages older than 85 years would help better focus on the relevant age contributions to the gender gap. Finally, regression models were estimated for three time intervals (1959–1975, 1976–1995, and 1996–2015) and included time (year) as a covariate to control for the overall GGLE trend. The purpose of the models was to isolate the effects of FPCs in each period regardless of the overall trend over time in the GGLE. Although this approach allowed for a more accurate interpretation of the impact of FPCs in each period, because countries undergo the transition at different times, we might pool countries with different patterns. However, this issue applies only to the second period (as shown in Fig. 1).

Despite these limitations, the added value of this study is that it analyzes, for the first time, age- and cause-specific contributions to the GGLE using an FDA approach, which could prove useful in deepening our understanding of complex demographic phenomena. Therefore, we encourage the application of this approach to population studies.

Table 2 Classes of causes of death with related International Classification of Diseases (ICD) codes for each revision
Table 3 List of causes of death in each group