1 Introduction

The analysis of climate variability is essential to make reliable long-term predictions. According to the NOAA National Centers for Environmental Information, 2021 was ‘the fourth-warmest year in the 127-year record’ for the contiguous US, with an estimated warming trend of +1.60 °F/100 years and a precipitation trend of +1.88 in/100 years (NOAA National Centers for Environmental Information, 2022). In addition, extreme atmospheric events (both wet and dry) were detected in that region in the same year. In this context, the Palmer Drought Severity Index (PDSI) showed a trend of +0.34/century, indicating an increased drought risk in several areas such as the Southwest and Southeast (Ge et al. 2016; Apurv and Cai, 2021) and the Northwest and the Northern Great Plains (Ge et al. 2016).

Given that climatological time series carry implicit long memory properties, an adequate study of climate trends should account for the long-range dependence of the observations, or long memory, which implies ‘that even the most distant past still influences the current and future climate’ (Franzke et al. 2020). In this sense, the existence of a warming trend in average temperature is consistent with previous studies based on the fractional integration approach, which also find significant positive trends in the Northern Hemisphere at that time scale (Gil-Alana, 2003, 2005, 2012, 2018; Gil-Alana and Sauci, 2019). Nevertheless, there is no wide consensus on whether there is persistence in the precipitation process (Yang and Fu, 2019), which seems to depend on the latitude, the climatic characteristics of each station, and the degree of homogeneity of the series (Potter, 1979; Tyralis et al. 2018). Moreover, the long memory or long-range dependence properties may be affected by cross-sectional aggregation (Vera-Valdés, 2021) or by scales (Graves et al. 2017; Franzke et al. 2020).

The purpose of the paper is to examine the temperature and precipitation anomalies in the US, both aggregated and disaggregated by state, in order to determine whether there are significant time trends in the data over the monthly period 1895:01 to 2021:10, using a model of the form:

$$\begin{array}{ccc}y_t=\alpha+\beta t+x_t,&\left(1-B\right)^d\;x_t=u_t,&u_t=\rho\;u_{t-12}+\varepsilon_t.\end{array}$$
(1)

where \(y_t\) refers to the observed data; \(\alpha\) and \(\beta\) are unknown parameters, namely the constant (intercept) and the linear time trend coefficient; t is a time trend; B indicates the backshift operator, so \(B^k x_t = x_{t-k}\); d is a real value indicating the number of differences to be adopted in \(x_t\) to achieve I(0) stationarity; and \(x_t\) denotes the regression errors, assumed to be integrated of order d, or I(d), which implies that \(u_t\) is short memory, or I(0). In addition, given the possible seasonality of the monthly series analyzed, a seasonal AR(1) process is assumed for the I(0) disturbances \(u_t\), where \(\rho\) is the (monthly) seasonal coefficient and \(\varepsilon_t\) is a white noise process.
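To make the model concrete, the data-generating process in Eq. (1) can be simulated directly. The sketch below uses arbitrary illustrative parameter values (not the estimates reported later in the paper) and the standard MA expansion of (1 - B)^{-d}, whose coefficients obey the recurrence psi_j = psi_{j-1}(j - 1 + d)/j:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, rho, alpha, beta = 1500, 0.13, 0.05, 0.0, 0.0005  # illustrative values

# u_t = rho * u_{t-12} + eps_t: the seasonal AR(1) short-memory disturbance
eps = rng.standard_normal(T)
u = np.zeros(T)
for t in range(T):
    u[t] = (rho * u[t - 12] if t >= 12 else 0.0) + eps[t]

# x_t = (1 - B)^{-d} u_t, truncated at the sample start:
# MA weights via psi_j = psi_{j-1} * (j - 1 + d) / j
psi = np.empty(T)
psi[0] = 1.0
for j in range(1, T):
    psi[j] = psi[j - 1] * (j - 1 + d) / j
x = np.array([np.dot(psi[: t + 1], u[t::-1]) for t in range(T)])

# y_t = alpha + beta * t + x_t
y = alpha + beta * np.arange(T) + x
```

The resulting series `y` combines a deterministic linear trend with fractionally integrated, seasonally correlated errors, exactly the three ingredients of Eq. (1).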

Note that the estimation of β is crucial, and it is clearly determined by the type of assumptions made about \(x_t\). Most articles impose d = 0 or, alternatively, d = 1; however, our results show that d lies between 0 and 1. Our paper is the first of its kind for the aggregate US and its states over the longest possible sample period, which helps us avoid sample selection bias, and our results also have novel economic implications. In fact, the novelty of this paper lies more in its application, as we do not aim to provide any theoretical econometric innovation. We want to estimate time trends while simultaneously allowing for the possibility of strong dependence or long memory, noting that failing to take this into account may produce inconsistent estimates of the deterministic terms.

In this regard, we use the well-established autoregressive fractionally integrated moving average (ARFIMA) framework to study the long-memory persistence and trends of long spans of aggregate and state-level US data on temperature and precipitation anomalies. The objective is to highlight the heterogeneity in the underlying long memory, persistence, and trend estimates of the aggregate and regional data, emphasizing that when analyzing these properties of US climate-related variables we cannot generalize the findings obtained for the overall US to the individual states. This matters because it has important implications for policymaking, given the heterogeneous degree of responses at the state level to climate change as measured by the two variables under investigation.

The structure of this paper is as follows. Section 2 presents a brief summary of the literature. Section 3 indicates the methodology applied while Section 4 describes the dataset used, and presents the results. Finally, Section 5 discusses and concludes the paper.

2 A review of the literature

According to the World Meteorological Organization (WMO 2020), there is a 20% chance that the 1.5 °C threshold will be exceeded by 2024; therefore, if the current rate of increase in greenhouse gas concentrations is maintained, the rise in temperature by the end of this century will exceed the target established in the Paris Agreement of limiting global warming to 1.5 or 2 °C above pre-industrial levels (WMO, 2021).

Not even the industrial and economic slowdown caused by COVID-19 has slowed global warming: the persistence of carbon dioxide (CO2) in the atmosphere is very prolonged, and the reduction in emissions in 2021 is therefore not likely to lead to a decrease in the atmospheric concentrations of CO2 that drive the rise in global temperature (WMO, 2020).

The study and evaluation of climate change trends are of great interest, as reflected in numerous scientific studies (Bloomfield, 1992; Folland et al. 2018; Brunetti et al. 2001; etc.). However, there is no common criterion either on the most appropriate modelling of climatological time series or on the deterministic nature of the trend term in temperature time series.

As for the modelling, one option is to consider that temperature time series are stationary I(0) (Bloomfield and Nychka, 1992; Woodward and Gray, 1993) or non-stationary I(1) (Woodward and Gray, 1995; Stern and Kaufmann, 2000; Mann, 2004; Hamdi et al. 2018). Other studies rely on wavelet analysis, which handles very large data sets and is very robust to the presence of deterministic trends, while also allowing their detection and identification (Abry and Veitch, 1998); on detrended fluctuation analysis, based on a generalization of fluctuation analysis to series with trends (Kantelhardt et al. 2001); or on spectral analysis, where the correlations of several daily surface meteorological parameters are analyzed by partially complementary methods that are effective on different time scales (Weber and Talkner, 2001).

A common way to study the evolution of temperature is by diagnosing the nature (stochastic or deterministic) of the trend term in the time series, although without conclusive results so far. While some studies support stochastic behavior (Kallache et al. 2005; Cohn and Lins, 2005; Koutsoyiannis and Montanari, 2007; Hamed, 2008), others find a positive, deterministic, and statistically significant trend (Bloomfield and Nychka, 1992; Vogelsang and Franses, 2005; Fatichi et al. 2009).

Many studies rely on standard regressions against time, testing whether the time trend coefficient is significantly positive under the assumption that the errors follow a short memory, or I(0), process.

Time series studies using power spectral density (PSD) analysis often give misleading results due to the highly non-stationary nature of rainfall signals (Matsoukas et al. 2000; Kantelhardt et al. 2006). To avoid this, some authors have used detrended fluctuation analysis (DFA) (Jiang et al. 2017; Philippopoulos et al. 2019; Kalamaras et al. 2019; Gómez-Gómez et al. 2021) and its multifractal generalization, the multifractal DFA (MF-DFA) (Kantelhardt et al. 2002). Nevertheless, these techniques can lead to more variability and bias by overestimating or underestimating fractal parameters (Maraun et al. 2004; Stadnitski, 2012; Roume et al. 2019; etc.), especially in ‘short series of persistent noise’ (Delignieres et al. 2006). This could be due to the intrinsic characteristics of the DFA method itself (Carpena et al. 2017) and to the data transformations that need to be performed in this approach (Stadnitski, 2012).

In contrast, the main advantage offered by the autoregressive fractionally integrated moving average (ARFIMA) approach (Granger and Joyeux, 1980; Hosking, 1981) is that the differencing parameter d can be a real number, which allows a more accurate description of correlations not only in the long term but also in the short term (Huang et al. 2022). Thus, ARFIMA analysis, and fractional integration in general, provides efficient estimates with less variability (Roume et al. 2019; Bhardwaj et al. 2020) that can improve and complement the analyses carried out with classical algorithms (Delignieres et al. 2006; Torre et al. 2007, and others).

The literature is very extensive, and the long memory behavior of temperature series should not be neglected (Lenti and Gil-Alana, 2021). In fact, long memory, and specifically fractional differentiation, has been widely used in the analysis of temperatures (Gil-Alana, 2005, 2006, 2017; Vyushin and Kushner, 2009; Zhu et al. 2010; Rea et al. 2011; Franzke, 2012; Yuan et al. 2013). Gil-Alana (2018) studied the time trend coefficients of temperatures in the 48 contiguous US states from 1895 to 2017 using techniques based on fractional integration in the untrended series, obtaining more accurate trend estimates than those produced by other methods that impose I(0) or I(1) behaviour.

Gil-Alana and Sauci (2019) assess the fractional persistence of average temperatures and anomalies using monthly US data for the period 1895–2017. Their results show positive and significant trend coefficients for 38 of the 48 states, with a high degree of persistence observed in most of the series. In particular, the states of Rhode Island, New Jersey, and North Carolina exhibit the greatest increases, above 2.70 °C/100 years. The present study extends this analysis to a longer US temperature dataset and moreover includes precipitation time series.

3 Methodology

Taking into account the monthly structure of the series under examination, and in order to test both the existence of trends and the degree of dependence, we examine the model given by Eq. (1), that is, including a linear trend, an I(d) model, and a seasonal AR structure.

In this context, there are three parameters of interest: β, which indicates the increase in the value of the series per unit of time (months); d, referring to the degree of dependence or persistence, with long memory if that parameter is significantly positive; and ρ, the seasonal AR coefficient, capturing the seasonal (monthly) structure.

Focusing on the long memory property, this is a feature of time series data implying that observations remain strongly dependent even when they are far apart in time. Among the many models describing this type of behavior, a very common one is based on fractional differentiation, described by the second equality in Eq. (1), which satisfies the long memory property if d is positive. Since d is a real value, it allows us to consider different alternatives such as I(0) or short memory (d = 0), stationary long memory (0 < d < 0.5), nonstationary though mean-reverting behavior (0.5 ≤ d < 1), unit roots (d = 1), or even explosive behaviour (d > 1).
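These regimes can be illustrated numerically through the impulse responses of an I(d) process, i.e., the MA coefficients psi_j of (1 - B)^{-d}: they die out when d < 1 (mean reversion), remain constant at one when d = 1 (unit root), and grow when d > 1 (explosive). A small illustrative sketch, using the standard recurrence psi_j = psi_{j-1}(j - 1 + d)/j:

```python
import numpy as np

def impulse_responses(d, n):
    """MA coefficients psi_j of (1 - B)^{-d}, j = 0, ..., n-1,
    computed via psi_j = psi_{j-1} * (j - 1 + d) / j."""
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return psi

# mean reversion (d < 1): shocks eventually die out
print(impulse_responses(0.4, 500)[-1])   # small, approaching zero
# unit root (d = 1): shocks persist forever with constant weight
print(impulse_responses(1.0, 500)[-1])   # exactly 1.0
# explosive (d > 1): the effect of a shock grows over time
print(impulse_responses(1.2, 500)[-1])   # greater than 1
```

Since psi_j behaves asymptotically like j^{d-1}/Γ(d), the decay toward zero for d < 1 is hyperbolic rather than exponential, which is precisely the long memory feature.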

The long memory feature of fractional integration can easily be seen from the binomial expansion of \((1 - B)^{d}\), which is:

$$\left(1-B\right)^d=\sum_{j=0}^\infty\begin{pmatrix}d\\j\end{pmatrix}\left(-1\right)^j\;B^j=1-d\;B+\frac{d\;\left(d-1\right)}2B^2-\dots$$

and thus, the higher the value of the differencing parameter d, the stronger the association between observations, even if they are far apart in time. Robinson (1978) and Granger (1980) justified the presence of long memory based on the aggregation of heterogeneous autoregressive (AR) processes, and fractional integration was first introduced in the literature by Granger and Joyeux (1980) and Hosking (1981), being widely used in the context of aggregated data since the late 1990s (Baillie, 1996; Hsueh and Pan, 1998; Gil-Alana and Robinson, 1997; Parke, 1999; etc.).
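As a quick illustration of this expansion, the weights of \((1 - B)^{d}\) can be generated with the recurrence c_j = c_{j-1}(j - 1 - d)/j, and applying them (truncated at the start of the sample) fractionally differences a series. A minimal sketch, illustrative only and not the estimation code used in the paper:

```python
import numpy as np

def frac_diff_weights(d, n):
    """Coefficients of the binomial expansion of (1 - B)^d,
    via the recurrence c_j = c_{j-1} * (j - 1 - d) / j."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w

def frac_diff(x, d):
    """Apply (1 - B)^d to a series x, truncating the expansion
    at the start of the sample."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, len(x))
    return np.array([np.dot(w[: t + 1], x[t::-1]) for t in range(len(x))])
```

For d = 1 the weights reduce to (1, -1, 0, 0, ...), recovering the ordinary first difference, while for fractional d the weights decay slowly, so distant observations still enter the filter with non-negligible weight.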

The estimation of the model is conducted by means of an approximation to the likelihood function, the Whittle function, expressed in the frequency domain. We use the testing method proposed in Robinson (1994), which is very appropriate in our case since, unlike most classical long memory procedures, it does not impose stationarity on the series.

Robinson (1994) proposes the following regression model,

$$\begin{array}{cc}y_t=\beta^T\,z_t+x_t;&t=1,\;2,\dots,\end{array}$$
(2)

where \(z_t\) is a (k×1) vector of exogenous regressors (or deterministic terms), and the regression errors, \(x_t\), are described as:

$$\left(1-B\right)^{d_1}\;\left(1+B\right)^{d_2}\prod\nolimits_{j=3}^m\left(1-2\;\cos\;w_j^r\;B+B^2\right)^{d_j}\;x_t=u_t,$$
(3)

where d is an (m×1) vector of real-valued parameters; the first component, \(d_1\), refers to the long run or zero frequency, and the remaining terms (\(d_j\), j > 1) refer to the orders of integration at the non-zero frequencies, with \({w}_{j}^{r}=\frac{2\pi {r}_{j}}{T}\) and \({r}_{j}=\frac{T}{{s}_{j}}\). Thus, \(r_j\) refers to a frequency with a pole or singularity in the spectrum of \(x_t\), and \(s_j\) indicates the number of periods per cycle. Robinson (1994) proposed to test the null hypothesis:

$${H}_{0}:d={d}_{0}$$
(4)

in the model given by (2) and (3) for any real-valued vector \(d_0\), and he showed that the test statistic, say \(\widehat{\mathrm{R}}\), has a \({\chi }_{\mathrm{m}}^{2}\) null limit distribution. In the empirical work carried out in the following section, we set m = 1 and thus only consider the long run or zero frequency, so the limiting distribution is \({\chi }_{1}^{2}\).
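Robinson's (1994) parametric test requires the full model specification, so purely as an illustration of frequency-domain (Whittle-type) estimation of d at the zero frequency, the sketch below implements the related semiparametric local Whittle estimator (Robinson, 1995) by grid search. This is an illustrative toy with an assumed bandwidth rule, not the procedure applied in the paper:

```python
import numpy as np

def local_whittle_d(x, m=None):
    """Local Whittle estimate of the memory parameter d: minimize
    R(d) = log( mean_{j<=m} lam_j^{2d} I(lam_j) ) - 2d * mean_{j<=m} log(lam_j)
    over a grid, using the periodogram at the first m Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if m is None:
        m = int(n ** 0.65)  # bandwidth choice is an assumption, not a rule
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n
    # periodogram of the demeaned series at the first m Fourier frequencies
    I = np.abs(np.fft.fft(x - x.mean())[1 : m + 1]) ** 2 / (2.0 * np.pi * n)
    mean_loglam = np.log(lam).mean()
    grid = np.linspace(-0.49, 0.99, 297)
    R = [np.log((lam ** (2.0 * dd) * I).mean()) - 2.0 * dd * mean_loglam
         for dd in grid]
    return float(grid[int(np.argmin(R))])
```

Applied to a white noise series this should return a value close to 0, and to a stationary fractionally integrated series a value close to its true d, with precision governed by the bandwidth m.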

4 Data and empirical results

4.1 Data

For our analyses, we use monthly data on temperature and precipitation anomalies (relative to the base period of 1901–2000) for the aggregate US and its 48 contiguous states (i.e., except for Alaska and Hawaii) over the monthly period of 1895:01 to 2021:10. The data is sourced from the National Oceanic and Atmospheric Administration (NOAA).

4.2 Aggregated data results

Table 1 presents the estimates of the integration order d in Eq. (1) for the two aggregated time series. We display the results under the three classical assumptions in the unit root literature: (i) no deterministic components, imposing α = β = 0 a priori in (1); (ii) an intercept or constant only, with β = 0 a priori; and (iii) both an intercept and a (linear) time trend, with both α and β estimated from the data. Together with the estimates of the differencing parameter d, the tables also report the confidence intervals of the non-rejection values of d at the 5% level using the tests of Robinson (1994).

Table 1 Estimates of the fractional differencing parameter d: aggregated data

We report in boldface in Table 1 the selected model according to the best specification of the deterministic terms. This selection is based on the statistical significance of the estimated coefficients in (1): if both deterministic terms, α and β, are significantly different from zero, we adopt that model; if β is found to be insignificant, we choose the model with only an intercept; and if both are insignificant, we adopt the model with no deterministic terms. We see in the table that the time trend is required for both series, with an estimated value of d of 0.13 in both cases. Table 2 displays the coefficients based on the selected model. The estimates of β are significantly positive in the two series, being much higher for temperature than for precipitation. The estimated time trends are displayed graphically in Fig. 1.
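The selection rule just described can be written down compactly. The helper below is a hypothetical sketch of that decision logic only; in practice the significance flags would come from the t-ratios of the estimates of α and β in Eq. (1):

```python
def select_deterministics(alpha_significant: bool, beta_significant: bool) -> str:
    """Choose the deterministic specification following the rule in the text:
    both terms significant -> intercept and trend; trend insignificant but
    intercept significant -> intercept only; both insignificant -> no terms."""
    if alpha_significant and beta_significant:
        return "intercept + trend"
    if alpha_significant:
        return "intercept only"
    return "no deterministic terms"
```

For the two aggregated series both coefficients turn out to be significant, so the trend specification is retained in both cases.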

Table 2 Estimated coefficients in the regression model: aggregated data
Fig. 1 Time series plots and estimated trends

4.3 Disaggregated data by states

We start by reporting the results for the temperatures (see Tables 3 and 4). The first observation from Table 3 is that the model with a time trend is preferred in the majority of the cases. In fact, there are only seven states where the model requires neither a constant nor a time trend: Alabama, Arkansas, Kentucky, Louisiana, Mississippi, Oklahoma, and Tennessee, all geographically related in the Southeastern part of the country (see Fig. 2). Focusing on the selected models in Table 4, we observe that the estimate of the differencing parameter is significantly positive in all cases, ranging from 0.09 (Nebraska and Kansas) and 0.10 (Missouri, Montana, Oklahoma, and Wyoming) to 0.18 in Florida and Michigan. Figure 3 provides a graphical summary of the results relating to the differencing parameter. Tentatively, the degree of persistence seems to be negatively correlated with climate change-related risks and with how prepared the states are for climate change, i.e., what measures they are undertaking to slow down the process. In this regard, the reader is referred to a non-academic analysis conducted by Policygenius, a private company dealing with homeowners insurance; see https://www.policygenius.com/homeowners-insurance/best-and-worst-states-for-climate-change/. The company has developed the 2021 Policygenius Best & Worst States for Climate Change Index. To calculate this index, each of the 48 contiguous states was ranked on several climate change-related factors, and a score out of 100 was then created for each state based on these rankings: a higher score means a better outlook in a low- or high-emissions future, and a lower score a worse outlook. We observe that the lowest degrees of persistence seem to occur in the central part of the US (Nebraska and Kansas).
Finally, the seasonal AR coefficient does not appear to be significant in any of the US states.

Table 3 Order of integration (d) in the temperature anomaly: results by state
Table 4 Coefficients in the selected models. Temperature anomaly: results by state
Fig. 2 Time trend for temperature anomaly, based on results of Table 4

Fig. 3 Estimate of d for temperature anomaly, based on results of Table 3

Moving to the precipitation results (see Tables 5 and 6), we first observe that the time trend is now insignificant in a larger number of states, in particular in 21 of them: Arizona, California, Colorado, Connecticut, Florida, Georgia, Idaho, Kansas, Missouri, Montana, Nebraska, Nevada, New Mexico, North Carolina, North Dakota, Oregon, South Carolina, Texas, Utah, Washington, and Wyoming. Among the states with a positive linear time trend, the highest coefficients are observed in Mississippi (0.00047) and Vermont (0.00045), followed by New Hampshire (0.00042), Tennessee, and Louisiana (0.00041). Note that these states are all in the eastern part of the country (see Fig. 4).

Table 5 Estimates of d in the precipitation anomaly: results by state
Table 6 Selected coefficients in the precipitation anomaly: results by state
Fig. 4 Time trend for precipitation anomaly, based on the results of Table 6

With respect to the degree of integration, we see that the estimated values of the integration order d are smaller than those for the temperatures: short memory, or I(0), behaviour cannot be rejected in 14 states: Alabama, Indiana, North Dakota, and Wisconsin (d = 0.04); Connecticut, Delaware, Maryland, and New York (0.03); Maine (0.02); Massachusetts, Rhode Island, and Vermont (0.01); and Michigan and New Hampshire (0.00). For the rest of the states, the estimate of d is significantly higher than 0, implying a long memory pattern, with the highest values obtained in Arizona (d = 0.11) and Texas (0.12). Figure 5 provides a graphical summary of the results relating to the differencing parameter. We observe that the states with the highest degrees of persistence seem to be located in the Southwest, while those with the lowest values are in the Northeast. As with the temperature anomaly, these findings of higher persistence in the precipitation anomaly seem to be negatively correlated with climate change-related risks and degree of preparedness, albeit in a tentative manner.

Fig. 5 Estimate of d for precipitation anomaly, based on the results of Table 5

5 Discussion and conclusions

The time series features of the temperature and precipitation anomalies in the US have been examined in this paper, looking first at the aggregated data and then at the data disaggregated by the 48 contiguous states. For that purpose, we have employed techniques based on fractional differentiation, thus allowing the number of differences applied to the series to take a fractional value.

Starting with the aggregated data, our results support the hypothesis of long memory or strong dependence, since the differencing order is significantly positive for both the temperature and precipitation anomalies; the time trend coefficient is also positive in both series, being higher for the temperatures.

If we look at the data disaggregated by states, starting with the temperature anomaly, we see that the coefficient for the time trend is significantly positive in the majority of the states, barring seven cases with an insignificant trend, all of them located in the Southeast. For the estimate of the differencing parameter, there is a large degree of heterogeneity across the states, with the value of d ranging from 0.09 (Nebraska and Kansas) and 0.10 (Missouri, Montana, Oklahoma, and Wyoming) to 0.18 in Florida and Michigan.

For the precipitation anomaly, the trend is now found to be statistically insignificant in a large number of states, and the degree of differentiation is slightly smaller than for the temperature anomaly. In fact, the hypothesis of a short memory pattern (i.e., d = 0) cannot be rejected in fourteen states and the highest degree of integration is observed in Arizona (d = 0.11) and Texas (0.12).

Climate risks, as captured by the behavior of temperature and precipitation anomalies, are known to have an effect on economic activity (Deschênes and Greenstone 2007; Dell et al. 2009, 2012, 2014). Our results suggest that, given the heterogeneity in the trend and persistence of temperature and precipitation anomalies, the nature and strength of the policies adopted by local governments to mitigate climate change would need to differ across states; i.e., state-specific policies need to be pursued to accurately tackle the issue of local climate change. At the same time, we must also emphasize that policymakers should not rely on aggregate results to reach policy decisions at the state level.

Given that climate risks have also been associated with the volatility of temperature and precipitation anomalies (Donadelli et al. 2017, 2021a, 2021b, 2021c; Kotz et al. 2021), it would be interesting, as part of future research, to conduct similar analyses of the trend and persistence of the variance of these two series. From a methodological viewpoint, future research might also investigate the presence of non-linear and/or cyclical structures in the data, still within the context of fractional integration.