Correction to: São Paulo City is getting 2 °C hotter in the last 50 years: regression model with autoregressive errors or with lagged temperatures?

In this article the years and temperature values in Fig. 4 were missing on the x-axis and y-axis, resp. The correct figure is shown below. The original article has been corrected.


Introduction
One of the main global concerns is global warming and its impact on our lives. Temperature is increasing worldwide [13], and the main goal of the Paris Agreement (2015) is to take efforts to maintain the rise of the global temperature in our century below 2 • C above pre-industrial levels and limit the temperature increase further to 1.5 • C.
There are many lines of evidence about global warming. In March 2020, there were 1.89 million published papers including the words global warming and temperature increase in the Google Scholar website. Some of them discuss the impact of global warming, and that evidence of the increase in temperature is undeniable. The global annual temperature has increased at an average rate of 0.18 • C per decade since 1981 [15].
This paper aims to present and discuss time series models to estimate the increase in temperature. In special, the main goal of this analysis consists of estimating the mean temperature increase in the last 50 years based on the monthly mean temperature from 1961 to 2017 in the city of São Paulo, Brazil.
São Paulo is a vast city with an estimated population of 12 million people in 2016 [10]. Another motivation for this analysis is that in the last ten years the number of vehicles increased by 39% in São Paulo, passing from 6,369,581 in 2008 to 8,861,208 in 2018 (Detran), whereas the population increased by 8% in the last two decades.
Linear regression models are convenient to estimate the linear trend and seasonality, but significant residual autocorrelations show that the usual assumption of error independence is violated when analyzing time series. To estimate the mean temperature increase, we propose a linear regression model including linear trend and sine and cosine functions, but it is necessary to include also autoregressive errors, named AR error model. The alternative model is a linear regression model in which we include the lagged temperature (called ARX model) and consider the serial correlation between the temperatures in consecutive months. In this last model, the trend effect is accumulated in the long term and depends not only on the parameter of the linear trend, as some practitioners may think, but also on the parameters associated with the lagged temperatures.
In this paper, we propose, fit, and discuss both models. First, we check if all model assumptions are valid through a residual analysis. Then, the mean temperature increase is estimated for 50 years with the corresponding prediction interval. The method to estimate the mean annual increase is explained, and more complex calculations are required to estimate the long-term trend effect for the model with lagged temperatures.
Based on the observed monthly mean temperature from 1961 to 2017 in the city of São Paulo, it was observed a mean increase of 0.4 • C in each decade, and a mean increase of approximately 2 • C in the last 50 years. We also fitted a dynamic linear model only to confirm that the trend effect is not varying over time.
This paper unfolds as follows. Section 2 presents the method used. Section 3 presents all the estimation results. Section 4 discusses the advantages and disadvantages of each model. Finally, some conclusions are given in Sect. 5, along with some explanations on why the model with autoregressive errors is easier to estimate covariate effects, such as the trend effect.

Models, estimation methods, and residual analysis
The monthly mean temperatures in São Paulo, measured at the Mirante de Santana station from 1961 to 2017, were obtained from the Instituto Nacional de Pesquisas Espaciais (INPE) [11]. This time series consists of 684 monthly observations, corresponding to 57 years. This time series presents 20 missing values, mainly in the beginning of the 80's. No imputation method was applied and the models proposed in this paper may be fitted with few missing values.
The basic model includes a linear trend and harmonic components to take into account the seasonality as in [1]. Then, the regression model for the mean temperature in the tth month, y t , is given by Note that all 11 harmonics, cos(2 jt∕12) and sin(2 jt∕12) , were included, except the sine function for j = 6 since sin(2 6t∕12) = sin( t) = 0 . Beyond the orthogonality of these trigonometric functions, another important feature is that for any integer k we have Then, the seasonal component oscillates around zero and its inclusion in (1) implies that the mean of y t oscillates around the linear trend 0 + 1 t∕12.
The main concern is that the error process is probably autocorrelated. Then, it was proposed that the error may be modeled as an autoregressive (AR error) process, as suggested by the residual autocorrelation and partial autocorrelation functions.
After removing the nonsignificant harmonic terms, based on the likelihood ratio test to verify the null hypothesis that 7 coefficients are all simultaneously null (p=0.5521), the model with AR(1) errors is t = 1, … , 684 , where a t are zero-mean, independent, Gaussian, and homoscedastic random errors. Also, after a sequence of replacements, the error can be written as e t = ∑ i i a t−i and the process e t is stationary assuming that | | < 1.
This model was estimated using the maximum conditional likelihood method. The likelihood is the product of the Gaussian conditional density functions of y t given all the available information up to t − 1 [19]. The asymptotic distribution of these maximum likelihood estimators is Gaussian and their variance are obtained from the inverse of Fisher information matrix. More details are, for example, in [3,19]. (1) (2) In this analysis, the model was estimated using the Arima command of the forecast library in [8] using the R programming language.
As an alternative, we also proposed a model that includes the last observation y t−1 in the regression model to take into account the autocorrelation. This proposal is simpler to practitioners since it is a regression model where the lagged observation is another covariate. We call this model ARX(1) model because it may be understood as an AR(1) model with covariates (X = trend and seasonal components), as the ARMAX model in [19]. The ARX(1) model is defined for t = 1, … , 684 as where e t are independent Gaussian errors with standard deviation e and | | < 1.
The conditional mean of the temperature is but the annual increase in the mean temperature does not directly correspond to 1 since y t−1 also depends on t. By increasing one month, the covariate x t = t goes to t + 1 , and in the first month y t+1 increases on average 1 , in the second month y t+2 increases 1 + 1 and after h months 7,9]. As the process is a stationary autoregressive process, | | < 1 . Then, in the long term (as n → ∞ ), the mean annual increase corresponds to Using the Delta method [18] and assuming that | | < 1 , the asymptotic variance of the maximum likelihood estimator The estimated variance is obtained by replacing the parameters 1 and by their respective estimates. The residual analysis of both models included residual time series plots, residual autocorrelations and residual quantile-quantile (QQ) plots. After concluding that all (3) cj cos 2 jt 12 E(y t |y t−1 ) = 0 + 1 (t∕12) + ∑ j=1,2,6 cj cos 2 jt 12 model assumptions are valid, the predicted temperatures were calculated for both models. The predicted temperatures for the AR(1) error model in (2) are and, for the ARX(1) model in (3), the prediction includes the lagged term The temperature forecasts for the next 50 years are calculated with corresponding 95% prediction intervals using the AR(1) error model through the forecast library [8] in the R software [17]. The variance of forecasts using ARMA models are obtained, writing the model as an infinite order-MA model as explained in [3]. As we are considering an AR(1) error model, the coefficients of the MA model are Including the covariates at time h ( h ), the variance of a forecast h steps ahead is A state space model [16] with a time-varying trend and also seasonality was also proposed only to verify if the trend parameter is constant over time. The non-observed state variables to measure the time-varying trend t evolves as a random walk in where 0 is the intercept, t is the time-varying trend parameter, each s t is the monthly effect and the errors v t ∼ N(0, V ) , w S t ∼ N(0, W S ) and w t ∼ N(0, W ) are independent. In state space models [16], it is usual to assume that the seasonal effects s t are random variables such that s t + s t−1 + ⋯ + s t−11 = w S t have zero-mean and the timevarying effect is a random walk. This state space model was fitted using the Kalman Filter and the maximum likelihood method through the dlm library [16] in the R software [17]. The predicted timevarying effect ̂t is obtained with the smoothed equations (6) y t =̂ 0 +̂ 1 t∕12 + ∑ j=1,2,6̂ cj cos 2 jt 12 +̂ s1 sin 2 t 12 , cj cos 2 jt 12 +̂ s1 sin 2 t 12 +̂ y t−1 .
of the Kalman filter and is plotted to verify if it may be considered a constant or time-varying effect. The computing code used and the database is available upon request.

Results
In this section, we shall present the results of the present temperature data analysis. Figure 1 shows the monthly mean temperature in São Paulo. We note a linear increase from 1960, and the mean temperature increase is around 0.4 • C per decade as presented in Table 1. This corresponds to a mean increase of almost 2 • C in the last 50 years ( 0.39 × 5 = 1.95 • C).
All the estimates and corresponding standard errors for the proposed models are presented in Table 2. For both models, the estimated annual mean increase in temperature is 0.038 • C, which corresponds to an increase of  (Fig. 2) and also there is no significant residual autocorrelation until lag 10 according to the Ljung-Box test ( p value = 0.5161 for the AR(1) error model and p value = 0.2424 for the ARX(1) model) [2]. Also, Fig. 2 shows that the residuals are homoscedastic and normally distributed. Then, all model assumptions are valid for the AR(1) error and ARX(1) models and the prediction intervals and forecast may be used for inference. Figure 3 presents the predicted values only from 2010 for better visualization. The predictions for the AR(1) error model in (6) and ARX(1) model in (7) are similar and close to the observed temperature values.
Notice that the weather will be warmer in São Paulo if the current trend remains the same in the coming years, as observed in Fig. 4. These forecasts are calculated assuming that the trend will remain the same for a long time and it is shown only to visualize the future temperatures.
Based on the dynamic model with time-varying trend effect, the predicted time-varying effect ̂t oscillates close to 0.0383 as shown in Fig. 5. The variance of the time-varying error is too small ( < 0.0001 ), indicating that the linear trend effect does not change over time.

Discussion
Both the regression model with AR(1) errors in (2) and the ARX(1) model in (3) provide similar estimates and predicted values.
For several practitioners, it is easier to include the lagged observation in the regression model, as in the ARX(1) model (3), since it can be estimated by the least squares method using any spreadsheet program. If the main goal is to calculate predictions or forecasts for the temperature, this model may be chosen or even a SARIMA model. However, considering the trend coefficient as the    Table 2, the trend parameter estimate is 0.032, but the mean increase estimator must be calculated using (4), providing an annual increase equals to 0.038 with corresponding asymptotic variance in (5).
On the other hand, to estimate and obtain confidence intervals or tests involving the model parameters, for example, for the secular trend, it is easier to consider the model with AR(1) errors. This occurs because its trend parameter directly measures the trend effect, and it is easier to obtain standard errors and confidence intervals.
It is worth noting that usually it is expected that the range of the forecast interval grows for larger horizons. However, the forecast interval are not getting wider in Fig. 4 and this may occur due to the very small variance of the trend estimator.
In [6], a regression model with trend and autoregressive error is fitted, but there is no seasonal component. This may have increased the estimated autoregressive coefficient, turning the error process similar to a random walk (unit root process). After considering differences, they also found significant temperature increases in Alaska, but the estimated parameters do not correspond to the mean annual increases presented here, which is easier to interpret. A dynamic linear model to estimate a time-varying trend effect was also fitted indicating that this effect does not vary over time and the temperature is always increasing with the same rate, as also concluded by [5] for the global temperature data until 1980.
Recent papers concluded that global warming was overestimated. For example, [4] indicated that, in the period 1993-2012, the mean global temperature increased 0.14 • C(∓0.06 • C) per decade. This increase is lower than the previously estimated 0.3 • C using simulations of complex models. They concluded that maybe the models did not reproduce the observed global warming over the past 20 years or the slowdown in global warming over the past fifteen years.
Also, the mean increase in the global temperature is 0.18 • per decade since 1981 in the Global climate report [15] and, for example, it is 0.2 • in Ghana [12]. All these increases are smaller than the estimated increase around 0.4 • C per decade in São Paulo City, as observed in Table 1 and estimated by the proposed models ( 0.38 • C in Table 2). Our data from São Paulo showed a constant increase in temperature, with no slowdown.

Conclusions
Whenever there is evidence that the temperature increase is not changing over time, it is recommended to fit a regression model including trend and seasonal components and autoregressive errors to estimate the mean temperature increase with the corresponding confidence interval. If a model with lagged temperatures is chosen, it is necessary to calculate the long-term mean increase as in (4) and its variance as in (5), because the estimator of the trend coefficient is a biased estimator of the mean annual increase in the temperature.
Focusing on the climate change issue, there is sufficient evidence that temperature is increasing and there is an increase of 1.9 • C in the last 50 years in the city of São Paulo, with a 95% confidence interval of [1.6; 2.2] for the AR(1) error model, including the upper limit, stated in the Paris Agreement [14] of 2 • C-increase. Maintaining this increase, it is expected another increase of around 1.9 • C in the next 50 years.