Finite-sample properties of estimators for first and second order autoregressive processes

The class of autoregressive (AR) processes is extensively used to model temporal dependence in observed time series. Such models are easily available and routinely fitted using freely available statistical software like R. A potential problem is that commonly applied estimators for the coefficients of AR processes are severely biased when the time series are short. This paper studies the finite-sample properties of well-known estimators for the coefficients of stationary AR(1) and AR(2) processes and provides bias-corrected versions of these estimators which are quick and easy to apply. The new estimators are constructed by modeling the relationship between the true and originally estimated AR coefficients using weighted orthogonal polynomial regression, taking the sampling distribution of the original estimators into account. The finite-sample distributions of the new bias-corrected estimators are approximated using transformations of skew-normal densities, combined with a Gaussian copula approximation in the AR(2) case. The properties of the new estimators are demonstrated by simulations and in the analysis of a real ecological data set. The estimators are easily available in our accompanying R-package for AR(1) and AR(2) processes of length 10–50, both giving bias-corrected coefficient estimates and corresponding confidence intervals.


Introduction
The class of autoregressive (AR) processes is one of the most central and widely applied time series models. These processes are simple and intuitive, expressing the current value of a time King Abdullah University of Science and Technology, Thuwal, Saudi Arabia series as a linear combination of previous values and additional random noise. Specifically, a pth order autoregressive process (AR( p)) can be defined by where the set {φ j } process has pseudo-periodic behavior for pairs of coefficients below the given parabolic curve. In the lower part of this region, the bias of the exact MLE and Burg's method is not too severe but it increases with increasing values of φ 2 . The Yule-Walker approach clearly gives the most biased results for both sample sizes. The average estimates using the conditional MLE are not illustrated as these were not visually distinguishable from the exact MLE. A number of methods to provide bias-corrected estimators for the AR coefficients have been proposed in literature. These range from asymptotic-based formulas for the bias (Marriott and Pope 1954;Kendall 1954;Shaman and Stine 1988;Tanaka 1984;Cordeiro and Klein 1994) to methods using restricted maximum likelihood (Cheang and Reinsel 2000) and bootstrapping (Thombs and Schucany 1990;Kim 2003). Andrews (1993) introduced a median-unbiased correction for the least squares estimator of the AR(1) coefficient which was generalized to give an approximate median-unbiased estimator of AR( p) processes in Andrews and Chen (1994). This estimator is implemented in the R-package BootPR (Kim 2014), also including the estimators of Shaman and Stine (1988) and Roy and Fuller (2001). The resulting estimates are not constrained to fall within the stationary area of the processes.
Our bias-correcting approach differs from previous suggestions in the sense that we model the true AR coefficients as a function of original estimates, accounting for the sampling distribution of the original estimator. This is achieved by a brute-force simulation approach where we generate AR(1) and AR(2) processes for a fine grid of underlying true values of the coefficients. The relationship between the true and estimated coefficients is then described using a weighted orthogonal polynomial regression model. In the AR(2) case, the regression model is fitted based on a total of more than 59 million time series for a given sample size n. The resulting bias-corrected average estimates are illustrated in the lower panels of Fig. 1, again using the exact MLE, Burg's method and the Yule-Walker solution as original estimators. We do see a clear improvement in the bias properties for all of the methods. Admittedly, the bias-corrected estimators will not be completely unbiased as the relationship between the true and originally estimated coefficients cannot be modeled perfectly. This is especially seen for coefficient combinations along the borders of the triangular area.
The given brute-force simulation approach makes it possible to also derive sampling distributions for both the original and bias-corrected estimators. Specifically, we have fitted skew-normal distributions to transformations of the originally estimated coefficients of AR(1) and AR(2) processes. The parameters of the skew-normal distribution are then modeled in terms of the true underlying AR coefficients, again using orthogonal polynomial regression. In the AR(1) case, it is straightforward to obtain confidence intervals by Monte Carlo sampling, both when the original and bias-corrected estimators are used. In the AR(2) case, confidence intervals are found by combining Monte Carlo sampling and a Gaussian copula representation to preserve correlation between the estimated AR coefficients.
This paper is structured as follows. Section 2 outlines our modeling approach giving bias-corrected estimators of the first-lag autocorrelation coefficient of AR(1) processes and derive their sampling distributions. In Sect. 3, the suggested approach is extended to give bias correction and approximate sampling distributions in estimating the coefficients of AR(2) processes. Section 4 illustrates the bias correction for AR(2) processes for a real ecological data set, where the autoregressive coefficients represent direct and delayed density dependence for a vole species. Concluding remarks are given in Sect. 5. The Appendix describes our accompanying R-package which can be used to obtain the bias-corrected estimates and 95% confidence intervals for time series of length n = 10, 11, . . . , 50.

Finite-sample properties for AR(1) processes
The AR(1) model has a simple one-step dependence as defined by (1) where p = 1. The bias in estimating the first-order autocorrelation coefficient φ 1 = φ has been studied by several authors, see e.g Krone et al. (2017) for a recent comparative simulation study of AR(1) estimators in short time series. The bias of the exact MLE is illustrated for different sample sizes in Fig. 2, giving empirical averages for 10,000 simulations for a fine grid of φ-values in the stationary area (−1, 1).
A common way to explicitly construct an unbiased estimate for a parameter φ is simply to subtract an estimate of the bias from the original estimator, providing a bias-corrected estimator of the formφ For example, the exact MLE can be corrected using the asymptotic bias −(1+3φ)/n (Tanaka 1984;Cordeiro and Klein 1994). Naturally, such a linear correction would not be accurate enough for small sample sizes. In Arnau and Bono (2001), the ordinary autocorrelation estimator for φ is bias-corrected by adding the absolute value of a polynomial fit to the Fig. 2 The estimated empirical biasφ − φ of the exact MLE for AR(1) processes with length n = 10 (black), n = 15 (red), n = 20 (green), n = 30 (blue), n = 40 (light blue) and n = 50 Empirical bias empirical bias. This has a slight resemblance to our approach, but an important difference is that we do not try to estimate or model the bias. We model the true parameter value φ directly as a function of original estimates using a large number of simulations. Similarly, we provide approximate sampling distributions by modeling parameters of skew-normal approximations as functions of the true values of φ.

Deriving bias-corrected estimators by simulation
Letφ denote an original estimator for the first-lag autocorrelation coefficient of AR(1) processes. Our goal is to construct a bias-corrected estimator for φ such that E(φ c ) = φ for all values of φ. To achieve this we model the relationship between the true and estimated parameter values using a weighted orthogonal polynomial regression model, using the true parameters as response variables. The coefficients of the regression model are calculated by minimizing the weighted squared error between the corrected estimates and the true parameter values for a grid of true values used as a training set. This provides a predictive model which can then be used to provide bias-corrected estimates for any original estimateφ. The specific steps in constructing this bias-corrected estimator can be summarized as follows: 1. To avoid constraints on the support of φ, we first introduce a monotonic transformation which has infinite support. This facilitates optimization and implies that our inverse transformed bias-corrected estimate will always be within the stationary area of the AR(1) process. 2. Letφ denote an original estimator of φ. We model the true AR coefficient using an orthogonal polynomial model where β = {β k } K k=0 denotes a fixed set of regression coefficients while {h k (.)} K k=0 represents a set of orthogonal polynomials of order k. Here, we choose to use the probabilists' Hermite polynomials which are orthogonal with respect to the standard normal density. These polynomials are defined by 3. To estimate β for a given sample size n, we generate a total of m = 10,000 time series for a fine grid of φ-values, which can then be considered as our training set. Note that the suggested estimator is a non-linear function ofφ implying that This means that the optimization takes the estimated value for each time series into account not just the average estimate of the m simulations for each φ. The regression coefficients are thus found by solving the optimization problem The quantityφ r j represents the estimate of φ r in simulation j. Specifically, we choose the grid of true parameter values defined by φ r ∈ (−0.95, −0.94, . . . , 0.95), implying that l = 191. The sample variances s 2 r of the m parameter estimates of φ r are used as weights. This gives a unique set of regression coefficientsβ for each sample size n and for each original estimatorφ, implying thatβ accounts for the sampling distribution of the estimatorφ. The bias-corrected estimator is then given byφ c = f (φ,β) which is used to predict the true value of φ for a new estimateφ.
In minimizing (5), we have chosen to exclude values of φ close to the edges of the stationary interval in the training set. This is to avoid a severe inflation of the variance caused by forcing the estimator to give unbiased estimates at the edges of the interval. Also, we have chosen to use weighted regression where the weights s 2 r represents the ordinary unbiased sample variances of the m = 10,000 estimatesφ r . This choice was made to robustify the given modeling approach to increasing sample variances close to the boundaries of the given interval but this did not have any major impact on the optimized values for β.
In practice, the given approach can be used to find corrected estimates for any estimator φ giving values within the stationary range, not only the four estimators considered here. The corrected estimates might be equal to ±1, but will never fall outside of the interval [−1, 1]. Our implementation includes the exact and conditional MLEs, Burg's algorithm and the Yule-Walker solution. We have stored all the sets of regression coefficients for these four estimators for AR(1) series of length n = 10, 11, . . . , 50. To facilitate use of the given bias-corrected estimates we have provided an accompanying R-package, see the Appendix. at the vertical axis. Naturally, this gives a quite large bias-correction when n = 15, where original estimates above 0.5 are corrected to a value close to 1. In the case of n = 30, we notice that the correction curves are close to linear for an internal subset of the interval. The original and the corresponding corrected average estimates for m = 10,000 simulations are displayed in Fig. 4. Using the correction, we do get close to unbiased results both when n = 15 and n = 30. The overall average bias and sample variance for the estimators are given in Table 1. We also compute the overall root mean squared error (RMSE), which in the case of the corrected estimator is defined by

Bias-correcting curves
whereφ c,r j is the corrected estimate of φ r in simulation j. The RMSE for the original estimator and the bias and variances are computed correspondingly. The results illustrate the well-known bias-variance trade-off implying that a decrease in the bias of an estimator will inherently cause an increase in the variance. Using the biascorrected estimators we get approximately unbiased results only causing a negligible increase in RMSE, especially when n = 30. By increasing the order K of the Hermite polynomials, the bias can be further decreased but this also increases the variance giving higher RMSE and K = 3 was opted to be the best choice. In the minimization we could also have excluded more of the φ-values at the ends of the unit interval, e.g. using φ ∈ (−0.9, 0.9). This would reduce the variance but naturally also increase the bias close to the limits of the interval (−1, 1).

Sampling distributions for the original and corrected estimators
Commonly applied estimators for the coefficients of AR( p) processes are asymptotically normal (Hannan 1970) but the finite-sample distributions of these estimators have not been thoroughly studied. Figure 5 illustrates the sampling distribution for the logit transformation g(φ) whereφ is the exact MLE for AR(1) processes of length n = 30 and where the underlying true values are φ ∈ (−0.9, 0.6, 0.9). The fitted curves are skew-normal densities which are seen to give very good approximations to the given sampling distributions. These  have been fitted using the function fGarch:::snormFit in R (Wuertz et al. 2020), implementing the skew-normal density as defined by Fernandez and Steel (1998) The function π G (.) denotes the standard normal density while H (x) is the ordinary Heaviside or unit step function. In addition to the skewness parameter ξ > 0, the skew-normal density is parameterized in terms of a mean μ and standard deviation σ , using an input argument (x − μ)/σ . Based on the skew-normal density approximation for g(φ), we can derive finite-sample distributions for the estimatorsφ and the corresponding bias-corrected estimatorsφ c . Let π sn (.) denote the skew-normal approximation for the logit transformation g(φ). The approximate sampling distribution for the original estimator is easily expressed analytically by the ordinary change of variable transformatioñ Likewise, an approximation to the sampling distribution for the corrected estimator can be derived numerically asπ where s(φ c ) represents a spline function approximating the monotonic relationship between φ c and g(φ). The resulting sampling distributions for the original and corrected estimators are shown in Fig. 6, where n = 30 and φ ∈ (−0.9, 0, 6, 0.9).
To calculate the sampling distributions in (7)-(8) for an estimatorφ, we first need to assess the relationships between the parameters of the skew-normal approximation and φ. Using generic notation, letθ r = (θ 1,r ,θ 2,r ,θ 3,r ) = (μ r ,σ r , ln(ξ r )) denote the parameters of the skew-normal approximation to m = 10,000 samples of g(φ r ), for each φ r ∈ (−0.99, −0.98, . . . , 0.99). Each of the estimated skew-normal parameters are used as response variables in separate orthogonal polynomial regression models  Fig. 7 The mean, standard deviation and skewness parameter of the skew-normal approximation for g(φ) as smoothed functions (red) of φ, whereφ represents the exact MLE when n = 30 The resulting estimates of the coefficients, {b k,s }, are found straightforwardly by ordinary least squares, using a polynomial order of K = 3. Figure 7 illustrates the relationships between each of the parameters of the skew-normal approximation to g(φ r ) and φ r , wherê φ r denotes the exact MLE for series of length n = 30. The red curves illustrate the smoothed version of these curves which can then be used to predict the parameters of the skew-normal approximation for a new value of g(φ). Especially, we notice that the skewness parameter is quite close to 1 for the interior of the given interval, implying that the sampling distributions for g(φ) are not very far from being Gaussian. The skewness increases as φ increases towards the upper limit of the stationary area. By storing the coefficients {b k,s } for all the four original estimators and each sample size n, we can now calculate confidence intervals for φ for all cases using Monte Carlo simulations. Given an estimate g(φ), we predict the parameters of the corresponding skew-normal approximation and sample from this distribution. These samples are easily transformed to give confidence intervals for φ, using the relevant percentiles for the distributions ofφ andφ c . To investigate the coverage probability of the resulting confidence intervals, we performed a simulation study generating 10,000 AR(1) process with a uniformly drawn coefficient, i.e. φ ∼ Uniform(−1, 1). The simulation study was performed for sample sizes n = 10, 15, 20, 30, 40, 50, in which the coefficient φ was estimated using each of the four original methods. We then calculated the bias-corrected estimates and found 95% equi-tailed confidence intervals for all cases by Monte Carlo simulation. The results demonstrate that the coverage probabilities for the original estimators are below the nominal level of 0.95 for all estimators and all sample sizes, see Table 2. Especially, the nominal level using the Yule-Walker solution is below 0.90 for all sample sizes. Using the bias-corrected estimators, the coverage properties are clearly better being quite close to the nominal level of 0.95 in all cases. For the smaller sample sizes, this can partly be explained by the larger variance of the bias-corrected estimators giving wider confidence intervals. For the large sample sizes, the confidence lengths are approximately the same. The true value of φ is randomly generated from (−1, 1). The confidence intervals are found by Monte Carlo sampling after fitting a skew-normal approximation to g (φ) expensive than in the AR(1) case as we need to generate time series for a fine two-dimensional grid of the coefficients in the triangular stationary area. For each pair of the true parameters, we then fit weighted polynomial regression models to the generated time series and store the resulting optimized regression coefficients. As previously, the original estimators used include the exact and conditional MLE, Burg's algorithm and the Yule-Walker solution for AR(2) processes of length n = 10, 11, . . . , 50. We obtain approximate sampling distributions for the bias-corrected versions of these estimators by constructing Gaussian copulas where the marginals are generated as transformations of skew-normal densities.

Modeling approach in two dimensions
The AR(2) process is defined by (1) where p = 2 and it is stationary within the triangular area constrained by φ 2 + |φ 1 | < 1 where |φ 2 | < 1. A more appealing parameterization of this process is given by the partial autocorrelations, as the stationary area of the AR(2) process is then defined by the square ψ i ∈ (−1, 1), i = 1, 2. The area in which the process has pseudo-periodic behavior is characterized by ψ 2 1 (1 − ψ 2 ) 2 + 4ψ 2 < 0. We now extend the algorithm in Sect. 2 to construct bias-corrected estimators (φ c,1 ,φ c,2 ), again taking the sampling distribution of the original pair of estimators (φ 1 ,φ 2 ) into account. In this case we estimate the parameters of the regression model by minimizing the squared error between the corrected and true partial autocorrelations. This is computationally beneficial to avoid the triangular constraints on φ 1 and φ 2 . Also, the correlation betweenψ 1 andψ 2 is much smaller than the corresponding correlation between the estimates of the φ-coefficients. Naturally, this only makes a difference for the first coefficient as φ 2 = ψ 2 . The algorithm extending the weighted polynomial regression model to two dimensions can be summarized as follows: 1. Using the logit transformation in (2), the underlying true partial autocorrelations are modeled by where h k,q (g(ψ 1 ), g(ψ 2 )) = h k (g(ψ 1 ))h q (g(ψ 2 )) denotes the product of Hermite polynomials of order k and q. Notice that the two partial autocorrelations are modeled separately, giving separate sets of regression coefficients β i = {β k,q,i } for i = 1, 2. However, each of the true partial autocorrelations need to be modeled in terms of the estimated pair (ψ 1 ,ψ 2 ) as these parameters are not independent. 2. Due to the dependence, the regression coefficients β = {β 1 , β 2 } of the given predictors for ψ 1 and ψ 2 are found simultaneously. This is achieved by solving the following optimization problem: The values (ψ r j1 ,ψ r j2 ) denote the original estimates for the r th pair of the partial autocorrelations in simulation j while s 2 ri denotes the sample variances for the m simulations in each case. The value l denotes the total number of pairs of the partial autocorrelations that are included in the minimization.
In solving (12), we needed to generate time series for a fine two-dimensional grid of the parameter values (ψ 1 , ψ 2 ) within the square defining the stationary area. The estimateβ is based on using the grid ψ i ∈ (−0.95, −0.925, . . . , 0.95), i = 1, 2. This gives a total of l = 77 2 = 5929 different combinations of the partial autocorrelations. For each pair (ψ 1 , ψ 2 ), we generated m = 10,000 times series of a specific length n implying that the regression coefficients are estimated based on approximately 59 million time series. This was repeated for all sample sizes n = 10, 11, . . . , 50, such that the total number of generated time series is equal to 77 2 · 41 · 10,000 = 2,430,890,000 or approximately 2.43 billion time series for a given value of K . We then saved the regression coefficients for each sample size and for each of the original estimation methods providing bias-corrected estimatorŝ which are then transformed by (10) to give estimates for the AR coefficients. As in the AR(1) case, the estimated coefficients might fall at the border of the stationary area but not outside.
If run sequentially on an ordinary single-core laptop, the given brute-force simulation approach to compute the bias correction would be computationally infeasible as it would take approximately 5 years of CPU time. The computations were therefore done on the Ibex cluster at KAUST (https://www.hpc.kaust.edu.sa/ibex), which reduced the time down to about 2 days. The computation time could have been further reduced by lowering the number of generated times series from m = 10,000 to e.g m = 3000 giving approximately the same results. However, as the given calculations were done only once we chose to use a large number of generated time series.

Properties of the bias-corrected estimators
By using the partial autocorrelations in (12), the resulting corrected estimators for the AR(2) coefficients are not completely unbiased. However, in addition to the numerical advantages, we have noticed that this approach also gives smaller variance and RMSE compared to performing the optimization with respect to the φ-coefficients. As in the AR(1) case, the order K of the Hermite polynomials can be increased to slightly reduce the overall bias but the variance and the RMSE then increase. In the subsequent analysis we have therefore chosen to use K = 3 as this gave the best overall results. The number of regression coefficients in (12) is then equal to 10 for each of the parameters ψ 1 and ψ 2 . Using for example K = 7, the model in (12) would have a total of 36 terms for each of the parameters. The estimated partial autocorrelations using the exact MLE and the corresponding biascorrected estimates are shown for sample sizes n = 15 and n = 30 in Figs. 8 and 9, respectively. Visually, the corrected estimator is very accurate in estimating ψ 2 but show some bias in estimating ψ 1 , especially for the upper part of the square where the value of ψ 1 is either underestimated or overestimated depending on the value of ψ 2 . The corresponding bias for φ 1 is much smaller than for ψ 1 as the given squared domain translates into a triangular area.
To further study finite-sample properties, we have calculated the overall average bias, variance and the RMSE of both the original and the bias-corrected versions. The calculations are based on taking the averages for the m simulations for each of the l = 5929 combinations of (ψ 1 , ψ 2 ), which are easily transformed to give estimates for the AR coefficients by (10). In calculating the overall averages we have added the results for both parameters. For example, RMSE for the bias-corrected estimator is given by The bias and the variance are calculated correspondingly. Table 3 summarizes the overall average bias, variance and the RMSE for the original and bias-corrected estimators using K = 3. The bias-corrected estimators do have a smaller bias and larger variance than the original estimators, but the improved bias properties do not appear at the expense of any significant increase in RMSE. When n = 15, the bias-corrected estimator only have a slightly larger RMSE for the MLE's, avoiding the rather large bias of the original estimators. When n = 30, the RMSE is actually smaller for the bias-corrected versions versus all of the original estimators. These results are calculated using all the 10,000 simulations for each of the selected pairs in (φ 1 , φ 2 ) within a fine grid of the stationary area

Sampling distributions using a Gaussian copula representation
Similar to the AR(1) case, we move on to find approximate sampling distributions for the estimators of the parameters of the AR(2) model. For each method and each sample size, we have fitted skew-normal densities to m = 10,000 generated samples of g(ψ r ,1 ) and g(ψ r ,2 ) where r = 1, . . . , l. Figure 10 illustrates the skew-normal approximation for a few selected pairs of the transformed original estimators where the true underlying partial autocorrelations are given by ψ ∈ {(−0.6, −0.6), (0.6, −0.6), (0.6, 0.6)}. The skew-normal densities are seen to give accurate approximations of the sampling distributions also in this case. The next step is to find approximations of the sampling distributions for the original non-transformed estimators (φ 1 ,φ 2 ) and the bias-corrected estimators (φ c,1 ,φ c,2 ). Similar to the AR(1) case, we first need to assess the relationship between the parameter estimates of the skew-normal approximations to g(ψ 1 ) and g(ψ 2 ) as a function of the true underlying parameters. To sample from the resulting bivariate distribution, we also need to be able to predict the correlation ρ = Cor(g(ψ 1 ), g(ψ 2 )). The full set of estimated parameters is then given bŷ θ r = {θ 1,r ,θ 2,r , . . . ,θ 7,r } = {μ 1,r ,σ 1,r , ln(ξ 1,r ),μ 2,r ,σ 2,r , ln(ξ 2,r ), logit(ρ r )} for r = 1, . . . , l. Using the same approach as in the AR(1) case, each of the seven parameters θ s,r ∈θ r is used as response variable in separate orthogonal polynomial regression models given bŷ q (g(ψ r ,1 ), g(ψ r ,2 )), s = 1, . . . , 7, r = 1, . . . , l The estimated coefficients {b k,q,s } are again found by ordinary least squares. Using K = 3, this gives a set of 10 regression coefficients for each of the seven parameters, stored for all methods and sample sizes.  The results are based on 10,000 simulations where ψ 1 and ψ 2 are randomly generated from (−1, 1) In generating samples for the original and bias-corrected estimators, we need to preserve the correlation between the transformed partial autocorrelations. This can be done by constructing a two-dimensional Gaussian copula by The function ρ (.) denotes the joint cumulative distribution of a bivariate standard normal vector with correlation between the components being equal toρ. The functions F sn (.) represent the skew-normal cumulative distribution functions (cdf) of g(ψ 1 ) and g(ψ 2 ). By the probability integral transform, represents samples from the given skew-normal densities where the uniformly distributed variables u 1 and u 2 are generated from the inverse cdf of the standard normal marginals of the bivariate distribution. The resulting samples for (φ 1 ,φ 2 ) and (φ c,1 ,φ c,2 ) can be used to find confidence intervals for (φ 1 , φ 2 ). We performed a similar simulation study as in the AR(1) case, generating 10,000 AR(2) processes where the partial autocorrelation coefficients are drawn randomly from (−1, 1). The coverage probabilities of the estimated 95% confidence intervals are given in Table 4 for sample sizes n = 15 and n = 30. We do notice that the coverage probabilities using the original estimators are smaller than the nominal level in all cases. In particular, the coverage is very low for φ 2 giving values below 0.80 also when n = 30. The confidence intervals using the bias-corrected estimators have coverage probabilities quite close to the nominal levels being within the interval 0.95 ± 0.03.

Application in population ecology
Wildlife ecological research studies are often characterized by small sample sizes (Bissonette 1999). This can be due to sparse distributions of the animal species of interest, and also the research design in which data are collected by fieldwork. As expressed by Ives et al. (2010): " While a time series covering 40 years might represent an ecologist's entire career, such time series are short for statistical purposes". The population density of specific animal species often exhibit cyclical fluctuations which, to a great extent, are driven by the relationship between the density and the carrying capacity of the environment. This is referred to as density dependence as the population density itself regulates the growth rates of the species, see e.g. Sinclair and Pech (1996). Both AR(1) and AR(2) processes have commonly been used to estimate this type of intra-specific population dynamics of a species, among others in modeling the population cycles of small rodents (Bjørnstad et al. 1995;Hansen et al. 1999;Stenseth et al. 2003;Nicolau et al. 2020). This type of analysis calls for accurate estimation of the AR coefficients as these are interpreted directly in terms of the strength of direct and delayed intra-specific dependence.
To demonstrate the new bias correction for a real world example, we consider a dataset on gray-sided voles (Myodes rufocanus), collected by the Japanese Forestry Agency at 85 different sites in the Hokkaido island, Japan (Saitoh et al. 1998). The dataset includes time series for the raw counts of the voles at each site, collected during both spring and fall for a total of 31 years . These data have been extensively studied in the literature and AR(2) model approximations have been used to assess the strength of density dependence, periodicities and synchrony of the vole populations (Stenseth et al. 2003;Hugueny 2006;Cohen and Saitoh 2016). Here, we fit the AR(2) model to the log of the density estimates for the annual fall time series derived by Cohen and Saitoh (2016) which account for differences in sampling effort. The data were downloaded from the Supporting Information of Cohen and Saitoh (2016), available at https://doi.org/10.1002/ecy.1575. Specifically, we used the log of the density estimates given in the file App3BayesCountsParameterEstimates.csv in their Zip archive.
The left panel of Fig. 11 illustrates the estimated AR(2) coefficients for the 85 time series using the exact MLE (black) and the corresponding bias-corrected estimates (red). About twothirds (57/85) of the original pairs of estimates are within the pseudo-periodic area which implies cyclic population dynamics where shorter periods imply stronger density dependence (Stenseth et al. 2003). As expected, the difference between the original and bias-corrected estimates are not very large for this dataset as the time series length is n = 31 years. However, the bias correction is systematic in the sense that the estimates of φ 1 and φ 2 are mainly shifted right and upwards, respectively. This is further illustrated in the middle and right panels of Fig. 11 showing scatterplots of the corrected versus the original estimates. In correspondence with our simulation results, the bias correction of φ 1 is quite small for estimates that are not too close to the borders of the triangle, while φ 2 is increasingly underestimated for larger values of the parameter. The given bias-correction implies slightly weaker estimated direct density dependence implying longer periods and two of the series can no longer be considered to be cyclic as the pairs of autoregressive coefficients fall outside of the pseudo-periodic area. Overestimation of the strength of density dependence have been associated with ignoring sampling variance (Stenseth et al. 2003) and ignoring the estimation bias could thus add to this overestimation.
Further use of the given bias correction for AR processes in population dynamics could include estimation of other ecologically interesting characteristics like the spectral density function, measures of cyclical frequency and measures of spatial synchrony. A simple example is the average length of the stochastic cycles for AR(2) processes This is a non-linear function of the coefficients, but for simplicity we can use the original and bias-corrected estimates as plug-in estimates in this formula. For the 55 time series where both the original and bias-corrected ML estimates were within the pseudo-periodic area, the mean average cycle length increased slightly froml original = 4.1 tol corrected = 4.4 using the bias-corrected estimates. Naturally, this analysis is quite simplistic as we have neither taken sampling error nor spatial correlation between sites into account. It is well-known that spatial heterogeneity in population densities might have important implications for the population dynamics (Thorson et al. 2015), and a proper ecologically valid analysis of these time series should take this into account. Also, a more thorough study of the implications of using the bias-corrected estimator in estimating various characteristics of population dynamics represents interesting topics for future research.

Concluding remarks
The simplicity and parsimonious parameterization of first and second order AR processes make them attractive as plausible models for short time series. The AR(1) model reflects the Markov property, providing an important extension to a temporal independence assumption. The AR(2) process is more flexible and is particularly useful in modeling pseudo-periodic dynamics. The bias involved in estimating the coefficients of short AR processes has been well-known for decades but this still remains a practical problem even for the simple first and second order models. The choice of estimator does make a difference for small sample sizes and incautious use of commonly applied estimators might give misleading results. The default choice using the ar-function in R gives the Yule-Walker estimates which are clearly not optimal, neither for short nor long time series. As stated by Tjøstheim and Paulsen (1983), "uncritical use of Yule-Walker estimates may be hazardous".
The main goal of this paper was to investigate the finite-sample properties of commonly used estimators for the coefficients of AR(1) and AR(2) processes and provided bias-corrected versions of these estimators. This was achieved by modeling the true parameter values in terms of estimated values for a huge number of simulations, accounting for the sampling distribution of the original estimators and different sample sizes. The model fitting step was computationally expensive but needed to be done only once for each of the original estimators and each sample size, providing the regression coefficients which are used to compute the bias-corrected estimates.
The asymptotic behavior of the estimators of AR coefficients have been thoroughly studied in literature, while less effort has been made in studying their finite-sample properties. The given simulation-based approach makes it possible to find skew-normal approximations to transformations of the coefficient estimates, in which the parameters of the skew-normal distributions are modelled in terms of the estimated AR coefficients. This provides approximate finite-sample distributions for both the original and bias-corrected estimators. These distributions are then used to give approximate confidence intervals for the AR coefficients.
The given simulation-based approach cannot easily be generalized to higher order AR processes, but this was never our intention. For short time series, it does make sense to apply simple and well interpretable models like the AR(1) and AR(2) processes, avoiding potential overfitting. Possible extensions of the given approach include studying finite-sample properties of estimators for other parsimoniously parameterized models like moving average (MA) processes of order 1 or 2 or the combined AR and MA model, ARMA(1,1). Naturally, this requires that the simulations and the regression model fitting steps have to be repeated. Even though this is computationally expensive, it is feasible with access to supercomputers, especially as the optimization step only needs to be done once for each method and sample size.
Funding Open Access funding provided by UiT The Arctic University of Norway Data Availability The data analysed in Sect. 4 were downloaded from the Supporting Information of Cohen and Saitoh (2016), available at https://doi.org/10.1002/ecy.1575. Specifically, we used the log of the density estimates given in the file App3BayesCountsParameterEstimates.csv in their Zip archive.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix: R-package
To make the presented methodology easily available, we include the R-package ARbiascorrect which can be used to obtain the bias-corrected estimates and the resulting approximate 95% confidence intervals, in addition to 95% confidence intervals using the original estimates. The package consists of only one function as defined below. The user can either use the relevant time series as input or the parameter estimates found by one of the original methods including the exact and conditional MLE, Burg's algorithm or the Yule-Walker solution. The R-package can be downloaded and installed from https://github.com/ pedrognicolau/ARbiascorrect, or installed directly in R through the devtools package by running: devtools::install_github("pedrognicolau/ARbiascorrect-v1") The specifications of this package are as follows: Description Gives bias-corrected estimates and 95% confidence intervals for autoregressive coefficients of AR(1) and AR(2) processes, for sample sizes n = 10, 11, . . . , 50.

Arguments
phi Single value (AR1) or two-dimensional vector (AR2) containing the coefficient estimates subject to bias correction. Not required if the time series x is used as input.
n Integer for the length of the time series. Needs to be between 10 and 50. Not required if the time series x is used as input. method Character string specifying the method used to estimate the autoregressive coefficients. Needs to be either mle, cmle, burg or yw, specifying the exact MLE, the conditional MLE, Burg's method and the Yule-Walker solution, respectively. order Order of the estimated autoregressive process. Needs to be provided if x is used as input.
x The time series to be fitted by AR(1) or AR(2) in which order needs to be specified. The default estimation method is the exact MLE if none is specified.
Value phi.hat The original estimates of the AR coefficients phi.correct The bias-corrected estimates ci.hat The 95% confidence interval using the original estimates ci.correct The 95% confidence interval using the bias-corrected estimates