Linear approximation of the Threshold AutoRegressive model: an application to order estimation

This paper proposes a linear approximation of the nonlinear Threshold AutoRegressive model. It is shown that there is a relation between the autoregressive order of the threshold model and the order of its autoregressive moving average approximation. The main advantage of this approximation can be found in the extension of some theoretical results developed in the linear setting to the nonlinear domain. Among them is proposed a new order estimation procedure for threshold models whose performance is compared, through a Monte Carlo study, to other criteria largely employed in the nonlinear threshold context.


Introduction
The complexity of nonlinear time series models often makes it difficult to analyze the main features of their dynamics, the derivation of their statistical properties and the identification and estimation of the models. In this case the issues developed in the linear domain cannot be further adapted and new tools need to be introduced.
We herein propose a bridge between the linear AutoRegressive Moving Average (ARMA) model and the nonlinear Self-Exciting Threshold AutoRegressive (SETAR) model (Tong and Lim 1980;Tong 1990). We introduce a linear approximation of the threshold model that offers the opportunity to investigate the dynamics of the generating process.
In more detail, we show that the proposed linear approximation of the nonlinear SETAR process is given by an ARMA model. Even though one might expect that this approximation is given by an autoregressive (AR) process, we show that there is also a moving average (MA) component.
The linear approximation is obtained using the best (in L 2 norm) one-step-ahead predictor of the SETAR process, and the corresponding coefficients of the ARMA approximation are theoretically derived.
Then, leveraging these results which allow us to derive the ARMA model that best approximates the threshold structure, we introduce a new procedure by which to estimate the order of the SETAR process, in what can be considered one of the potential applications of the proposed linear approximation.
Before going into the details of our proposal and to better introduce our contribution, we need to clarify the distinction between linear representation and linear approximation of a nonlinear process. The linear representation of nonlinear models was extensively presented in Francq and Zakoïan (1998), and in the references therein. It is based on a new parametrization of the nonlinear model that leads to a weak ARMA representation, where the adjective weak relates to the fact that the assumptions, usually given on the noises of the ARMA model (such as independence), are not satisfied by the linear ARMA representation. (For an example of linear representation of the subdiagonal bilinear model, see Pham 1985).
In this domain, more recently, Kokoszka and Politis (2011) used the definition of weak linear time series, and even showed that Autoregressive Conditional Heteroscedasticity-type and Stochastic Volatility-type processes do not belong to this class.
The definition of linear approximation of X t is instead based on different issues. The aim is to introduce of an approximation that allows one to distinguish two components in the process X t : with X L t the linear approximation and Y t the nonlinear component, where X L t is given by the linear causal process: with w 0 ¼ 1, P 1 j¼0 jw j j\1 and fe t g a sequence of independent and identically distributed (i.i.d.) random variables, with E½e t ¼ 0 and E½e 2 t ¼ r 2 . The derivation of the linear approximation X L t is associated in the literature with the use of the Volterra series expansion of X t , where X L t is obtained considering the first-order term of the expansion of X t . (For an example of the bilinear model, see Priestley 1988, p. 58.) The spirit of the Volterra expansion has inspired various contributions to linear approximation, with applications in many domains (among others see Huang et al. 2009 andSchoukens et al. 2016).
In this study we obtain a linear approximation of X t by borrowing the definition of the best mean square one-step-ahead predictor (see among others Brockwell and Davis 1991, Sect. 2.7).
The first results are presented in Proposition 1, whereas the elements included in X L t are further discussed in Remark 1 that leads to the proposed linear approximation of X t . Moreover, we show that the proposed linear approximation is an ARMA process and we use this result in the order estimation of the threshold process, thus confining the computational effort needed to estimate the order of a nonlinear process to the linear domain.
In particular, in Sect. 2 we establish the conditions under which a linear approximation of X t can be obtained. In Sect. 3 the linear approximation is used to implement a new order estimation procedure whose consistency property is shown; in Sect. 4 some extensions of the linear approximation are given after removing some conditions fixed in the previous pages, and the role played by the SETAR parameters in the approximation is discussed; in Sect. 5 the order estimation procedure is evaluated and compared to other information criteria, through a Monte Carlo study. Some final comments are given at the end, and all proofs are given in the Appendix.

Linear approximation of the SETAR process: main results
Let fX t g t2Z be a SETAR nonlinear process, given by: where fe t g is a sequence of continuous i.i.d. R-valued random variables with mean zero and E½e 2 t ¼ r 2 e \1, IðÁÞ is the indicator function, k is the number of regimes, p j is the order of the autoregressive regimes, X tÀd is the threshold variable, d is an integer representing the threshold delay, p j is a nonnegative integer, k and d are positive integers, R j ¼ ðr jÀ1 ; r j for j ¼ 1; . . .; k À 1 with À1 ¼ r 0 \r 1 \. . .\r kÀ1 \r k ¼ 1, and R k ¼ ðr kÀ1 ; þ1Þ are subsets of the real line such that R ¼ S k j¼1 R j . In the following, to find a stochastic linear approximation of the process X t , we use the L 2 norm given by set of information on X t available up to time t À 1.
To show the theoretical results, we take advantage of an alternative representation of the threshold process that, to avoid heavy notation (which would not help in understanding the issues), is assumed to have k ¼ 2 regimes (the case with k [ 2 is discussed in Sect. 4.3) and null intercepts (/ ðjÞ 0 ¼ 0, for j ¼ 1; 2; . . .; k). This guarantees the following form to process (2): with I the identity matrix, 0 the null vector, and the autoregressive order p (the assumption that the two regimes have the same autoregressive order simplifies the presentation of results and can be easily met including null coefficients in the model (see Sect. 4.2)). Without loss of generality, we can suppose that d ¼ 1 and r 1 ¼ 0 (these assumptions will be further discussed in Sect. 4.1); then, the process (3) can be written as where Note that all results given in the following are based on the assumptions of strict stationarity (refereed to as stationarity in the next pages) and ergodicity of the process. In the threshold domain the milestone in this framework is given by Petruccelli and Woolford (1984), who state a set of sufficient and necessary conditions for the stationarity and ergodicity of a SETAR(2;1) model; sufficient conditions for the more general SETAR(2;p) process, are given in Chan and Tong (1985) and Bec et al. (2004). There are at least two difficulties inherent in obtaining the linear approximation of the SETAR model; these make this model different from other nonlinear models, such as Markov Switching structures (Hamilton 1989). First, the SETAR process (5) has stochastic coefficients that depend on the process X t itself: it implies, among others things, that it is not easy to obtain its moments (whose sufficient conditions for their existence are given in Lemma 2 in the Appendix). Second, the SETAR process does not fulfil many regularity conditions that can assist in obtaining the linear approximation. (For example, the first derivative of X t , carried out to derive the coefficients of the first-order Volterra expansion (see Priestley 1988, p. 26), may not exist when the skeleton of the process assumes a value equal to the threshold value.) Starting from these two points, to derive a linear approximation of a SETAR(2; p) process, we need to introduce an additional notation that simplifies the presentation.
Let X t ¼ e T 1 X t be the SETAR(2;p) process (5), with e 1 a ðp Â 1Þ vector with 1 as its first element and all remaining ðp À 1Þ elements are zero, whereas A > is the transpose of A. Further, let be two matrices where p is a ð2p Â pÞ matrix, with p ¼ PrðX t 0Þ, I p is an identity matrix of order p and is the Kronecker product, whereas P is a ð2 Â 2Þ matrix, with p ij ¼ PrðX t 2 R j jX tÀ1 2 R i Þ, for i; j ¼ 1; 2 and R 1 ¼ ðÀ1; 0, R 2 ¼ ð0; 1Þ. Moreover, consider the following matrix: with U i , for i ¼ 1; 2, defined in (4). Given the previous notation, we can now introduce the linear approximation of X t . Let X t ð1Þ be the best (in L 2 norm) one-step-ahead predictor for the SETAR(2;p) process, X t ð1Þ ¼ E½X t jI tÀ1 ; we can now state the following results.
Proposition 1 Let X t , t 2 Z, be a stationary and ergodic SETAR(2;p) process with Eðe 2 t Þ\1 and c 2 \1 (with c 2 given in Lemma 2 in the Appendix); then, given the linear process X L t as in (1), we have the following results in L 2 norm: where I tÀ1 ¼ fX tÀ1 ; X tÀ2 ; . . .g and the subscript t À 1 in X L tjtÀ1 is due to the conditional set I tÀ1 .
Proof See the Appendix.
The result of Proposition 1 is twofold. First it gives, for the SETAR(2; p) model, a representation of the optimal one-step-ahead predictor, X t ð1Þ, which is a generalized linear process with coefficients g j ðI tÀ1 Þ that relate to the process itself. Second, this optimal predictor corresponds to X L tjtÀ1 when the minimization is done with respect to the linear process X L t . Now we need to emphasize the following key remark.
Remark 1 Proposition 1 gives a representation of the best one-step-ahead predictor for the SETAR(2;p) process. However, the quantities g j ðI tÀ1 Þ are not easly managed because they are nonlinear functions of the observations. An easy way to derive a linear process, as defined in (1), is to consider the expectation of g j ðI tÀ1 Þthat is If we suppose that c 1 \1 (see Lemma 2) and use the same arguments as in the proof of Lemma 2, we have that g ð2Þ j ¼ Oðc j 1 Þ, 8j ! 1. So, it follows that P 1 j¼0 jg ð2Þ j j\1. Then, we have the following linear approximation of the SETAR(2;p) process h Further note that Proposition 1 gives the best (in L 2 norm) one-step-ahead predictor for the SETAR(2;p) process, whereas the linear process in (10) is a projection of the SETAR(2;p) predictor in the space of the linear predictors. Many features of this linear process will be analysed in the following; the extension to the SETAR(2;p 1 ; p 2 ) case, with p 1 6 ¼ p 2 , is discussed in Sect. 4.2.
First of all, we provide an example to illustrate the linear approximation in (10).
Example 1 Let X t be a stationary and ergodic SETAR(2;1) process given by To simplify the computations, we suppose that the transition matrix P ¼ 0 1 1 0 , which implies that p ¼ 1=2 and that the two parameters / 1 and / 2 are negative. Moreover, the matrix K, defined in (8), is By (9) and (27) Set j ¼ 2u and j ¼ 2u À 1 for even and odd cases of j, respectively. Since it follows that Finally, note that the coefficients g ð2Þ j derived through Proposition 1 and Remark 1 exhibit the same ergodic conditions given in Petruccelli and Woolford (1984). Ä The next theorem shows a more precise characterization of the linear approximation (10) of the SETAR(2;p) process.
Proof See the Appendix.
What is stated in Theorem 1 allows us to make some interesting remarks.
Remark 2 The assumptions in Theorem 1, Eðe 2 t Þ\1 and c 2 \1, are sufficient to have a finite second moment, EðX 2 t Þ\1 (see Lemma 2 in the Appendix). Further, c r given by c r ¼ pkU 1 k r þ ð1 À pÞkU 2 k r is always less than one when the two regimes are also stationary, with kU 1 k\1 and kU 2 k\1; in the other cases, however, this condition needs to be verified case by case. h Remark 3 Theorem 1 provides the linear approximation of the SETAR(2;p) process, given by the ARMA(2p,2p) model. Looking at the proof of Theorem 1 in the Appendix (Eq. (30)) we have the exact expression of the linear process X ð2Þ t in (10) that even allows to theoretically derive the coefficients of the ARMA(2p,2p) model. For simplicity, suppose that the matrix K, in (8), has different eigenvaluessay k 1 ; . . .; k 2p . By (30) and after some easy but long algebra, we have two polynomials of degree 2p related to the autoregressive and moving average component of X ð2Þ t whose coefficients are given respectively by: for v u 2 f0; 1g and where the sums in (11) are made with respect to all different i and j À 1 groups of ones in the vectors v and v Àk , respectively. h To emphasize the issues in Theorem 1 and illustrate how the coefficients (11) can be used, consider the following example.
Example 2 Let X t be a stationary and ergodic SETAR(2;1) process with the matrices of the coefficients U 1 ¼ f/ 1 g and U 2 ¼ f/ 2 g.
The matrix K in (8) becomes For simplicity, suppose that the matrix K has two different eigenvalues -say k 1 and as in the proof of Theorem 1. Let c k , k ¼ 1; 2, be the component-wise product of the elements in the vectors P 1 and P 2 . By Theorem 1, we have that the linear approximation of X t is given by an ARMA(2,2) model -that is, where (11) in Remark 3. Ä Furthermore note that from Remark 1 it follows that the coefficients of the approximation (10) are based on the expectation (9) that makes X ð2Þ t a suboptimal linear approximation, in the sense that it is not the best linear one-step-ahead predictor of X t (see Proposition 1). Moreover, it has the great advantage of establishing a clear correspondence between the ARMA and SETAR orders, as stated in Theorem 1. This correspondence is the key element that overshadows the need to evaluate how X ð2Þ t approximates X t , which becomes only a theoretical issue lacking any empirical application in identifying the SETAR model (as discussed in Sect. 3).
Another key remark needs to be introduced to discuss the order of the ARMA approximation.
Remark 4 The order 2p of the ARMA approximation is the maximum value of the AR and MA components, and this correspondence between the order of the ARMA approximation and the order of the autoregressive regimes of the SETAR process could be used to estimate the autoregressive order of the threshold process (usually left to information criteria). h To emphasize this last remark and to give empirical evidence that 2p is the maximum order of the ARMA approximation (where 2p could be greater than the ''true'' ARMA order), consider the following example.
Example 3 Let X t $ SETAR(2;1) with coefficients / It is easy to note that this last equality implies the degeneration of the SETAR(2;1) model to an AR(1) structure, and that this degeneration needs to be found even in (10). This can be verified because, in this case, the transpose of the vector p in (7) becomes (1, 0) (or (0, 1)), whereas P ¼ I 2 and then K ¼ . . .; and so: which is the MA(1) representation of the AR(1) process. Ä The result given in Example 3 leads to an important evaluation: when X t is a linear autoregressive process then X ð2Þ t is identically equal to X t and is no longer an approximation.
Another important consequence of the result of Theorem 1 and Remark 4 is the fact that we can build a SETAR(2;p) process whose linear approximation is null. This is important for two reasons. First we can subtract from X t the linear approximation, and then we can mainly focus on the structure of the residuals. Second, by a proper parametrization, we can find a SETAR(2;p) process whose linear approximation (10) is null.
This last evaluation is summarized in the following Corollary.
Corollary 1 Let X t , t 2 Z, be a stationary and ergodic SETAR(2;p) process. Under the assumptions of Theorem 1, there exists a SETAR(2;p) process such that its linear approximation (10) is null -that is g Proof See the Appendix.
To illustrate the result of this Corollary consider the following example.
Example 4 Let X t be an ergodic SETAR(2;1) process. As shown in the proof of Corollary 1, a sufficient condition to obtain a SETAR(2; p) process with null linear approximation, is given by To give empirical evidence of this result, we generated n ¼ 200 artificial data from a SETAR(2;1) model with autoregressive coefficients / 1 ¼ À0:58 and / 2 ¼ 0:58. The correlogram on the left side of Fig. 1 clearly shows the absence of the linear component whereas the correlogram of X 2 t on the right side, gives evidence of nonlinearity in the generating process. Ä Corollary 1 also plays an important role in the order estimation presented in the next section, because if we have a SETAR(2;p) model with p [ 0, it can happen that the linear approximation in Theorem 1 is a white noise, and then the 2p order of the ARMA process will not match that of the SETAR(2;p) model (as discussed in Remark 4). This implies that in this case, the results of Theorem 1 cannot be used in the order estimation. To manage this problem and broadly use the results of Theorem 1 to estimate the autoregressive order of the threshold model, some additional conditions need to be added (see Theorem 2). In the following Remark we give a bound for the residual part, coming out from the proposed linear approximation, X ð2Þ t , in L 2 norm.
Remark 5 First, for simplicity, suppose that X ð2Þ t ¼ e t , that is the linear approximation is null. Otherwise, we can always consider the process X t À X ð2Þ t . So, we have to evaluate the following quantity By using the representation of the SETAR process as in Lemma 1, if c 4 \1 the main upper bound (in the sense that we take into account the series of the squared terms) of (13) is whereas the other quantities correspond to those defined in (8). This result follows by using the same arguments as in the proof of Theorem 1. Ä Then, by focusing on the proposed linear approximation, the threshold autoregressive order can be consistently estimated, thus restricting attention to the linear time series domain. As will be discussed in Sect. 3, this has remarkable advantages not only from a theoretical point of view but even empirically, because the computational burden of the order estimation procedure of a nonlinear time series model is restricted to its linear (and less complex) approximation.

Order estimation
Identify the time series models is a crucial step in the 'iterative stages in the selection of a model' (Box and Jenkins 1976) that needs to preserve the parsimony of the model and makes a heavy impact on the computational effort needed to estimate the parameters.
In the linear domain, and in particular in the ARMA context, the identification has well-established results based on the relation between the ARMA parameters and the total/partial autocorrelation (at different lags) of the generating process.
As expected, these results -which relate strictly to the linear dependence -cannot be extended to the nonlinear domain. The complexity of the nonlinear time series structures has led to the model selection and identification largely discussed in the literature, which often focuses on information criteria and their performance (see Psaradakis et al. 2009;Emiliano et al. 2014;Rinke and Sibbertsen 2016). If we focus on order estimation in the SETAR domain, we find it looks different: Kapetanios (2001) proposes a consistent information criterion by which to estimate the lag order of the autoregressive regimes; Wong and Li (1998) introduce a correction to the Akaike Information Criterion (AIC; Akaike 1974) to correct its bias in the presence of SETAR models; and De Gooijer (2001) proposes crossvalidation criteria to select the autoregressive order of these nonlinear structures. Galeano and Peña (2007), modifying the model selection criteria introduced by Hurvich et al. (1990), propose the inclusion of a determinant term related to the estimated parameters of each regime; furthermore, more recently, Fenga and Politis (2015) evaluated a bootstrapped version of the AIC in the SETAR domain, and defined the procedure step by step.
The results given in Sect. 2 could be used in this context to introduce a new approach to estimating the autoregressive order of the threshold regimes. For simplicity, assume that both regimes have the same order. (As noted before, this is not a limitation because zeroes could be included in the vector of parameters.) The proposed procedure is based on the linear approximation of the SETAR model as given in Theorem 1. We call this procedure Linear AIC (L-AIC) and it is based on the two main steps that are discussed below and summarized with the pseudo-code in Algorithm 1.
Let X t be a stationary SETAR(2; p) process. The first step starts by fixing the maximum order for p (p max ) and then for each p ¼ 1; 2; . . .; p max : (1:a) estimate the parameters of the SETAR(2; p) model; (1:b) compute the eigenvalues of the matrix K in (8) (1:c) compute the estimates of the autoregressive parameters -sayâ j -for j ¼ 1; 2; . . .; 2p, of the linear approximation using the results in (11).
Given the p max -estimated SETAR models, the second step is based on a parametric bootstrap approach with B bootstrap replications. In particular, for b ¼ 1; . . .B: (2:a) generate the bootstrap independent innovations fe Ã t g, t ¼ 1; . . .; n þ 2p max , from a random variable with mean 0, variance r 2 , and with n the time series length; for i ¼ 1; . . .; p max run the steps 2.b) and 2.c): (2:b) generate the artificial time series from the AR(2i) models by using the innovations fe Ã t g in 2.a) and the coefficients estimated in 1.c). Then we have X (2:c) for each artificial time series i, select the autoregressive orderp AICðjÞ, with j the order of the autoregressive process fitted to X Ãð2Þ t , whose coefficients are obtained through the Yule-Walker estimators; (2:d) using thep ðiÞ b obtained from steps 2.b) and 2.c), estimate the autoregressive order of the SETAR(2;p b ) model such that: b j 6 ¼ 0g; for i ¼ 1; 2; . . .; p max À 1: Given the B selected orders (one for each bootstrap replication), the SETAR model has autoregressive orderp such that with #p b being the empirical frequency ofp b . Note that: in step 1.b), given the ergodicity of the process X t , the probabilities included in the matrix P of (8) can be consistently estimated using the empirical frequencies n ij ¼ P n t¼2 I X t 2 R i jX tÀ1 2 R j À Á , for i; j ¼ 1; 2, such thatp ij ¼ n ij = P 2 j¼1 n ij (see among others Anderson and Goodman 1957); in step 1.c)) and all the elements included in the second step of the procedure, only the linear AR model is involved; this is because, given the results of Theorem 1 and Remark 3, the AR and MA components have the same order 2p. Moreover, the AR part of the ARMA approximation shares all the elements used in order estimation. Therefore, to simplify the procedure, we can only choose the AR component. (The extension to SETAR models with different autoregressive order is discussed in Sect. 4.2.) Further, from the computational point of view, the use of theoretical issues related to linear models (instead of to nonlinear ones) in the second step, makes the algorithm quite fast.
This procedure has two important aspects. First, the linear approximation process, X t , is not observable, and so it is a sort of latent process. For this reason, in step 2.a) we generate for each p a sequence of i.i.d. innovations (for example, from the standard Gaussian distribution) and we estimate the parameters of the linear approximation by using (11). In this way, we build the process in (15) that is always a well-defined AR(2i) for each i; this is the key point for using the selection rule in step 2.d).
Moreover, our procedure is motivated by two main considerations. First, when we estimate a linear model in the SETAR(2;p) data generating process, the residuals are not a sequence of i.i.d. random variables, because they also capture the nonlinear structure of the SETAR process. Second, even if we could estimate the parameters of the linear approximation by using the results in Francq and Zakoïan (1998), it is not easy to derive the variance-covariance matrix of the estimators, in which case, the results (11) should be preferred.
Before stating the next results about the consistency of the L-AIC procedure, we need to report some regularity assumptions that refer to the Assumptions of Theorem 1 of Chan (1993).
Assumptions (H1). Let X t be a nondegenerate SETAR(2;p) process where the coefficients of the two regimes are not equal and 0\p\1. Moreover, (1) the process X t is stationary and ergodic (see Chan and Tong 1985;Bec et al. 2004); (2) there exists the univariate density function of X t with respect to its invariant distribution, which is positive everywhere. h Finally, starting with the AIC behaviour in Shibata (1976), we define an asymptotically type-AIC consistent procedure if the asymptotic distribution of the chosen order is lim n!1 with p 0 the true order andp the estimator of the autoregressive order that depends on the series length n.
Proof See the Appendix.
Note that the assumption p 11 6 ¼ p in Theorem 2 is relevant because it guarantees that the autoregressive process in Theorem 1 is exactly of order 2p (see Remark 4) and then what stated in Corollary 1 is prevented.
The proposed procedure and its consistency were empirically evaluated in a Monte Carlo study (see Sect. 5) that considers various SETAR models characterized by various degrees of complexity and nonlinearity.

Extensions
In this section we examine the proposed linear approximation of the SETAR process and evaluate three different aspects. First, we show the role of the threshold and delay parameters in this linear approximation. Second, we study the case where regimes have different orders. Third, we generalize the result of Theorem 1 when the number of regimes is more than two.

Threshold and delay parameters
To write the linear approximation for any values of the threshold and delay parameters, we consider the results in Sect. 2 presented for the case of two regimes with the same order. For simplicity, we consider only the AR part of the linear approximation. So, the matrix K in (8) becomes where the threshold and delay parameters are r and d, respectively. The matrix P r;d is defined as ij;r ¼ Pr X t 2 R j jX tÀd 2 R i À Á , i; j ¼ 1; 2 and R 1 ¼ ðÀ1; r, R 2 ¼ ðr; þ1Þ. All the other quantities are the same as in (8).
First, note that the delay parameter has the role of the order for the transition probabilities in P r;d . It follows that P r;d ¼ P r;1 À Á d . Then, the matrix K r;d can be written as Now, looking at the form of the matrix K r;d , we can note that: 1. the parameters r and d involve only the probabilities in the matrix P r;d ; 2. in general, assuming that the SETAR model does not degenerate into an AR process, the rank of the matrix K r;d depends only on the matrices of the coefficients U 1 and U 2 ; 3. the procedure for the order estimation (proposed in Sect. 3) needs only evaluate the difference between two successive orders. Then, it mainly depends on the matrices of the coefficients and not on the matrix P r;d .
By using the previous considerations, we can argue that our procedure for the order estimation is independent of the threshold and delay parameters. This means that in identifying the SETAR model, we can separate the estimation of the threshold and delay parameters from the order estimation of the regimes.

Different orders in the regimes
Suppose that we have a SETAR model with two regimes with two different autoregressive orders -say p 1 and p 2 . Moreover, suppose that the threshold and delay parameters are zero and one, respectively. Set p 0 ¼ maxfp 1 ; p 2 g. Let p 1 \p 2 and p 2 ¼ p 1 þ m. Then, the matrix K is the same as in (8), except that matrix U 1 becomes: In this case, K is a 2p 0 matrix with a 2p 0 ¼ a 2p 0 À1 ¼ . . . ¼ a 2p 0 Àmþ1 ¼ 0 by (11) and using the fact that the matrix K has m eigenvalues equal to zero. This means that the AR part of the linear approximation is ARðp 1 þ p 2 Þ. Note that the transition probabilities in the matrix P are well defined in the sense that the irreducibility property of the Markov Chain with respect to the two regimes is still valid.
In this case, the procedure for the order estimation (see Sect. 3) is applied properly, changing Algorithm 1. The first step is modified by fixing a grid of candidate values for p 1 and p 2 such that p 1 ¼ 1; . . .; p 1;max and p 2 ¼ 1; . . .; p 2;max . In the second step, the p 1;max Â p 2;max estimated SETAR models are used to carry out the bootstrap replications, with p 1;max þ p 2;max innovations in row 13. A double cycle given by i ¼ 1; . . .p 1;max and s ¼ 1; . . .p 2;max is considered in row 14, whereas i is replaced by i þ s in row 16. All these changes imply that the maximum in row 17 becomes: Àp ði Ã ;s Ã Þ b j 6 ¼ 0g for i ¼ 1; 2; . . .; p 1;max À 1, s ¼ 1; 2; . . .; p 2;max À 1 and with i Ã ! i, s Ã ! s, such that at least i Ã or s Ã is strictly greater than the corresponding i or s value. Finally, the computation of the empirical frequencies of ðp b 1 ;p b 2 Þ allows to conclude the algorithm.

More regimes
Suppose that we have k regimes with k ! 2. For simplicity, assume that all regimes have the same order -say p. Moreover, suppose that the delay is d ¼ 1 and the thresholds are r i , for i ¼ 1; . . .; k À 1, with À1\r 1 \. . .\r kÀ1 \1. Then, the quantities in (7), become where U i , i ¼ 1; . . .; k are the matrices of coefficients of each regime. Repeating the proof of Theorem 1, we get the expression in (30) as the sum of k Á p quantities instead of 2p. This implies that we have as a linear approximation the ARMA(k Á p; k Á p) process. Finally, in this case the procedure for the order estimation can be applied by replacing 2p with k Á p in Algorithm 1.

Simulation study
To evaluate the L-AIC procedure for the order estimation of the SETAR(2; p) process, we generated time series from four data generating processes: M1 : X t ¼ À0:8X tÀ1 I tÀ1 À 0:2X tÀ1 ð1 À I tÀ1 Þ þ e t M2 : X t ¼ 0:5X tÀ1 I tÀ1 À 0:5X tÀ1 ð1 À I tÀ1 Þ þ e t M3 : X t ¼ ðÀ0:4X tÀ1 À 0:5X tÀ2 ÞI tÀ1 þ ð0:2X tÀ1 À 0:6X tÀ2 Þð1 À I tÀ1 Þ þ e t M4 : X t ¼ ðÀ0:7X tÀ1 À 0:2X tÀ2 ÞI tÀ1 þ ð0:3X tÀ1 À 0:6X tÀ2 Þð1 À I tÀ1 Þ þ e t with threshold value r 1 ¼ 0, and e t $ Nð0; 1Þ. For each model we considered time series of length n ¼ 50; 75; 150; 200; 500 (with burn-in at 500 to discharge the effects of the initial conditions). The models M1 and M2 are those considered in Galeano and Peña (2007), where even for model M1 the condition of Theorem 2, p 6 ¼ p 11 , is satisfied; Models M3 and M4 were chosen to guarantee the assumptions of Theorem 1 and change the percentage of observations generated from each regime. (In M3 this percentage is almost identical in the two regimes, whereas in M4 the percentage of observations generated from the first regime is, on average, less than 40%.) Further, note that not all models include the intercepts: it makes it less easy to distinguishing between the two regimes, and the performance of the order estimation procedure could be affected from it. After fixing p max ¼ 5, we estimated the SETAR(2;p) models, for p ¼ 1; . . .; 5 using conditional least squares estimators, with d ¼ 1 and r 1 ¼ 0. For each estimated model we obtained the linear autoregressive approximationX Ãð2Þ t and then we started, for each approximation, the B ¼ 125 bootstrap replicates with e t $ Nð0; 1Þ. The number of Monte Carlo runs is 1000.
In the Monte Carlo study the L-AIC procedure has been compared to two other approaches.
The SETAR-AIC (Tsay 1989): where T j , p j andr 2 e j are the number of observations, the candidate autoregressive order, and the residual variance of regime j, for j ¼ 1; 2, respectively, whereas the second approach considered for the order estimation of the SETAR(2; p) model is the bAIC of Fenga and Politis (2015), which is a bootstrapped version of the SETAR-AIC.
Comparisons of the L-AIC with both the SETAR-AIC and the bAIC are presented in the left side of Table 1 where the relative empirical frequency of the selection of the true autoregressive orders of the SETAR(2; p) model are summarized. (The complement to one of the relative frequencies in the table is related to overparametrized models.) It is widely known that the AIC tends to overfit models (see among others Koeher and Murphree 1988), whereas this overparametrization is penalized by the Bayesian Information Criterion (BIC), so we extended our procedure to the BIC domain giving rise to the Linear BIC (L-BIC) procedure.
For this purpose we considered steps 1.a) -2.d) of the L-AIC procedure and replaced in step 2.c) the AIC with the BIC. Then we replicated the simulation study and compared the performance of the SETAR-BIC (Galeano and Peña 2007): to that of the L-BIC procedure. (Note that Fenga and Politis 2015 do not consider the BIC.) The results of each model are presented in the right side of Table 1. Finally note that the L-BIC procedure is made consistent by using the same arguments as in the proof of Theorem 2, together with the consistency of the BIC measure in the linear domain.
The results in Table 1 clearly show the improvement in the L-AIC procedure with respect to the SETAR-AIC and the bAIC approaches in all considered cases and correspondingly the good performance of the L-BIC if compared to the BIC criterion, mainly for small values of n. In particular, one can be note that when the distinction between the two regimes is more marked, as in model M2 where the autoregressive coefficients have opposite signs, there is an overall improvement in the competing order estimation approaches (mainly in the L-AIC case).
As the complexity of the data generating process grows with models M3 and M4, all procedures confirm their pertinence to estimating the autoregressive order of the nonlinear threshold processes, but even in this case the frequency with which the L-AIC procedure correctly detects is generally higher than that of the competing approaches based on the Akaike Criterion and this result is confirmed in the L-BIC case, even if with less marked differences between the L-BIC and the SETAR-BIC.
From the computational point of view, if we compare the L-AIC procedure with the Fenga and Politis (2015) criterion, we can note that, when r 1 , d are known and p 1 ¼ p 2 ¼ p, the effort (measured in terms of computing time) is heavier for the bAIC criterion; this is because in the bootstrap iterations, for each candidate p ¼ 1; 2; . . .; p max , we need to estimate a nonlinear threshold model instead of a linear AR(2p) model, as in the L-AIC case. In practice, a further advantage of the L-AIC procedure is that the computationally intensive steps of the bootstrap iterations are confined to the linear domain characterized by reduced complexity relative to the nonlinear domain.
Finally, to evaluate how the variability of the SETAR(2; p) process is explained by the linear ARMA(2p, 2p) approximation, we have compared the variance of the SETAR generating process with the variance of its linear ARMA approximation. For all models, M1, M2, M3 and M4, we have generated 1000 artificial time series and for each of them we have computed the ratio between the two variances. In Table 2 the means of these ratios and the corresponding standard deviations, are presented for n ¼ f200; 500g. The variance explained by the ARMA approximation is higher for model M1 whereas this ratio is lower for model M4 that is characterized by high nonlinearity. The artificial time series have been further evaluated to investigate the dependence between X ð2Þ t and the generating process X t . For this aim, in Table 2 we have considered, for each model, the mean of the correlations computed between the ARMA approximation X ð2Þ t

Conclusion
It is widely known that when the behaviour of the autocorrelation function (ACF) is observed for a time series X t generated by a SETAR(2; p) model, we can easily confuse the ACF of the threshold process with that of a linear ARMA structure. Different motivations can be cited for this empirical evaluation: among them is the fact that the linear autoregressive model is nested in the SETAR model because it can be seen as a degeneration of the SETAR structure when all data belong to the same regime.
In this study we investigated the relation between the SETAR and ARMA models. We showed that when X t $ SETAR(2; p), under proper conditions, the linear approximation (10) of X t is X ð2Þ t $ ARMA(2p, 2p). We theoretically showed this result and even clarified when it could not be applied and how it can be generalized when the number of regimes is greater than two or the regimes have a different autoregressive order. Further, using linear approximation, we proposed an order estimation procedure called Linear-AIC to estimate the autoregressive order of the two regimes of X t ; the consistency of this procedure was proved and the extension of these results to the BIC domain is discussed.
The L-AIC procedure is based on two main steps: in the first step we focus on the nonlinear generating process X t and on its linear approximation X ð2Þ t , whereas in the second step we estimate the autoregressive order of the SETAR model using a parametric bootstrap approach.
Our Monte Carlo study gives evidence of the performance of the L-AIC procedure and of its BIC extension (called L-BIC). The results highlight that the L-AIC generally does a better job than both the competing SETAR-AIC in (17) and Fenga and Politis (2015) criterion (and, correspondingly, the L-BIC performs better than the SETAR-BIC criterion), even in the presence of the parametrization of the SETAR model that makes it difficult to distinguish among regimes.
The results presented herein will serve as the topic of future research that further investigates the extensions discussed in Sect. 4; future research can also study how Table 2 Mean values, over 1000 Monte Carlo replicates, of the ratios between the variance of the ARMA(2p, 2p) approximation, X ð2Þ t , and the variance of the corresponding generating process X t $ SETAR(2; p); mean values of the correlation between X ð2Þ t and Y t ¼ X t À X Lemma 1 Let X t , t 2 Z, be a stationary and ergodic SETAR(2;p) process (4), with E½je t j\1, c ¼ pkU 1 k þ ð1 À pÞkU 2 k and E½IfX tÀ1 0g ¼ p (with 0 p 1), if c\1 then X t can be given by: A I tÀ1Às e tÀi þ e t ; a:s:; for n ! 1; where A I tÀ1Às is defined in (5) and k Á k is any matrix norm.
Proof Using Theorem 1.1 in Bougerol and Picard (1992) and let it is sufficient to show that c k \0, for any k 2 N and t 2 Z. Let k Á k any matrix norm, now we have that Given the ergodicity and stationarity of X t and considered the representation (5), then E kA I tÀ1Às k ð Þ¼c, for any t and s. Noting that the assumptions of the present Lemma fit all conditions of Theorem 1.1 of Bougerol and Picard (1992), we have that c\1, for any k, and that this result holds for any matrix norm. h The next lemma proves the existence of moments of the SETAR process.
Lemma 2 Let X t , t 2 Z, be a stationary and ergodic SETAR(2;p) process with E½je t j r \1, and c r \1, where c r ¼ p U 1 k k r þð1 À pÞ U 2 k k r , for r 2 ½1; 1Þ and E½IfX tÀ1 0g ¼ p (with 0 p 1), then E½jX t j r \1 (where Á k k is any matrix norm).
Proof Let ðkX t kÞ L r ¼ E kX t k ð Þ 1=r , be the L r norm, as given in Stelzer (2009, Definition 4.1). We state with k Á k a generic matrix or vector norm on R p , since the result of this lemma holds for any matrix or vector norm. By Lemma 1 and using the same arguments given in the first part of the proof of Theorem 4.1 in Stelzer (2009) , that can be applied even in this SETAR domain, we need to prove the sufficient condition that X t belongs to L r ðR p Þ, that is given by: Define By using the same arguments as in the proof of Lemma 1, we have that since E A I tÀ1Às k k r ¼ c r by stationarity and ergodicity of X t and Eje t j r \1. Then, by (21) Since c r \1 and Q i ¼ Oðc i r Þ, the result in (20) follows. Thus, we have that E½jX t j r \1. The proof is complete. h Proof of Proposition 1 Using the results of Lemma 1, we need to minimize the following quantity Qðg 1 ; g 2 ; . . .; I tÀ1 Þ ¼ jjX t À X L t jI tÀ1 jj 2 Making the first partial derivatives of Qðg 1 ; g 2 ; . . .; I tÀ1 Þ with respect to g k , with k ! 1, and putting each equation equal to zero, we have Since the process X t is also measurable with respect toĨ tÀ1 ¼ fe tÀ1 ; e tÀ2 ; . . .g, then it is equivalent to consider eitherĨ tÀ1 in place of I tÀ1 in the conditional means. Now, it is sufficient to derive g j , 8j ! 1, in the following equations.
Without loss of generality, we can consider both e tÀj and e tÀk different from zero. So, we can write (24) as It follows that the solution in (25) is Since (22) is a square function, if we replace the solution (26) in (22), then the latter is zero. This means that we have a minimum. Then, the proof of the first part of Proposition is complete. Now, for the second part, we know that the optimal one step ahead L 2 predictor of X t is X t ð1Þ ¼ E X t jI tÀ1 ð Þ. By using again Lemma 1, we have that The proof is so complete. h Now we can prove Theorem 1.
Proof of Theorem 1 We start from the linear process (10) with coefficients, By using the stationarity and ergodicity assumption on X t , it follows that also the indicator process, I t , is stationary and ergodic. Moreover, we can apply, on the expression for g j , the Markov property and arguments similar to those given in Timmermann (2000) on the recursive representation and generation of moments of the regimes switching models. So, it follows that where p and K are defined in (7) and (8), respectively and K jÀ1 ¼ K Á K Á . . . Á K, with the matrix K multiplied ðj À 1Þ times. Then, we can write the linear process in (10) as where B is the back-shift operator such that B j e t ¼ e tÀj . For now, suppose that all the eigenvalues of the matrix K are different. So, we can write K ¼ CDC À1 , where D is a diagonal matrix with the eigenvalues on the main diagonal and C is the matrix with the corresponding eigenvectors. Therefore, we can write (28) as By Remark 1, it follows that the series P 1 j¼1 jg ð2Þ j j is convergent. Note that in (29) we have a power series and so it follows that qðKÞ\1, with qðKÞ the maximum absolute eigenvalue of K.
Set P T 1 ¼ e T 1 p T C and P 2 ¼ C À1 U 1 U 2 e 1 and let c k be the component-wise product of the corresponding elements in the vectors P 1 and P 2 , k ¼ 1; . . .; 2p. Now, we can write (29) as where k k , k ¼ 1; . . .; 2p, are the eigenvalues in the matrix D. So, we can conclude, that: X ð2Þ t $ ARMAð2p; 2pÞ: To complete the proof, we need to consider the case when two or more eigenvalues in the matrix K are equal. In this case, we can apply the Jordan matrix decomposition of K (among the others, see Lütkepohl (1996)) and the previous results still hold. The proof is so complete. h Corollary 1 states that the linear approximation of the SETAR model could be null. We here give the proof.
Proof of Corollary 1 Starting from (27) in the proof of Theorem 2, we have that g ð2Þ j ¼ e T 1 p T K jÀ1 U 1 U 2 e 1 ; j ! 1; and so it is sufficient to find one case such that g ð2Þ j ¼ 0, 8j ! 1. For this aim, suppose that the transition probability matrix (7) is : Then p ¼ 1=2. Further, suppose that the parameters in the first regime of the SETAR(2;p) model are the same, in absolute value, to the corresponding parameters of the second regime but with the opposite sign. It is easy to verify that g 8j ! 1. Since the elements in the first row of the matrix U 1 have opposite sign with respect to the elements in the first row of the matrix U 2 , we can write U 1 þ U 2 ¼ 2C, where the matrix C has all zeros in the first row and the same elements of U 1 or U 2 in the other rows. It is clear that the matrix C is idempotent and so C j ¼ C, 8j ! 1. Then, we can write (31) as g ð2Þ j ¼ 2 Àj e > 1 2C ð Þ j e 1 ¼ e > 1 C j e 1 ¼ e > 1 Ce 1 ¼ g ð2Þ 1 ¼ 0: So, the result follows. h We prove here the consistency of the identification procedure proposed.
Proof of Theorem 2 Before to enter into the details of the proof, we need to discuss the assumptions of Theorem 2. For simplicity, suppose that the eigenvalues of the matrix K, defined in (8), are all different. The assumption p 11 6 ¼ p guarantees that the linear approximation, X t , shows, at least, one coefficient different from zero (in fact, by Corollary 1, we could also have that X  (32), (37) asymptotically becomes Q 1 1 À hða 1 ; . . .; a 2p Þ Â Ã ¼ 0 with hða 1 ; . . .; a 2p Þ a continuous function and it is less than one because 1 À hða 1 ; . . .; a 2p Þ is the same expression we get to derive c X ð0Þ. So, we can conclude that Q 1 cannot be positive. The unique solution is