## 1 Introduction

The complexity of nonlinear time series models often makes it difficult to analyze the main features of their dynamics, to derive their statistical properties, and to identify and estimate the models. In this case the tools developed in the linear domain cannot simply be adapted, and new ones need to be introduced.

We herein propose a bridge between the linear AutoRegressive Moving Average (ARMA) model and the nonlinear Self-Exciting Threshold AutoRegressive (SETAR) model (Tong and Lim 1980; Tong 1990). We introduce a linear approximation of the threshold model that offers the opportunity to investigate the dynamics of the generating process.

In more detail, we show that the proposed linear approximation of the nonlinear SETAR process is given by an ARMA model. Even though one might expect that this approximation is given by an autoregressive (AR) process, we show that there is also a moving average (MA) component.

The linear approximation is obtained using the best (in $$L^2$$ norm) one-step-ahead predictor of the SETAR process, and the corresponding coefficients of the ARMA approximation are theoretically derived.

Then, leveraging these results which allow us to derive the ARMA model that best approximates the threshold structure, we introduce a new procedure by which to estimate the order of the SETAR process, in what can be considered one of the potential applications of the proposed linear approximation.

Before going into the details of our proposal and to better introduce our contribution, we need to clarify the distinction between linear representation and linear approximation of a nonlinear process. The linear representation of nonlinear models was extensively presented in Francq and Zakoïan (1998), and in the references therein. It is based on a new parametrization of the nonlinear model that leads to a weak ARMA representation, where the adjective weak relates to the fact that the assumptions usually imposed on the noise of the ARMA model (such as independence) are not satisfied by the linear ARMA representation. (For an example of linear representation of the subdiagonal bilinear model, see Pham 1985.)

In this domain, more recently, Kokoszka and Politis (2011) used the definition of weak linear time series, and even showed that Autoregressive Conditional Heteroscedasticity-type and Stochastic Volatility-type processes do not belong to this class.

The definition of linear approximation of $$X_t$$ rests instead on a different idea. The aim is to introduce an approximation that allows one to distinguish two components in the process $$X_t$$:

\begin{aligned} X_t=X_t^L+Y_t \end{aligned}

with $$X_t^L$$ the linear approximation and $$Y_t$$ the nonlinear component, where $$X_t^L$$ is given by the linear causal process:

\begin{aligned} X_t^L=\sum _{j=0}^\infty \psi _j\varepsilon _{t-j},\end{aligned}
(1)

with $$\psi _0=1$$, $$\sum _{j=0}^\infty |\psi _j|<\infty$$ and $$\{\varepsilon _t\}$$ a sequence of independent and identically distributed (i.i.d.) random variables, with $$E[\varepsilon _t]=0$$ and $$E[\varepsilon _t^2]=\sigma ^2$$.

The derivation of the linear approximation $$X_t^L$$ is associated in the literature with the use of the Volterra series expansion of $$X_t$$, where $$X_t^L$$ is obtained considering the first-order term of the expansion of $$X_t$$. (For an example of the bilinear model, see Priestley 1988, p. 58.)

The spirit of the Volterra expansion has inspired various contributions to linear approximation, with applications in many domains (among others see Huang et al. 2009 and Schoukens et al. 2016).

In this study we obtain a linear approximation of $$X_t$$ by borrowing the definition of the best mean square one-step-ahead predictor (see among others Brockwell and Davis 1991, Sect. 2.7).

The first results are presented in Proposition 1, whereas the elements included in $$X_t^L$$ are further discussed in Remark 1 that leads to the proposed linear approximation of $$X_{t}$$. Moreover, we show that the proposed linear approximation is an ARMA process and we use this result in the order estimation of the threshold process, thus confining the computational effort needed to estimate the order of a nonlinear process to the linear domain.

In particular, in Sect. 2 we establish the conditions under which a linear approximation of $$X_t$$ can be obtained. In Sect. 3 the linear approximation is used to implement a new order estimation procedure whose consistency property is shown; in Sect. 4 some extensions of the linear approximation are given after removing some conditions fixed in the previous pages, and the role played by the SETAR parameters in the approximation is discussed; in Sect. 5 the order estimation procedure is evaluated and compared to other information criteria, through a Monte Carlo study. Some final comments are given at the end, and all proofs are given in the Appendix.

## 2 Linear approximation of the SETAR process: main results

Let $$\{X_t\}_{t\in \mathbb Z}$$ be a SETAR nonlinear process, given by:

\begin{aligned} X_t=\sum _{j=1}^k\left( \phi _0^{(j)}+\sum _{i=1}^{p_j}\phi _i^{(j)}X_{t-i}\right) \mathbb {I}\left( X_{t-d}\in \mathbb R_j\right) +\varepsilon _t, \end{aligned}
(2)

where $$\{\varepsilon _t \}$$ is a sequence of continuous i.i.d. $$\mathbb R$$-valued random variables with mean zero and $$E[\varepsilon _t^2]=\sigma _{\varepsilon }^2<\infty$$, $$\mathbb I(\cdot )$$ is the indicator function, k is the number of regimes, $$p_j$$ is the autoregressive order of regime j, $$X_{t-d}$$ is the threshold variable, and d is the threshold delay; $$p_j$$ is a nonnegative integer, whereas k and d are positive integers. The sets $$\mathbb R_j=(r_{j-1}, r_j]$$ for $$j=1, \ldots , k-1$$, with $$-\infty =r_0<r_1<\ldots<r_{k-1}<r_k=\infty$$, and $$\mathbb R_k=(r_{k-1}, +\infty )$$ are subsets of the real line such that $$\mathbb R=\bigcup _{j=1}^k\mathbb R_j$$.

In the following, to find a stochastic linear approximation of the process $$X_t$$, we use the $$L^2$$ norm given by $$\Vert X_t-X_t^L|\mathcal I_{t-1}\Vert _{L^{2}}=E\left[ (X_t-X_t^L)^2|\mathcal I_{t-1}\right] ^{1/2}$$, with $$\mathcal I_{t-1}$$ the set of information on $$X_t$$ available up to time $$t-1$$.

To show the theoretical results, we take advantage of an alternative representation of the threshold process that, to avoid heavy notation (which would not help in understanding the issues), is assumed to have $$k=2$$ regimes (the case with $$k>2$$ is discussed in Sect. 4.3) and null intercepts ($$\phi _0^{(j)}=0$$, for $$j=1,2,\ldots , k$$). This guarantees the following form to process (2):

\begin{aligned} \mathbf{X}_t=\varvec{\Phi }_1\mathbf{X}_{t-1}\mathbb I(X_{t-d}\le r_1)+\varvec{\Phi }_2\mathbf{X}_{t-1}\left[ 1-\mathbb I(X_{t-d}\le r_1)\right] + \varvec{\varepsilon }_t, \end{aligned}
(3)

where

\begin{aligned} \underset{(p \times 1)}{\varvec{X}_t}=\begin{bmatrix} X_t \\ \ldots \\ X_{t-p+1}\end{bmatrix}, \quad \underset{(p\times p)}{\varvec{\Phi }_j}= \left[ \begin{array}{ccc|c} \phi _1^{(j)} &{} \ldots &{}\phi _{p-1}^{(j)} &{} \phi _p^{(j)}\\ &{} {\varvec{I}}_{p-1} &{} &{} \varvec{ 0} \end{array} \right] , \quad \underset{(p \times 1)}{\varvec{\varepsilon }_t}=\begin{bmatrix}\varepsilon _t \\ \varvec{ 0}\end{bmatrix}\end{aligned}
(4)

with $${\varvec{ I}}$$ the identity matrix, $$\varvec{ 0}$$ the null vector, and the autoregressive order p (the assumption that the two regimes have the same autoregressive order simplifies the presentation of results and can be easily met including null coefficients in the model (see Sect. 4.2)). Without loss of generality, we can suppose that $$d=1$$ and $$r_1=0$$ (these assumptions will be further discussed in Sect. 4.1); then, the process (3) can be written as

\begin{aligned} \mathbf{X}_t=\mathbf {A}_{I_{t-1}}\mathbf{X}_{t-1}+ \varvec{\varepsilon }_t, \end{aligned}
(5)

where

\begin{aligned} \mathbf {A}_{I_{t-1}}=\left\{ \begin{array}{ll} \varvec{\Phi }_1&{}\quad \text{ if } \quad X_{t-1}\le 0\\ \varvec{\Phi }_2&{}\quad \text{ if } \quad X_{t-1}> 0.\\ \end{array} \right. \end{aligned}
(6)
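The recursion (5)-(6) can be sketched in a few lines of code. The following is a minimal illustration, not part of the original derivation: the function names `companion` and `simulate_setar`, the Gaussian innovations, the zero starting state and the burn-in length are all assumptions made for the example, and both regimes are taken to have the same order p.

```python
import numpy as np

def companion(phi):
    """Companion matrix Phi_j of eq. (4) for the coefficients phi_1, ..., phi_p."""
    phi = np.asarray(phi, dtype=float)
    p = phi.size
    Phi = np.zeros((p, p))
    Phi[0, :] = phi                      # first row: autoregressive coefficients
    if p > 1:
        Phi[1:, :-1] = np.eye(p - 1)     # identity block that shifts the lags
    return Phi

def simulate_setar(phi1, phi2, n, burn=200, sigma=1.0, seed=0):
    """Simulate n values of X_t from (5)-(6) with d = 1 and r_1 = 0.

    Gaussian innovations and a zero starting state are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    Phi1, Phi2 = companion(phi1), companion(phi2)
    state = np.zeros(Phi1.shape[0])      # the vector X_{t-1}
    out = np.empty(n + burn)
    for t in range(n + burn):
        A = Phi1 if state[0] <= 0 else Phi2   # A_{I_{t-1}} chosen by the sign of X_{t-1}
        state = A @ state
        state[0] += sigma * rng.standard_normal()
        out[t] = state[0]
    return out[burn:]                    # drop the burn-in values

x = simulate_setar([0.5, -0.2], [-0.3, 0.1], n=300)
```

The state vector is updated in place of the scalar recursion, so the same loop covers any common order p.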

Note that all results given in the following are based on the assumptions of strict stationarity (referred to as stationarity in the next pages) and ergodicity of the process. In the threshold domain the milestone in this framework is given by Petruccelli and Woolford (1984), who state a set of necessary and sufficient conditions for the stationarity and ergodicity of a SETAR(2;1) model; sufficient conditions for the more general SETAR(2;p) process are given in Chan and Tong (1985) and Bec et al. (2004).

There are at least two difficulties inherent in obtaining the linear approximation of the SETAR model; these make this model different from other nonlinear models, such as Markov Switching structures (Hamilton 1989). First, the SETAR process (5) has stochastic coefficients that depend on the process $$X_t$$ itself: this implies, among other things, that it is not easy to obtain its moments (sufficient conditions for their existence are given in Lemma 2 in the Appendix). Second, the SETAR process does not fulfil many regularity conditions that can assist in obtaining the linear approximation. (For example, the first derivative of $$X_t$$, carried out to derive the coefficients of the first-order Volterra expansion (see Priestley 1988, p. 26), may not exist when the skeleton of the process assumes a value equal to the threshold value.)

Starting from these two points, to derive a linear approximation of a SETAR(2; p) process, we need to introduce an additional notation that simplifies the presentation.

Let $$X_t=\mathbf {e}_1^T\mathbf {X}_t$$ be the SETAR(2;p) process (5), with $$\mathbf {e}_1$$ a $$(p\times 1)$$ vector whose first element is 1 and whose remaining $$(p-1)$$ elements are zero, and let $$\mathbf{A}^T$$ denote the transpose of a matrix $$\mathbf{A}$$. Further, let

\begin{aligned} \varvec{\pi }=\left( \begin{array}{c}\pi \\ (1-\pi ) \end{array} \right) \otimes \mathbf {I}_p \qquad \text{ and } \qquad \mathbf {P}=\left( \begin{array}{cc} \pi _{11}&{}\pi _{12}\\ \pi _{21}&{}\pi _{22} \end{array} \right) \end{aligned}
(7)

be two matrices where $$\varvec{\pi }$$ is a $$(2p\times p)$$ matrix, with $$\pi =Pr(X_t\le 0)$$, $$\mathbf{I}_p$$ is an identity matrix of order p and $$\otimes$$ is the Kronecker product, whereas $$\mathbf{P}$$ is a $$(2\times 2)$$ matrix, with $$\pi _{ij}=Pr(X_t\in \mathbb {R}_j|X_{t-1}\in \mathbb {R}_i)$$, for $$i,j=1,2$$ and $$\mathbb {R}_1=(-\infty ,0]$$, $$\mathbb {R}_2=(0,\infty )$$. Moreover, consider the following matrix:

\begin{aligned} \mathbf {K}=\left( \begin{array}{cc} \varvec{\Phi }_1&{}\mathbf {0}\\ \mathbf {0}&{}\varvec{\Phi }_2 \end{array} \right) \left( \mathbf{P}\otimes \mathbf {I}_p\right) , \end{aligned}
(8)

with $$\varvec{\Phi }_i$$, for $$i=1,2$$, defined in (4).
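As a numerical illustration of (7) and (8), the matrices $$\varvec{\pi }$$ and $$\mathbf{K}$$ can be assembled with Kronecker products. This is a sketch under stated assumptions: the helper name `build_pi_and_K` and the numerical values below are hypothetical and chosen only to show the block structure.

```python
import numpy as np

def build_pi_and_K(Phi1, Phi2, P, pi):
    """Return the (2p x p) matrix of eq. (7) and the (2p x 2p) matrix K of eq. (8)."""
    Phi1 = np.atleast_2d(np.asarray(Phi1, dtype=float))
    Phi2 = np.atleast_2d(np.asarray(Phi2, dtype=float))
    P = np.asarray(P, dtype=float)
    p = Phi1.shape[0]
    I_p = np.eye(p)
    pi_mat = np.kron(np.array([[pi], [1.0 - pi]]), I_p)    # (pi, 1 - pi)' kron I_p
    zeros = np.zeros((p, p))
    block_diag = np.block([[Phi1, zeros], [zeros, Phi2]])  # diag(Phi_1, Phi_2)
    K = block_diag @ np.kron(P, I_p)                       # eq. (8)
    return pi_mat, K

# Hypothetical SETAR(2;2) companion matrices and transition probabilities.
Phi1 = np.array([[0.5, -0.2], [1.0, 0.0]])
Phi2 = np.array([[-0.3, 0.1], [1.0, 0.0]])
P = np.array([[0.7, 0.3], [0.4, 0.6]])
pi_mat, K = build_pi_and_K(Phi1, Phi2, P, pi=0.55)
```

Written out, the product in (8) has blocks $$\pi _{ij}\varvec{\Phi }_i$$, which is a convenient check on the construction.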

Given the previous notation, we can now introduce the linear approximation of $$X_t$$. Let $$X_t(1)$$ be the best (in $$L^2$$ norm) one-step-ahead predictor for the SETAR(2;p) process, $$X_t(1)=E[X_{t}|\mathcal I_{t-1}]$$; we can now state the following results.

### Proposition 1

Let $$X_t$$, $$t\in \mathbb Z$$, be a stationary and ergodic SETAR(2;p) process with $$E(\varepsilon _t^2)<\infty$$ and $$\gamma _2<1$$ (with $$\gamma _2$$ given in Lemma 2 in the Appendix); then, given the linear process $$X_t^L$$ as in (1), we have the following results in $$L^2$$ norm:

1. (i)

$$\arg \min _{\{g_j\}}||X_t-X_t^L|\mathcal {I}_{t-1}||_{L^2}^2=g_j(\mathcal {I}_{t-1})=\mathbf {e}_1^T\prod _{s=0}^{j-1}\mathbf {A}_{I_{t-1-s}}\mathbf {e}_1,\forall j\ge 1$$;

that is, the minimizer, say $$X_{t|t-1}^L$$, is given by $$X_{t|t-1}^L=\sum _{j=1}^{\infty }g_j(\mathcal {I}_{t-1})\varepsilon _{t-j}$$;

2. (ii)

$$X_t(1)=\sum _{j=1}^{\infty }g_j(\mathcal {I}_{t-1})\varepsilon _{t-j}$$,

where $$\mathcal {I}_{t-1}=\{X_{t-1},X_{t-2},\ldots \}$$ and the subscript $$t-1$$ in $$X_{t|t-1}^L$$ is due to the conditional set $$\mathcal {I}_{t-1}$$.

### Proof

See the Appendix.

The result of Proposition 1 is twofold. First it gives, for the SETAR(2; p) model, a representation of the optimal one-step-ahead predictor, $$X_t(1)$$, which is a generalized linear process with coefficients $$g_j(\mathcal I_{t-1})$$ that relate to the process itself. Second, this optimal predictor corresponds to $$X_{t|t-1}^L$$ when the minimization is done with respect to the linear process $$X_t^L$$.
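For the scalar case $$p=1$$ the coefficients $$g_j(\mathcal I_{t-1})$$ reduce to products of the regime coefficients selected by the signs of $$X_{t-1},\ldots ,X_{t-j}$$. The sketch below computes them from an observed past; the function name `path_coefficients` and the numbers used are hypothetical, and the restriction to $$p=1$$, $$d=1$$, $$r_1=0$$ is an assumption made to keep the example short.

```python
import numpy as np

def path_coefficients(x_past, phi1, phi2, J):
    """g_j(I_{t-1}), j = 1..J, for a SETAR(2;1) with d = 1 and r_1 = 0.

    x_past[-1] is X_{t-1}, x_past[-2] is X_{t-2}, and so on.
    """
    g = np.empty(J)
    prod = 1.0
    for j in range(1, J + 1):
        x_lag = x_past[-j]                        # X_{t-j} selects A_{I_{t-j}}
        prod *= phi1 if x_lag <= 0 else phi2      # running product A_{I_{t-1}} ... A_{I_{t-j}}
        g[j - 1] = prod
    return g

# X_{t-1} = -1 picks phi1, X_{t-2} = 1 picks phi2.
g = path_coefficients(np.array([1.0, -1.0]), phi1=0.5, phi2=-0.4, J=2)
```

Because the coefficients depend on the realized path, they change from one time point to the next, which is the reason Remark 1 replaces them with their expectations.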

Now we need to emphasize the following key remark.

### Remark 1

Proposition 1 gives a representation of the best one-step-ahead predictor for the SETAR(2;p) process. However, the quantities $$g_j(\mathcal {I}_{t-1})$$ are not easily managed because they are nonlinear functions of the observations. An easy way to derive a linear process, as defined in (1), is to consider the expectation of $$g_j(\mathcal {I}_{t-1})$$ - that is

\begin{aligned} g_0=1,\qquad g_j^{(2)}=E\left( g_j(\mathcal {I}_{t-1})\right) =E\left( \mathbf {e}_1^T\prod _{s=0}^{j-1}\mathbf {A}_{I_{t-1-s}}\mathbf {e}_1\right) , \quad j\ge 1. \end{aligned}
(9)

If we suppose that $$\gamma _1<1$$ (see Lemma 2) and use the same arguments as in the proof of Lemma 2, we have that $$g_j^{(2)}=O(\gamma _1^j)$$, $$\forall j\ge 1$$. So, it follows that $$\sum _{j=0}^{\infty }|g_j^{(2)}|<\infty$$. Then, we have the following linear approximation of the SETAR(2;p) process

\begin{aligned} X_t^{(2)}=\varepsilon _t+\sum _{j=1}^{\infty }g_j^{(2)}\varepsilon _{t-j}. \end{aligned}
(10)

$$\square$$

Further note that Proposition 1 gives the best (in $$L^2$$ norm) one-step-ahead predictor for the SETAR(2;p) process, whereas the linear process in (10) is a projection of the SETAR(2;p) predictor in the space of the linear predictors. Many features of this linear process will be analysed in the following; the extension to the SETAR(2;$$p_1, p_2$$) case, with $$p_1\ne p_2$$, is discussed in Sect. 4.2.

First of all, we provide an example to illustrate the linear approximation in (10).

### Example 1

Let $$X_t$$ be a stationary and ergodic SETAR(2;1) process given by

\begin{aligned} X_t=\phi _1X_{t-1}\mathbb {I}(X_{t-1}\le 0)+\phi _2X_{t-1}(1-\mathbb {I}(X_{t-1}\le 0))+\varepsilon _t. \end{aligned}

To simplify the computations, we suppose that the transition matrix $$\mathbf {P}=\left( \begin{array}{ll}0 &{} 1\\ 1 &{} 0\end{array}\right)$$, which implies that $$\pi =1/2$$ and that the two parameters $$\phi _1$$ and $$\phi _2$$ are negative. Moreover, the matrix $$\mathbf {K}$$, defined in (8), is

\begin{aligned} \mathbf {K}=\left( \begin{array}{ll}\phi _1&{} 0\\ 0&{}\phi _2\end{array}\right) \mathbf {P}=\left( \begin{array}{ll}0&{}\phi _1\\ \phi _2&{}0\end{array}\right) . \end{aligned}

By (9) and (27), we have that

\begin{aligned} g_1^{(2)}=\frac{1}{2}(1;1)\left( \begin{array}{l} \phi _1\\ \phi _2\end{array}\right) \qquad \text{ and } \qquad g_2^{(2)}=\frac{1}{2}(1;1)\mathbf {K}\left( \begin{array}{l} \phi _1\\ \phi _2\end{array}\right) =\phi _1\phi _2. \end{aligned}

Set $$j=2u$$ and $$j=2u-1$$ for even and odd cases of j, respectively. Since

\begin{aligned} \mathbf {K}^{2u}=\left( \phi _1\phi _2\right) ^u \mathbf {I}_2\qquad \text{ and } \qquad \mathbf {K}^{2u-1}=\left( \phi _1\phi _2\right) ^{u-1}\mathbf {K}, \end{aligned}

it follows that

\begin{aligned} g_{2u}^{(2)}=\left( \phi _1\phi _2\right) ^u \quad \text{ and } \quad g_{2u-1}^{(2)}=\left( \phi _1\phi _2\right) ^{u-1}g_1^{(2)} \qquad \forall u\ge 1. \end{aligned}

Finally, note that the coefficients $$g_j^{(2)}$$ derived through Proposition 1 and Remark 1 exhibit the same ergodic conditions given in Petruccelli and Woolford (1984). $$\blacksquare$$
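The closed forms of Example 1 can be checked numerically. The sketch below assumes that, for $$p=1$$, the coefficients take the form $$g_j^{(2)}=(\pi , 1-\pi )\mathbf{K}^{j-1}(\phi _1,\phi _2)^T$$, which is the pattern visible in the expressions for $$g_1^{(2)}$$ and $$g_2^{(2)}$$ above; the function name `g_coeffs_setar21` and the values $$\phi _1=-0.4$$, $$\phi _2=-0.6$$ are hypothetical.

```python
import numpy as np

def g_coeffs_setar21(phi1, phi2, P, pi, J):
    """g_j^{(2)} = (pi, 1 - pi) K^{j-1} (phi1, phi2)' for j = 1..J.

    The general-j form is inferred from g_1 and g_2 in Example 1 (an assumption).
    """
    K = np.diag([phi1, phi2]) @ np.asarray(P, dtype=float)
    w = np.array([pi, 1.0 - pi])
    v = np.array([phi1, phi2], dtype=float)
    g = np.empty(J)
    for j in range(J):
        g[j] = w @ v      # current coefficient
        v = K @ v         # advance to K^{j} (phi1, phi2)'
    return g

phi1, phi2 = -0.4, -0.6                      # both negative, as in Example 1
P = [[0.0, 1.0], [1.0, 0.0]]
g = g_coeffs_setar21(phi1, phi2, P, pi=0.5, J=6)

# Closed forms of Example 1 for comparison.
g1 = 0.5 * (phi1 + phi2)
closed = [(phi1 * phi2) ** (j // 2) if j % 2 == 0
          else (phi1 * phi2) ** ((j - 1) // 2) * g1
          for j in range(1, 7)]
```

With these values the even-lag coefficients are powers of $$\phi _1\phi _2=0.24$$ and the odd-lag ones are those powers multiplied by $$g_1^{(2)}=-0.5$$.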

The next theorem shows a more precise characterization of the linear approximation (10) of the SETAR(2;p) process.

### Theorem 1

Let $$X_t$$, $$t\in \mathbb Z$$, be a stationary and ergodic SETAR(2;p) process with $$E(\varepsilon _t^2)<\infty$$ and $$\gamma _2<1$$; then the linear approximation (10) is an ARMA(2p, 2p) process.

### Proof

See the Appendix.

What is stated in Theorem 1 allows us to make some interesting remarks.

### Remark 2

The assumptions in Theorem 1, $$E(\varepsilon _t^2)<\infty$$ and $$\gamma _2<1$$, are sufficient to have a finite second moment, $$E(X_t^2)<\infty$$ (see Lemma 2 in the Appendix). Further, $$\gamma _r$$ given by $$\gamma _r=\pi \Vert \varvec{\Phi }_1\Vert ^r+(1-\pi )\Vert \varvec{\Phi }_2\Vert ^r$$ is always less than one when the two regimes are also stationary, with $$\Vert \varvec{\Phi }_1\Vert <1$$ and $$\Vert \varvec{\Phi }_2\Vert <1$$; in the other cases, however, this condition needs to be verified case by case.$$\square$$

### Remark 3

Theorem 1 provides the linear approximation of the SETAR(2;p) process, given by the ARMA(2p,2p) model. Looking at the proof of Theorem 1 in the Appendix (Eq. (30)) we have the exact expression of the linear process $$X_t^{(2)}$$ in (10), which also allows us to derive the coefficients of the ARMA(2p,2p) model theoretically. For simplicity, suppose that the matrix $$\mathbf{K}$$, in (8), has distinct eigenvalues - say $$\lambda _1,\ldots ,\lambda _{2p}$$. By (30) and after some easy but long algebra, we have two polynomials of degree 2p related to the autoregressive and moving average components of $$X_t^{(2)}$$, whose coefficients are given respectively by:

\begin{aligned} a_i= & {} (-1)^{i-1}\sum _{|\mathbf{v}|=i}\prod _{u=1}^{2p}\lambda _{u}^{v_u},\qquad i=1,\ldots , 2p,\\ b_j= & {} -a_j+(-1)^{j-1}\sum _{k=1}^{2p}\left( c_k \sum _{|\mathbf{v}_{-k}|=j-1} \prod _{\begin{array}{c} u=1 \\ u\ne k \end{array}}^{2p}\lambda _{u}^{v_u}\right) ,\qquad j=1,\ldots , 2p,\nonumber \end{aligned}
(11)

with $$|\mathbf{v}|=\sum _{u=1}^{2p}v_u$$, $$|\mathbf{v}_{-k}|=\sum _{\begin{array}{c} u=1 \\ u\ne k \end{array}}^{2p}v_u$$ for $$v_u\in \{0;1\}$$ and where the sums in (11) are made with respect to all different i and $$j-1$$ groups of ones in the vectors $$\mathbf{v}$$ and $$\mathbf{v}_{-k}$$, respectively. $$\square$$
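The sums in (11) run over all subsets of eigenvalues of a given size, that is, they are elementary symmetric polynomials of the $$\lambda _u$$. A minimal sketch of the computation follows; the function names `elem_sym` and `arma_coeffs`, and the eigenvalues and weights $$c_k$$ used in the call, are hypothetical and serve only to show the mechanics.

```python
from itertools import combinations
import numpy as np

def elem_sym(values, m):
    """Sum over all m-element subsets of `values` of the product of the subset."""
    if m == 0:
        return 1.0
    return sum(np.prod(subset) for subset in combinations(values, m))

def arma_coeffs(lam, c):
    """Coefficients a_1..a_2p and b_1..b_2p of eq. (11) from eigenvalues and weights c_k."""
    lam = list(lam)
    n = len(lam)
    a = [(-1) ** (i - 1) * elem_sym(lam, i) for i in range(1, n + 1)]
    b = []
    for j in range(1, n + 1):
        # inner sum of (11): eigenvalue k is left out of each product
        s = sum(c[k] * elem_sym(lam[:k] + lam[k + 1:], j - 1) for k in range(n))
        b.append(-a[j - 1] + (-1) ** (j - 1) * s)
    return np.array(a, dtype=float), np.array(b, dtype=float)

a, b = arma_coeffs([0.5, -0.3], [0.2, 0.1])
```

An equivalent route for the autoregressive part is to expand $$\prod _u(z-\lambda _u)$$ with `numpy.poly` and change the sign of every coefficient after the leading one.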

To emphasize the issues in Theorem 1 and illustrate how the coefficients (11) can be used, consider the following example.

### Example 2

Let $$X_t$$ be a stationary and ergodic SETAR(2;1) process with the matrices of the coefficients $$\varvec{\Phi }_1=\{\phi _1\}$$ and $$\varvec{\Phi }_2=\{\phi _2\}$$.

The matrix $$\mathbf {K}$$ in (8) becomes

\begin{aligned} \mathbf {K}=\left( \begin{array}{cc} \phi _1&{}0\\ 0&{}\phi _2 \end{array} \right) \left( \begin{array}{cc} \pi _{11}&{}\pi _{12}\\ \pi _{21}&{}\pi _{22} \end{array} \right) . \end{aligned}

For simplicity, suppose that the matrix $$\mathbf {K}$$ has two different eigenvalues - say $$\lambda _1$$ and $$\lambda _2$$. Write $$\mathbf {K}=\varvec{\Gamma }\varvec{\Delta }\varvec{\Gamma }^{-1}$$ and set $$\mathbf {P}_1^T=\varvec{\pi }^T\varvec{\Gamma }$$ and $$\mathbf {P}_2=\varvec{\Gamma }^{-1}\left( \begin{array}{c}\phi _1\\ \phi _2\end{array} \right)$$ as in the proof of Theorem 1. Let $$c_k$$, $$k=1,2$$, be the component-wise product of the elements in the vectors $$\mathbf {P}_1$$ and $$\mathbf {P}_2$$. By Theorem 1, we have that the linear approximation of $$X_t$$ is given by an ARMA(2,2) model - that is,

\begin{aligned} X_t^{(2)}-a_1X_{t-1}^{(2)}-a_2X_{t-2}^{(2)}=\varepsilon _{t}+b_1\varepsilon _{t-1}+b_2\varepsilon _{t-2}, \end{aligned}
(12)

where $$a_1=\lambda _1+\lambda _2$$, $$a_2=-\lambda _1\lambda _2$$, $$b_1=-a_1+c_1+c_2$$, and $$b_2=-(a_2+c_1\lambda _2+c_2\lambda _1)$$ by using (11) in Remark 3. $$\qquad \blacksquare$$
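Example 2 can be traced numerically from the SETAR parameters to the ARMA(2,2) coefficients. The sketch below uses hypothetical values for $$\phi _1$$, $$\phi _2$$ and $$\mathbf{P}$$, takes $$\pi$$ to be the stationary probability implied by $$\mathbf{P}$$ (an assumption made only so that the inputs are mutually coherent), and assumes the form $$g_j^{(2)}=\sum _k c_k\lambda _k^{j-1}$$ that follows from the decomposition $$\mathbf {K}=\varvec{\Gamma }\varvec{\Delta }\varvec{\Gamma }^{-1}$$. It then compares the impulse-response weights of (12) with these coefficients.

```python
import numpy as np

phi1, phi2 = 0.6, -0.4                       # hypothetical regime coefficients
P = np.array([[0.7, 0.3], [0.4, 0.6]])       # hypothetical transition probabilities
pi = P[1, 0] / (P[0, 1] + P[1, 0])           # stationary Pr(regime 1) of P (assumption)

K = np.diag([phi1, phi2]) @ P
lam, Gamma = np.linalg.eig(K)                # K = Gamma diag(lam) Gamma^{-1}
P1 = np.array([pi, 1.0 - pi]) @ Gamma
P2 = np.linalg.solve(Gamma, np.array([phi1, phi2]))
c = P1 * P2                                  # component-wise product c_k

# Coefficients of (12) as given in Example 2.
a1, a2 = lam[0] + lam[1], -lam[0] * lam[1]
b1 = -a1 + c[0] + c[1]
b2 = -(a2 + c[0] * lam[1] + c[1] * lam[0])

def psi_weights(a, b, J):
    """MA(infinity) weights of X_t - sum a_i X_{t-i} = eps_t + sum b_j eps_{t-j}."""
    psi = np.zeros(J + 1)
    psi[0] = 1.0
    for j in range(1, J + 1):
        psi[j] = b[j - 1] if j <= len(b) else 0.0
        for i in range(1, min(j, len(a)) + 1):
            psi[j] += a[i - 1] * psi[j - i]
    return psi

J = 8
psi = psi_weights([a1, a2], [b1, b2], J)
g = np.array([np.sum(c * lam ** (j - 1)) for j in range(1, J + 1)])
```

In this sketch `psi[1:]` and `g` agree up to rounding, and $$a_1$$ and $$-a_2$$ equal the trace and the determinant of $$\mathbf{K}$$, respectively.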

Furthermore, note that from Remark 1 it follows that the coefficients of the approximation (10) are based on the expectation (9), which makes $$X_t^{(2)}$$ a suboptimal linear approximation, in the sense that it is not the best linear one-step-ahead predictor of $$X_t$$ (see Proposition 1). However, it has the great advantage of establishing a clear correspondence between the ARMA and SETAR orders, as stated in Theorem 1. This correspondence is the key element that overshadows the need to evaluate how $$X_t^{(2)}$$ approximates $$X_t$$, which becomes only a theoretical issue lacking any empirical application in identifying the SETAR model (as discussed in Sect. 3).

Another key remark needs to be introduced to discuss the order of the ARMA approximation.

### Remark 4

The order 2p of the ARMA approximation is the maximum value of the AR and MA components, and this correspondence between the order of the ARMA approximation and the order of the autoregressive regimes of the SETAR process could be used to estimate the autoregressive order of the threshold process (usually left to information criteria).$$\square$$

To emphasize this last remark and to give empirical evidence that 2p is the maximum order of the ARMA approximation (where 2p could be greater than the “true” ARMA order), consider the following example.

### Example 3

Let $$X_t\sim$$SETAR(2;1) with coefficients $$\phi _1^{(1)}=\phi _1^{(2)}=\phi _1$$, with $$|\phi _1|<1$$. It is easy to note that this last equality implies the degeneration of the SETAR(2;1) model to an AR(1) structure, and that this degeneration needs to be found even in (10). This can be verified because, in this case, the transpose of the vector $$\varvec{\pi }$$ in (7) becomes (1, 0) (or (0, 1)), whereas $$\mathbf{P}=\mathbf{I}_2$$ and then $$\mathbf{K}=\phi _1\mathbf{I}_2$$. From (27) it follows that $$g_j^{(2)}=\phi _1^j$$ for $$j=1,2,\ldots ,$$ and so:

\begin{aligned} X_t^{(2)}=\varepsilon _t+\sum _{j=1}^\infty \phi _1^j\varepsilon _{t-j}, \end{aligned}

which is the MA($$\infty$$) representation of the AR(1) process.$$\qquad \blacksquare$$

The result given in Example 3 leads to an important evaluation: when $$X_t$$ is a linear autoregressive process then $$X_t^{(2)}$$ is identically equal to $$X_t$$ and is no longer an approximation.

Another important consequence of the result of Theorem 1 and Remark 4 is the fact that we can build a SETAR(2;p) process whose linear approximation is null. This is important for two reasons. First we can subtract from $$X_t$$ the linear approximation, and then we can mainly focus on the structure of the residuals. Second, by a proper parametrization, we can find a SETAR(2;p) process whose linear approximation (10) is null.

This last evaluation is summarized in the following Corollary.

### Corollary 1

Let $$X_t$$, $$t\in \mathbb Z$$, be a stationary and ergodic SETAR(2;p) process. Under the assumptions of Theorem 1, there exists a SETAR(2;p) process such that its linear approximation (10) is null - that is $$g_j^{(2)}=0$$, $$\forall j\ge 1$$.

### Proof

See the Appendix.

To illustrate the result of this Corollary consider the following example.

### Example 4

Let $$X_t$$ be an ergodic SETAR(2;1) process. As shown in the proof of Corollary 1, a sufficient condition to obtain a SETAR(2; p) process with null linear approximation is given by $$\phi _1=-\phi _2$$, with $$\pi _{11}=\pi _{22}=0.5$$. To give empirical evidence of this result, we generated $$n=200$$ artificial observations from a SETAR(2;1) model with autoregressive coefficients $$\phi _1=-0.58$$ and $$\phi _2=0.58$$. The correlogram on the left side of Fig. 1 clearly shows the absence of the linear component, whereas the correlogram of $$X^2_t$$ on the right side gives evidence of nonlinearity in the generating process. $$\qquad \blacksquare$$
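An experiment of the kind described in Example 4 can be sketched as follows. The innovation distribution, the seed and the burn-in length are not reported above, so the standard Gaussian innovations, the seed and the 200 discarded values are assumptions of this sketch; the helper `sample_acf` is likewise hypothetical. No particular correlogram is claimed for this draw.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag (mean-corrected, divisor sum of squares)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([1.0] + [(x[k:] @ x[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(1)               # seed chosen arbitrarily
n, burn = 200, 200                           # burn-in length is an assumption
phi1, phi2 = -0.58, 0.58                     # coefficients of Example 4

x = np.zeros(n + burn)
for t in range(1, n + burn):
    phi = phi1 if x[t - 1] <= 0 else phi2    # threshold at zero, delay one
    x[t] = phi * x[t - 1] + rng.standard_normal()
x = x[burn:]

acf_x = sample_acf(x, 10)                    # correlogram of X_t
acf_x2 = sample_acf(x ** 2, 10)              # correlogram of X_t^2
band = 1.96 / np.sqrt(n)                     # usual approximate white-noise band
```

Comparing `acf_x` and `acf_x2` with `band` gives a rough numerical counterpart of the two correlograms; how many lags fall outside the band depends on the simulated sample.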

Corollary 1 also plays an important role in the order estimation presented in the next section, because if we have a SETAR(2;p) model with $$p>0$$, it can happen that the linear approximation in Theorem 1 is a white noise, and then the 2p order of the ARMA process will not match that of the SETAR(2;p) model (as discussed in Remark 4). This implies that in this case, the results of Theorem 1 cannot be used in the order estimation. To manage this problem and broadly use the results of Theorem 1 to estimate the autoregressive order of the threshold model, some additional conditions need to be added (see Theorem 2).

In the following Remark we give a bound, in $$L^2$$ norm, for the residual part arising from the proposed linear approximation $$X_t^{(2)}$$.

### Remark 5

First, for simplicity, suppose that $$X_t^{(2)}=\varepsilon _t$$, that is the linear approximation is null. Otherwise, we can always consider the process $$X_t-X_t^{(2)}$$. So, we have to evaluate the following quantity

\begin{aligned} \left\| X_t-\varepsilon _t\right\| ^2_{L^2}. \end{aligned}
(13)

By using the representation of the SETAR process as in Lemma 1, if $$\gamma _4<1$$ the main upper bound (in the sense that we take into account the series of the squared terms) of (13) is

\begin{aligned} C\mathbf {e}_1^T\varvec{\pi }^T\left( \mathbf {I}_{2p}-\mathbf {K}_2\right) ^{-1}\left( \begin{array}{c} \varvec{\Phi }_1^2\\ \varvec{\Phi }_2^2 \end{array} \right) \mathbf {e}_1, \end{aligned}

for some positive constant C, $$\mathbf {K}_2=\left( \begin{array}{cc} \varvec{\Phi }_1^2&{}\mathbf {0}\\ \mathbf {0}&{}\varvec{\Phi }_2^2 \end{array} \right) \left( \mathbf{P}\otimes \mathbf {I}_p\right)$$ whereas the other quantities correspond to those defined in (8). This result follows by using the same arguments as in the proof of Theorem 1. $$\qquad \blacksquare$$

Then, by focusing on the proposed linear approximation, the threshold autoregressive order can be consistently estimated, thus restricting attention to the linear time series domain. As will be discussed in Sect. 3, this has remarkable advantages not only from a theoretical point of view but even empirically, because the computational burden of the order estimation procedure of a nonlinear time series model is restricted to its linear (and less complex) approximation.

## 3 Order estimation

Identifying the time series model is a crucial step in the ‘iterative stages in the selection of a model’ (Box and Jenkins 1976); it needs to preserve the parsimony of the model and has a heavy impact on the computational effort needed to estimate the parameters.

In the linear domain, and in particular in the ARMA context, the identification has well-established results based on the relation between the ARMA parameters and the total/partial autocorrelation (at different lags) of the generating process.

As expected, these results - which relate strictly to the linear dependence - cannot be extended to the nonlinear domain. The complexity of the nonlinear time series structures has led to the model selection and identification largely discussed in the literature, which often focuses on information criteria and their performance (see Psaradakis et al. 2009; Emiliano et al. 2014; Rinke and Sibbertsen 2016). If we focus on order estimation in the SETAR domain, we find it looks different: Kapetanios (2001) proposes a consistent information criterion by which to estimate the lag order of the autoregressive regimes; Wong and Li (1998) introduce a correction to the Akaike Information Criterion (AIC; Akaike 1974) to correct its bias in the presence of SETAR models; and De Gooijer (2001) proposes cross-validation criteria to select the autoregressive order of these nonlinear structures. Galeano and Peña (2007), modifying the model selection criteria introduced by Hurvich et al. (1990), propose the inclusion of a determinant term related to the estimated parameters of each regime; furthermore, more recently, Fenga and Politis (2015) evaluated a bootstrapped version of the AIC in the SETAR domain, and defined the procedure step by step.

The results given in Sect. 2 could be used in this context to introduce a new approach to estimating the autoregressive order of the threshold regimes. For simplicity, assume that both regimes have the same order. (As noted before, this is not a limitation because zeroes could be included in the vector of parameters.) The proposed procedure is based on the linear approximation of the SETAR model as given in Theorem 1. We call this procedure Linear AIC (L-AIC) and it is based on the two main steps that are discussed below and summarized with the pseudo-code in Algorithm 1.

Let $$X_t$$ be a stationary SETAR(2; p) process. The first step starts by fixing the maximum order for p ($$p_{max}$$) and then for each $$p=1,2,\ldots , p_{max}$$:

1. (1.a)

estimate the parameters of the SETAR(2; p) model;

2. (1.b)

compute the eigenvalues of the matrix $$\mathbf{K}$$ in (8);

3. (1.c)

compute the estimates of the autoregressive parameters - say $$\hat{a}_{j}$$ - for $$j=1,2,\ldots , 2p$$, of the linear approximation

\begin{aligned} X_t^{(2)}=a_{1} X_{t-1}^{(2)}+\ldots +a_{2p} X_{t-2p}^{(2)}+\varepsilon _t, \end{aligned}
(14)

using the results in (11).

Given the $$p_{max}$$ estimated SETAR models, the second step is based on a parametric bootstrap approach with B bootstrap replications. In particular, for $$b=1,\ldots ,B$$:

1. (2.a)

generate the bootstrap independent innovations $$\{\varepsilon _t^*\}$$, $$t=1,\ldots ,n+2p_{max}$$, from a random variable with mean 0, variance $$\sigma ^2$$, and with n the time series length;

for $$i=1,\ldots ,p_{max}$$ run the steps 2.b) and 2.c):

1. (2.b)

generate the artificial time series from the AR(2i) models by using the innovations $$\{\varepsilon _t^*\}$$ in 2.a) and the coefficients estimated in 1.c). Then we have $$X_t^{*(2)}$$ as

\begin{aligned} X_t^{*(2)}=\hat{a}_{1} X_{t-1}^{*(2)}+\ldots +\hat{a}_{2i} X_{t-2i}^{*(2)}+\varepsilon _t^*; \end{aligned}
(15)

2. (2.c)

for each artificial time series i, select the autoregressive order $$\hat{p}_{b}^{(i)}$$ such that $$\hat{p}_{b}^{(i)}=\arg \underset{ j\in \{1,\ldots ,2i\}}{\min }AIC(j)$$, with j the order of the autoregressive process fitted to $$X_t^{*(2)}$$, whose coefficients are obtained through the Yule-Walker estimators;

3. (2.d)

using the $$\hat{p}_b^{(i)}$$ obtained from steps 2.b) and 2.c), estimate the autoregressive order of the SETAR($$2;\hat{p}_b$$) model such that:

\begin{aligned} \hat{p}_b=\max \{i: |\hat{p}_{b}^{(i)}-\hat{p}_{b}^{(i+1)}|\ne 0\}, \quad \text {for } i=1,2,\ldots , p_{max}-1. \end{aligned}

Given the B selected orders (one for each bootstrap replication), the SETAR model has autoregressive order $$\hat{p}$$ such that

\begin{aligned} \hat{p}=\arg \underset{\hat{p}_b\in \{1,\ldots ,p_{max}\}}{\max }\{\# \hat{p}_b\}, \end{aligned}
(16)

with $$\# \hat{p}_b$$ being the empirical frequency of $$\hat{p}_b$$.
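The second step of the procedure (steps 2.a to 2.d and the final rule (16)) can be sketched as below. This is an illustrative reading, not a reference implementation: the function names are hypothetical; Gaussian bootstrap innovations, the use of the first $$2p_{max}$$ simulated values as a burn-in, the form $$AIC(j)=n\log \hat{\sigma }_j^2+2j$$, and the fallback value 1 when no change of order is observed are all assumptions. The rule of step 2.d is transcribed literally from the displayed formula, and the AR coefficients $$\hat{a}_j$$ from step 1.c are taken as given inputs.

```python
from collections import Counter
import numpy as np

def simulate_ar(a, eps):
    """x_t = a_1 x_{t-1} + ... + a_m x_{t-m} + eps_t, started from zeros."""
    m = len(a)
    x = np.zeros(m + eps.size)
    for t in range(m, m + eps.size):
        x[t] = a @ x[t - m:t][::-1] + eps[t - m]
    return x[m:]

def yw_innovation_variances(x, max_order):
    """Yule-Walker innovation variances for AR(0..max_order) via Levinson-Durbin."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    gamma = np.array([x[:n - k] @ x[k:] / n for k in range(max_order + 1)])
    sigma2 = np.empty(max_order + 1)
    sigma2[0] = gamma[0]
    phi = np.zeros(0)
    for j in range(1, max_order + 1):
        kappa = (gamma[j] - phi @ gamma[j - 1:0:-1]) / sigma2[j - 1]
        phi = np.append(phi - kappa * phi[::-1], kappa)
        sigma2[j] = sigma2[j - 1] * (1.0 - kappa ** 2)
    return sigma2

def l_aic_second_step(ar_coefs, n, B=100, sigma=1.0, seed=0):
    """ar_coefs[i-1] holds the 2i estimated AR coefficients for candidate order i."""
    rng = np.random.default_rng(seed)
    p_max = len(ar_coefs)
    votes = []
    for _ in range(B):
        eps = sigma * rng.standard_normal(n + 2 * p_max)               # step 2.a
        orders = []
        for i, a in enumerate(ar_coefs, start=1):
            x = simulate_ar(np.asarray(a, float), eps)[2 * p_max:]     # step 2.b
            s2 = yw_innovation_variances(x, 2 * i)
            aic = n * np.log(s2[1:]) + 2 * np.arange(1, 2 * i + 1)
            orders.append(int(np.argmin(aic)) + 1)                     # step 2.c
        changes = [i for i in range(1, p_max) if orders[i - 1] != orders[i]]
        votes.append(max(changes) if changes else 1)                   # step 2.d (fallback assumed)
    return Counter(votes).most_common(1)[0][0]                         # rule (16)

# Hypothetical AR(2), AR(4), AR(6) coefficient vectors standing in for step 1.c.
ar_coefs = [[0.5, -0.2], [0.5, -0.2, 0.1, 0.0], [0.5, -0.2, 0.1, 0.0, 0.0, 0.0]]
p_hat = l_aic_second_step(ar_coefs, n=200, B=20)
```

The same innovations are reused for every candidate order within a replication, as in step 2.a, so differences between the selected AR orders reflect the coefficients rather than the noise.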

Note that in step 1.b), given the ergodicity of the process $$X_t$$, the probabilities included in the matrix $$\mathbf{P}$$ of (8) can be consistently estimated using the empirical frequencies $$n_{ij}=\sum _{t=2}^n\mathbb {I}\left( X_{t-1}\in \mathbb {R}_i, X_t\in \mathbb {R}_j\right)$$, for $$i,j=1,2$$, such that $$\hat{\pi }_{ij}=n_{ij}/\sum _{j=1}^2n_{ij}$$ (see among others Anderson and Goodman 1957). In step 1.c) and in all the elements of the second step of the procedure, only the linear AR model is involved; this is because, given the results of Theorem 1 and Remark 3, the AR and MA components have the same order 2p. Moreover, the AR part of the ARMA approximation shares all the elements used in order estimation. Therefore, to simplify the procedure, we can consider only the AR component. (The extension to SETAR models with different autoregressive orders is discussed in Sect. 4.2.)
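The empirical-frequency estimator of $$\mathbf{P}$$ used in step 1.b) can be sketched as follows, with $$\pi _{ij}=Pr(X_t\in \mathbb {R}_j|X_{t-1}\in \mathbb {R}_i)$$ as defined in Sect. 2. The function name `estimate_transition_matrix` and the toy series are hypothetical; the sketch also returns the share of observations in the first regime as a simple estimate of $$\pi$$, which is an assumption of this example.

```python
import numpy as np

def estimate_transition_matrix(x, r=0.0):
    """Row-normalized transition frequencies between R_1 = (-inf, r] and R_2 = (r, inf).

    Entry (i, j) estimates Pr(X_t in R_j | X_{t-1} in R_i). A regime that is never
    visited before the last observation gives a zero row sum and hence NaN entries.
    """
    x = np.asarray(x, dtype=float)
    regime = (x > r).astype(int)                      # 0 for R_1, 1 for R_2
    counts = np.zeros((2, 2))
    np.add.at(counts, (regime[:-1], regime[1:]), 1)   # n_ij: from regime i to regime j
    P_hat = counts / counts.sum(axis=1, keepdims=True)
    pi_hat = np.mean(regime == 0)                     # share of observations in R_1
    return P_hat, pi_hat

P_hat, pi_hat = estimate_transition_matrix([-1.0, 1.0, 2.0, -1.0, -2.0, 1.0])
```

For the toy series the regime sequence is 1, 2, 2, 1, 1, 2, so the first row of `P_hat` is (1/3, 2/3) and the second is (1/2, 1/2).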

Further, from the computational point of view, relying in the second step on results for linear models (rather than nonlinear ones) makes the algorithm quite fast.

This procedure has two important aspects. First, the linear approximation process, $$X_t^{(2)}$$, is not observable, and so it is a sort of latent process. For this reason, in step 2.a) we generate for each p a sequence of i.i.d. innovations (for example, from the standard Gaussian distribution) and we estimate the parameters of the linear approximation by using (11). In this way, we build the process in (15) that is always a well-defined AR(2i) for each i; this is the key point for using the selection rule in step 2.d).

Moreover, our procedure is motivated by two main considerations. First, when we fit a linear model to data generated by a SETAR(2;p) process, the residuals are not a sequence of i.i.d. random variables, because they also capture the nonlinear structure of the SETAR process. Second, even if we could estimate the parameters of the linear approximation by using the results in Francq and Zakoïan (1998), it is not easy to derive the variance-covariance matrix of the estimators; in this case, the results in (11) should be preferred.

Before stating the next results about the consistency of the L-AIC procedure, we need to report some regularity assumptions that refer to the Assumptions of Theorem 1 of Chan (1993).

Assumptions (H1). Let $$X_t$$ be a nondegenerate SETAR(2;p) process where the coefficients of the two regimes are not equal and $$0<\pi <1$$. Moreover,

1. the process $$X_t$$ is stationary and ergodic (see Chan and Tong 1985; Bec et al. 2004);

2. there exists the univariate density function of $$X_t$$ with respect to its invariant distribution, which is positive everywhere. $$\square$$

Finally, starting from the behaviour of the AIC described in Shibata (1976), we say that a procedure is asymptotically type-AIC consistent if the asymptotic distribution of the chosen order satisfies

\begin{aligned}&\lim _{n\rightarrow \infty }Pr(\hat{p}=p)=0\qquad \text{ if } p<p_0;\\&\lim _{n\rightarrow \infty }Pr(\hat{p}=p)>0\qquad \text{ if } p\ge p_0; \end{aligned}

with $$p_0$$ the true order and $$\hat{p}$$ the estimator of the autoregressive order that depends on the series length n.

### Theorem 2

Let $$X_t$$, $$t\in \mathbb {Z}$$, be a stationary and ergodic SETAR(2;$$p_0$$), defined in (2). Under Assumptions (H1) and if $$\pi _{11}\ne \pi$$, $$E(\varepsilon _t^2)<\infty$$ and $$\gamma _2<1$$, then $$\hat{p}$$ in (16) is type-AIC consistent.

### Proof

See the Appendix.

Note that the assumption $$\pi _{11}\ne \pi$$ in Theorem 2 is relevant because it guarantees that the autoregressive process in Theorem 1 is exactly of order 2p (see Remark 4), which rules out the case described in Corollary 1.

The proposed procedure and its consistency were empirically evaluated in a Monte Carlo study (see Sect. 5) that considers various SETAR models characterized by various degrees of complexity and nonlinearity.

## 4 Extensions

In this section we examine the proposed linear approximation of the SETAR process and evaluate three different aspects. First, we show the role of the threshold and delay parameters in this linear approximation. Second, we study the case where regimes have different orders. Third, we generalize the result of Theorem 1 when the number of regimes is more than two.

### 4.1 Threshold and delay parameters

To write the linear approximation for any values of the threshold and delay parameters, we consider the results in Sect. 2 presented for the case of two regimes with the same order. For simplicity, we consider only the AR part of the linear approximation. So, the matrix K in (8) becomes

\begin{aligned} \mathbf {K}_{r,d}=\left( \begin{array}{cc} \varvec{\Phi }_1&{}\mathbf {0}\\ \mathbf {0}&{}\varvec{\Phi }_2 \end{array} \right) \left( \mathbf{P}_{r,d}\otimes \mathbf {I}_p\right) , \end{aligned}

where the threshold and delay parameters are r and d, respectively. The matrix $$\mathbf {P}_{r,d}$$ is defined as

\begin{aligned} \mathbf {P}_{r,d}=\left( \begin{array}{cc} \pi _{11,r}^{(d)} &{} \pi _{12,r}^{(d)}\\ \pi _{21,r}^{(d)} &{} \pi _{22,r}^{(d)} \end{array} \right) , \end{aligned}

with $$\pi _{ij,r}^{(d)}=Pr\left( X_t\in \mathbb {R}_j|X_{t-d}\in \mathbb {R}_i\right)$$, $$i,j=1,2$$ and $$\mathbb {R}_1=(-\infty ,r]$$, $$\mathbb {R}_2=(r,+\infty )$$. All the other quantities are the same as in (8).

First, note that the delay parameter has the role of the order for the transition probabilities in $$\mathbf {P}_{r,d}$$. It follows that $$\mathbf {P}_{r,d}=\left( \mathbf {P}_{r,1}\right) ^d$$. Then, the matrix $$\mathbf {K}_{r,d}$$ can be written as

\begin{aligned} \mathbf {K}_{r,d}=\left( \begin{array}{cc} \varvec{\Phi }_1&{}\mathbf {0}\\ \mathbf {0}&{}\varvec{\Phi }_2 \end{array} \right) \left( \left( \mathbf{P}_{r,1}\right) ^d\otimes \mathbf {I}_p\right) . \end{aligned}
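The construction of $$\mathbf {K}_{r,d}$$ can be sketched numerically. The following Python/NumPy fragment is an illustration only: the function names are hypothetical, and each $$\varvec{\Phi }_i$$ is assumed to be the $$p\times p$$ companion matrix of regime i (coefficients in the first row, a shifted identity below), in line with the form displayed in Sect. 4.2.

```python
import numpy as np


def companion(phi):
    """Companion matrix: coefficients in row 0, shifted identity below."""
    p = len(phi)
    Phi = np.zeros((p, p))
    Phi[0, :] = phi
    Phi[1:, :-1] = np.eye(p - 1)
    return Phi


def build_K(phi1, phi2, P1, d=1):
    """K_{r,d} = blockdiag(Phi_1, Phi_2) (P_{r,1}^d kron I_p).

    P1 is the one-step transition matrix P_{r,1}; the threshold r enters
    only through P1, and the delay d only through the matrix power.
    """
    p = len(phi1)
    if len(phi2) != p:
        raise ValueError("both regimes must have the same order p")
    block = np.zeros((2 * p, 2 * p))
    block[:p, :p] = companion(phi1)
    block[p:, p:] = companion(phi2)
    P_d = np.linalg.matrix_power(np.asarray(P1, dtype=float), d)
    return block @ np.kron(P_d, np.eye(p))
```

In this sketch the coefficient matrices and the transition probabilities enter as separate factors, which is the structural feature exploited in the remarks that follow.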

Now, looking at the form of the matrix $$\mathbf {K}_{r,d}$$, we can note that:

1. the parameters r and d involve only the probabilities in the matrix $$\mathbf {P}_{r,d}$$;

2. in general, assuming that the SETAR model does not degenerate into an AR process, the rank of the matrix $$\mathbf {K}_{r,d}$$ depends only on the matrices of the coefficients $$\varvec{\Phi }_1$$ and $$\varvec{\Phi }_2$$;

3. the procedure for the order estimation (proposed in Sect. 3) needs only to evaluate the difference between two successive orders. Then, it mainly depends on the matrices of the coefficients and not on the matrix $$\mathbf {P}_{r,d}$$.

By using the previous considerations, we can argue that our procedure for the order estimation is independent of the threshold and delay parameters. This means that in identifying the SETAR model, we can separate the estimation of the threshold and delay parameters from the order estimation of the regimes.

### 4.2 Different orders in the regimes

Suppose that we have a SETAR model with two regimes with two different autoregressive orders, say $$p_1$$ and $$p_2$$. Moreover, suppose that the threshold and delay parameters are zero and one, respectively. Set $$p_0=\max \{p_1,p_2\}$$. Let $$p_1<p_2$$ and $$p_2=p_1+m$$. Then, the matrix K is the same as in (8), except that matrix $$\varvec{\Phi }_1$$ becomes:

\begin{aligned} {\varvec{\Phi }_1}= \left( \begin{array}{cccc|c} \phi _1^{(1)} &{} \ldots &{}\phi _{p_1}^{(1)} &{} \mathbf {0}_{m-1} &{}0\\ &{} &{} {\varvec{ I}}_{p_0-1} &{} &{} \varvec{ 0} \end{array} \right) . \end{aligned}

In this case, $$\mathbf{K}$$ is a $$2p_0\times 2p_0$$ matrix with $$a_{2p_0}=a_{2p_0-1}=\ldots =a_{2p_0-m+1}=0$$ by (11) and using the fact that the matrix $$\mathbf{K}$$ has m eigenvalues equal to zero. Since $$2p_0-m=p_1+p_2$$, this means that the AR part of the linear approximation is $$AR(p_1+p_2)$$.
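The statement about the zero eigenvalues can be checked numerically on a single example. The sketch below (Python with NumPy) uses hypothetical values: $$p_1=1$$, $$p_2=3$$, arbitrary coefficients and an arbitrary transition matrix, none of which come from the paper. It computes the dimension of the null space of $$\mathbf{K}^m$$, which counts the zero eigenvalues whenever their algebraic multiplicity does not exceed m. It is a numerical illustration under these assumptions, not a proof.

```python
import numpy as np


def companion(phi):
    """Companion matrix: coefficients in row 0, shifted identity below."""
    p = len(phi)
    Phi = np.zeros((p, p))
    Phi[0, :] = phi
    Phi[1:, :-1] = np.eye(p - 1)
    return Phi


# Hypothetical example: regime 1 has order p1 = 1, regime 2 has order p2 = 3.
p1, p2 = 1, 3
m = p2 - p1
p0 = max(p1, p2)
phi1 = np.concatenate(([0.5], np.zeros(m)))  # zero-padded to length p0
phi2 = np.array([0.3, -0.2, 0.4])
P = np.array([[0.7, 0.3], [0.4, 0.6]])       # arbitrary transition matrix

block = np.zeros((2 * p0, 2 * p0))
block[:p0, :p0] = companion(phi1)
block[p0:, p0:] = companion(phi2)
K = block @ np.kron(P, np.eye(p0))

# Null-space dimension of K^m, computed from the numerical rank.
rank_Km = np.linalg.matrix_rank(np.linalg.matrix_power(K, m), tol=1e-10)
zero_eigs = 2 * p0 - rank_Km
```

Working with the rank of $$\mathbf{K}^m$$ rather than with computed eigenvalues is a deliberate choice here: a repeated zero eigenvalue of a non-symmetric matrix is typically returned by numerical eigenvalue routines as a cluster of small nonzero values.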

Note that the transition probabilities in the matrix P are well defined in the sense that the irreducibility property of the Markov Chain with respect to the two regimes is still valid.

In this case, the procedure for the order estimation (see Sect. 3) can be applied after suitably modifying Algorithm 1. The first step is modified by fixing a grid of candidate values for $$p_1$$ and $$p_2$$ such that $$p_1=1,\ldots , p_{1, \max }$$ and $$p_2=1,\ldots , p_{2, \max }$$. In the second step, the $$p_{1, \max }\times p_{2, \max }$$ estimated SETAR models are used to carry out the bootstrap replications, with $$p_{1, \max }+p_{2, \max }$$ innovations in row 13. A double cycle given by $$i=1,\ldots , p_{1, \max }$$ and $$s=1,\ldots , p_{2, \max }$$ is considered in row 14, whereas i is replaced by $$i+s$$ in row 16. All these changes imply that the maximum in row 17 becomes:

\begin{aligned}(\hat{p}_{b_1}, \hat{p}_{b_2})=\max \{i,s:|\hat{p}_{b}^{(i,s)}-\hat{p}_{b}^{(i_*,s_*)}|\not = 0\}\end{aligned}

for $$i=1,2,\ldots , p_{1,\max }-1$$, $$s=1,2,\ldots , p_{2,\max }-1$$ and with $$i_*\ge i$$, $$s_*\ge s$$, such that at least one of $$i_*$$ and $$s_*$$ is strictly greater than the corresponding i or s value. Finally, computing the empirical frequencies of $$(\hat{p}_{b_1}, \hat{p}_{b_2})$$ concludes the algorithm.

### 4.3 More regimes

Suppose that we have k regimes with $$k\ge 2$$. For simplicity, assume that all regimes have the same order, say p. Moreover, suppose that the delay is $$d=1$$ and the thresholds are $$r_i$$, for $$i=1, \ldots , k-1$$, with $$-\infty<r_1<\ldots< r_{k-1}<\infty$$. Then, the quantities in (7) become

\begin{aligned} \varvec{\pi }=\left( \begin{array}{c}\pi _1\\ \vdots \\ \pi _k \end{array} \right) \otimes \mathbf {I}_p \qquad \text{ and } \qquad \mathbf {P}=\left( \begin{array}{ccc} \pi _{11}&{}\ldots &{}\pi _{1k}\\ \vdots &{} \vdots &{} \vdots \\ \pi _{k1}&{}\ldots &{} \pi _{kk} \end{array} \right) , \end{aligned}

with $$\sum _{i=1}^k\pi _i=1$$. The matrix K in (8) becomes

\begin{aligned} \mathbf {K}=\left( \begin{array}{cccc} \varvec{\Phi }_1&{}\mathbf {0} &{} \ldots &{} \mathbf {0}\\ \mathbf {0}&{}\varvec{\Phi }_2 &{} \ldots &{} \mathbf {0}\\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ \mathbf {0} &{} \mathbf {0} &{} \ldots &{} \varvec{\Phi }_k \end{array} \right) \left( \mathbf{P}\otimes \mathbf {I}_p\right) , \end{aligned}

where $$\varvec{\Phi }_i$$, $$i=1,\ldots ,k$$ are the matrices of coefficients of each regime.

Repeating the proof of Theorem 1, we get the expression in (30) as the sum of $$k\cdot p$$ quantities instead of 2p. This implies that we have as a linear approximation the ARMA($$k\cdot p,k\cdot p$$) process. Finally, in this case the procedure for the order estimation can be applied by replacing 2p with $$k\cdot p$$ in Algorithm 1.

## 5 Simulation study

To evaluate the L-AIC procedure for the order estimation of the SETAR(2; p) process, we generated time series from four data generating processes:

$$M1: \quad X_t=-0.8X_{t-1}\mathbb I_{t-1}-0.2X_{t-1}(1-\mathbb I_{t-1})+\varepsilon _t$$

$$M2: \quad X_t=0.5X_{t-1}\mathbb I_{t-1}-0.5X_{t-1}(1-\mathbb I_{t-1})+\varepsilon _t$$

$$M3: \quad X_t=(-0.4X_{t-1}-0.5X_{t-2})\mathbb I_{t-1}+(0.2X_{t-1}-0.6X_{t-2})(1-\mathbb I_{t-1})+\varepsilon _t$$

$$M4: \quad X_t=(-0.7X_{t-1}-0.2X_{t-2})\mathbb I_{t-1}+(0.3X_{t-1}-0.6X_{t-2})(1-\mathbb I_{t-1})+\varepsilon _t$$

with threshold value $$r_1=0$$, and $$\varepsilon _t\sim N(0,1)$$. For each model we considered time series of length $$n=\{50, 75, 150, 200, 500\}$$ (with a burn-in of 500 observations to discard the effects of the initial conditions). The models M1 and M2 are those considered in Galeano and Peña (2007), where even for model M1 the condition of Theorem 2, $$\pi \ne \pi _{11}$$, is satisfied; models M3 and M4 were chosen to guarantee the assumptions of Theorem 1 and to change the percentage of observations generated from each regime. (In M3 this percentage is almost identical in the two regimes, whereas in M4 the percentage of observations generated from the first regime is, on average, less than 40%.) Further, note that the models do not include intercepts: this makes it harder to distinguish between the two regimes, and the performance of the order estimation procedure could be affected by it.
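A data generating process of this kind can be simulated with a short Python/NumPy sketch. The function name and its arguments are hypothetical, and the code assumes that $$\mathbb I_{t-1}$$ is the indicator of $$X_{t-1}\le r_1$$ (that is, $$X_{t-1}\in \mathbb {R}_1$$), that the recursion starts from zero initial values, and that the first 500 simulated values are discarded as burn-in.

```python
import numpy as np


def simulate_setar(phi_lower, phi_upper, n, r=0.0, burn_in=500, seed=None):
    """Simulate a two-regime SETAR series without intercepts, delay d = 1.

    phi_lower applies when X_{t-1} <= r, phi_upper otherwise
    (assumed definition of the indicator). Innovations are N(0, 1).
    """
    rng = np.random.default_rng(seed)
    p = max(len(phi_lower), len(phi_upper))
    total = n + burn_in + p
    x = np.zeros(total)                      # zero initial conditions
    eps = rng.standard_normal(total)
    for t in range(p, total):
        phi = phi_lower if x[t - 1] <= r else phi_upper
        lags = x[t - len(phi):t][::-1]       # X_{t-1}, X_{t-2}, ...
        x[t] = np.dot(phi, lags) + eps[t]
    return x[-n:]                            # drop start-up and burn-in


# Under the stated assumption on the indicator, M1 and M3 would read:
x_m1 = simulate_setar([-0.8], [-0.2], n=200, seed=0)
x_m3 = simulate_setar([-0.4, -0.5], [0.2, -0.6], n=200, seed=0)
```

Passing a seed makes each simulated path reproducible, which is convenient when the same artificial series must be reused to compare several order selection criteria.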

After fixing $$p_{max}=5$$, we estimated the SETAR(2;p) models, for $$p=1,\ldots , 5$$ using conditional least squares estimators, with $$d=1$$ and $$r_1=0$$. For each estimated model we obtained the linear autoregressive approximation $$\hat{X}_{t}^{*(2)}$$ and then we started, for each approximation, the $$B=125$$ bootstrap replicates with $$\varepsilon _t\sim N(0,1)$$. The number of Monte Carlo runs is 1000.

In the Monte Carlo study the L-AIC procedure has been compared to two other approaches.

The SETAR-AIC (Tsay 1989):

\begin{aligned} AIC(p_1, p_2)=\sum _{j=1}^2\left( T_j \ln (\hat{\sigma }^2_{\varepsilon _j})+2p_j\right) ,\end{aligned}
(17)

where $$T_j$$, $$p_j$$ and $$\hat{\sigma }^2_{\varepsilon _j}$$ are, respectively, the number of observations, the candidate autoregressive order, and the residual variance of regime j, for $$j=1,2$$. The second approach considered for the order estimation of the SETAR(2; p) model is the $$\beta$$AIC of Fenga and Politis (2015), which is a bootstrapped version of the SETAR-AIC.

Comparisons of the L-AIC with both the SETAR-AIC and the $$\beta$$AIC are presented on the left side of Table 1, where the relative empirical frequency of the selection of the true autoregressive orders of the SETAR(2; p) model is summarized. (The complement to one of the relative frequencies in the table is related to overparametrized models.)
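The criterion in (17) can be computed along the following lines. This Python/NumPy sketch rests on several assumptions that are not stated in the text: each regime is fitted by least squares without an intercept, the residual variance of regime j is the residual sum of squares divided by $$T_j$$, and the first regime is defined by $$X_{t-d}\le r$$. The function name is hypothetical.

```python
import numpy as np


def setar_aic(x, p1, p2, r=0.0, d=1):
    """SETAR-AIC of (17): sum over regimes of T_j * ln(sigma_j^2) + 2 * p_j."""
    x = np.asarray(x, dtype=float)
    p_max = max(p1, p2, d)
    y = x[p_max:]
    # Column k-1 of `lags` holds X_{t-k}, aligned with y.
    lags = np.column_stack(
        [x[p_max - k:len(x) - k] for k in range(1, p_max + 1)]
    )
    lower = lags[:, d - 1] <= r              # regime 1: X_{t-d} <= r
    aic = 0.0
    for mask, p in ((lower, p1), (~lower, p2)):
        T_j = int(mask.sum())
        X_j, y_j = lags[mask, :p], y[mask]
        beta, *_ = np.linalg.lstsq(X_j, y_j, rcond=None)
        resid = y_j - X_j @ beta
        sigma2 = np.mean(resid ** 2)         # assumed estimator: RSS / T_j
        aic += T_j * np.log(sigma2) + 2 * p
    return aic
```

The value is meaningful only when both regimes contain more observations than parameters; with an empty regime the residual variance is not defined.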

It is widely known that the AIC tends to overfit models (see, among others, Koehler and Murphree 1988), whereas this overparametrization is penalized by the Bayesian Information Criterion (BIC), so we extended our procedure to the BIC domain, giving rise to the Linear BIC (L-BIC) procedure.

For this purpose we considered steps 1.a) - 2.d) of the L-AIC procedure and replaced in step 2.c) the AIC with the BIC. Then we replicated the simulation study and compared the performance of the SETAR-BIC (Galeano and Peña 2007):

\begin{aligned} BIC(p_1, p_2)=\sum _{j=1}^2\left( T_j \ln (\hat{\sigma }^2_{\varepsilon _j})+\log (T_j)p_j\right) \end{aligned}
(18)

to that of the L-BIC procedure. (Note that Fenga and Politis 2015 do not consider the BIC.) The results for each model are presented on the right side of Table 1.

Finally, note that the consistency of the L-BIC procedure can be established by using the same arguments as in the proof of Theorem 2, together with the consistency of the BIC measure in the linear domain.

The results in Table 1 clearly show the improvement of the L-AIC procedure over the SETAR-AIC and the $$\beta$$AIC approaches in all considered cases and, correspondingly, the good performance of the L-BIC compared to the SETAR-BIC criterion, mainly for small values of n. In particular, one can note that when the distinction between the two regimes is more marked, as in model M2 where the autoregressive coefficients have opposite signs, there is an overall improvement in the competing order estimation approaches (mainly in the L-AIC case).

As the complexity of the data generating process grows with models M3 and M4, all procedures remain suitable for estimating the autoregressive order of the nonlinear threshold processes. Even in this case, the frequency with which the L-AIC procedure correctly detects the true order is generally higher than that of the competing approaches based on the Akaike criterion. This result is confirmed in the L-BIC case, although with less marked differences between the L-BIC and the SETAR-BIC.

From the computational point of view, if we compare the L-AIC procedure with the Fenga and Politis (2015) criterion, we can note that, when $$r_1$$ and d are known and $$p_1=p_2=p$$, the effort (measured in terms of computing time) is heavier for the $$\beta$$AIC criterion; this is because in the bootstrap iterations, for each candidate $$p=1,2,\ldots , p_{max}$$, we need to estimate a nonlinear threshold model instead of a linear AR(2p) model, as in the L-AIC case. In practice, a further advantage of the L-AIC procedure is that the computationally intensive steps of the bootstrap iterations are confined to the linear domain, which is less complex than the nonlinear domain.

Finally, to evaluate how much of the variability of the SETAR(2; p) process is explained by the linear ARMA(2p, 2p) approximation, we have compared the variance of the SETAR generating process with the variance of its linear ARMA approximation. For all models, M1, M2, M3 and M4, we have generated 1000 artificial time series and for each of them we have computed the ratio between the two variances. In Table 2 the means of these ratios and the corresponding standard deviations are presented for $$n=\{200, 500\}$$. The variance explained by the ARMA approximation is higher for model M1, whereas this ratio is lower for model M4, which is characterized by high nonlinearity. The artificial time series have been further evaluated to investigate the dependence between $$X_t^{(2)}$$ and the generating process $$X_t$$. To this end, in Table 2 we have considered, for each model, the mean of the correlations computed between the ARMA approximation $$X_t^{(2)}$$ and $$Y_t=X_t-X_t^{(2)}$$. The results show different correlations as the structure of the model changes.

## 6 Conclusion

It is widely known that when the behaviour of the autocorrelation function (ACF) is observed for a time series $$X_t$$ generated by a SETAR(2; p) model, we can easily confuse the ACF of the threshold process with that of a linear ARMA structure. Several explanations can be given for this empirical finding: among them is the fact that the linear autoregressive model is nested in the SETAR model because it can be seen as a degeneration of the SETAR structure when all data belong to the same regime.

In this study we investigated the relation between the SETAR and ARMA models. We showed that when $$X_t\sim$$SETAR(2; p), under proper conditions, the linear approximation (10) of $$X_t$$ is $$X_t^{(2)}\sim$$ARMA(2p, 2p). We theoretically showed this result and even clarified when it could not be applied and how it can be generalized when the number of regimes is greater than two or the regimes have a different autoregressive order. Further, using linear approximation, we proposed an order estimation procedure called Linear-AIC to estimate the autoregressive order of the two regimes of $$X_t$$; the consistency of this procedure was proved and the extension of these results to the BIC domain is discussed.

The L-AIC procedure is based on two main steps: in the first step we focus on the nonlinear generating process $$X_t$$ and on its linear approximation $$X_t^{(2)}$$, whereas in the second step we estimate the autoregressive order of the SETAR model using a parametric bootstrap approach.

Our Monte Carlo study gives evidence of the performance of the L-AIC procedure and of its BIC extension (called L-BIC). The results highlight that the L-AIC generally does a better job than both the competing SETAR-AIC in (17) and the Fenga and Politis (2015) criterion (and, correspondingly, the L-BIC performs better than the SETAR-BIC criterion), even in the presence of a parametrization of the SETAR model that makes it difficult to distinguish among regimes.

The results presented herein will serve as the starting point for future research that further investigates the extensions discussed in Sect. 4; future research can also study how the L-AIC and the L-BIC can be included in an overall identification procedure for SETAR models.