Abstract
The \(\ell _{1}\) trend filtering enables us to estimate a continuous piecewise linear trend of a univariate time series. This filter and its variants have subsequently been applied in various fields, including astronomy, climatology, economics, electronics, environmental science, finance, and geophysics. However, although the \(\ell _{1}\) trend filtering can estimate a continuous piecewise linear trend of a univariate time series, it cannot estimate a common continuous piecewise linear trend of multiple time series. This paper develops a statistical procedure for estimating such a common trend, which is a multivariate extension of the \(\ell _{1}\) trend filtering. We provide an algorithm for computing the estimates and a clue for specifying the tuning parameter of the procedure, both of which are required for its application. We also (i) numerically illustrate how well the algorithm works, (ii) provide an empirical illustration, and (iii) introduce a generalization of our novel method.
1 Introduction
The \(\ell _{1}\) trend filtering, which was developed by Steidl et al. (2006), Steidl (2006), Kim et al. (2009), Tibshirani (2014), and Guntuboyina et al. (2020), enables us to extract a continuous piecewise linear trend of a univariate time series.Footnote 1 Figure 1 illustrates such a trend. The filter and its variants have subsequently been applied in various fields, including astronomy (Politsch et al. 2020), climatology (Khodadadi and McDonald 2019), economics (Yamada and Jin 2013; Yamada and Yoon 2014; Winkelried 2016; Yamada 2017; Klein 2018), electronics (Suo et al. 2019), environmental science (Brantley et al. 2019), finance (Mitra and Rohit 2018), and geophysics (Wu et al. 2018).
The \(\ell _{1}\) trend filtering is defined by replacing the squared \(\ell _{2}\)-norm penalty of the Hodrick and Prescott (1997) (HP) filtering with an \(\ell _{1}\)-norm penalty.Footnote 2 It is notable that, even though this modification seems minor, the \(\ell _{1}\) trend filtering provides a continuous piecewise linear trend, whereas the HP filtering provides a smooth trend. In econometrics, such a continuous piecewise linear trend was dealt with by Perron (1989) and Rappoport and Reichlin (1989), and it reflects the idea that 'economic events that have large permanent effects are relatively rare' (Hamilton 1994). Thus, it is possible to say that the \(\ell _{1}\) trend filtering is a method for obtaining the trend considered by Perron (1989) and Rappoport and Reichlin (1989).
Although the \(\ell _{1}\) trend filtering can estimate a continuous piecewise linear trend of a univariate time series, it cannot estimate a common continuous piecewise linear trend of multiple time series. In this paper, we develop a statistical procedure that enables us to estimate such a common trend, which is a multivariate extension of the \(\ell _{1}\) trend filtering. To explain more precisely, let \(y_{i,t}\) be an observation of a univariate time series i at t, where \(i=1,\ldots ,n\) and \(t=1,\ldots ,T\), and suppose that it has a continuous piecewise linear trend \(x_{i,t}\). As stated, the \(\ell _{1}\) trend filtering can be applied for estimating \(x_{i,t}\) from \(y_{i,t}\). In this paper, we consider the situation such that \(x_{i,t}\) can be expressed as
$$\begin{aligned} x_{i,t}=a_{i}x_{t},\quad i=1,\ldots ,n,\quad t=1,\ldots ,T, \end{aligned}$$
(1)
where \(x_{t}\) is a continuous piecewise linear trend and \(a_{i}\) is a loading coefficient. Given that (1) can be represented as
even though \(y_{1,t},\ldots ,y_{n,t}\) commonly have \(x_{t}\), their linear combination \(\beta _{1}y_{1,t}+\cdots +\beta _{n}y_{n,t}\) no longer has \(x_{t}\) if \([\beta _{1},\ldots ,\beta _{n}]'\) is a vector that belongs to the orthogonal complement of the space spanned by \([a_{1},\ldots ,a_{n}]'\). Hatanaka and Yamada (2003) referred to it as ‘co-trending.’Footnote 3
In this paper, by extending the \(\ell _{1}\) trend filtering, we develop a novel method to estimate \(x_{t}\) and \(a_{i}\) from \(y_{i,t}\). Recall that \(i=1,\ldots ,n\) and \(t=1,\ldots ,T\), where n (resp. T) denotes the number of univariate time series (resp. observations). We refer to the novel filtering method as '\(\ell _{1}\) common trend filtering.' We provide an algorithm for computing the estimates and a clue for specifying the tuning parameter of the procedure, both of which are required for its application. We also (i) numerically illustrate how well the algorithm works, (ii) provide an empirical illustration, and (iii) introduce a generalization of our novel method.
Here, we remark that (2) is not an unlikely model of trends in macroeconomic time series but has strong relevance. To explain more precisely, let \(y_{1,t},\ldots ,y_{n,t}\) be macroeconomic time series in natural logarithms and \(e_{1,t},\ldots ,e_{n,t}\) be such that
Let \(g_{i,t}=\Delta y_{i,t}(=y_{i,t}-y_{i,t-1})\). Accordingly, \(g_{i,t}\) for \(i=1,\ldots ,n\) represent the growth rates of the original time series. Then, (2) and (3) are equivalent to
with initial conditions such as \(y_{i,1}=a_{i}x_{1}+e_{i,1}\) and \(\Delta y_{i,2}=a_{i}\Delta x_{2}+\Delta e_{i,2}\), where \(b_{t}=\Delta x_{t}-\Delta x_{t-1}\) and \(v_{i,t}=\Delta e_{i,t}-\Delta e_{i,t-1}\). Recall that \(\Delta g_{i,t}\) in (4) denotes the change in the growth rate of variable i at t. Given that \(x_{t}\) in (2) is a continuous piecewise linear trend, only a few of \(b_{3},\ldots ,b_{T}\) are not equal to 0. We may regard such nonzero \(b_{t}\)s in (4) as occasional permanent shocks that shift the growth rates of multiple time series simultaneously, and \(a_{i}\) for \(i=1,\ldots ,n\) in (4) represent the individual reaction coefficients of the time series. A typical example of such an occasional permanent shock is the oil price shock of 1973. It is natural to consider that, at that time, the growth rates of macroeconomic time series changed simultaneously, each with its own reaction coefficient.
This paper is organized as follows. Section 2 introduces the novel filtering method and provides its reduced-rank-regression (RRR) representations. Section 3 discusses a numerical computation method for \(x_{t}\) and \(a_{i}\) in (1). Section 4 provides a clue to specify the tuning parameter of the procedure required for its application. Section 5 numerically illustrates how well the novel statistical procedure works. Section 6 includes an empirical illustration. Section 7 mentions a generalization of our method. Section 8 concludes the paper.
Notations Let \({\varvec{y}}_{i}=[y_{i,1},\ldots ,y_{i,T}]'\), \({\varvec{x}}_{i}=[x_{i,1},\ldots ,x_{i,T}]'\), \({\varvec{x}}=[x_{1},\ldots ,x_{T}]'\), \({\varvec{Y}}=[{\varvec{y}}_{1},\ldots ,{\varvec{y}}_{n}]\in \mathbb {R}^{T\times n}\), \({\varvec{I}}_{m}\) is the identity matrix of order m, \({\varvec{J}}=[{\varvec{0}},{\varvec{I}}_{T-2}]\in \mathbb {R}^{(T-2)\times T}\), \({\varvec{a}}=[a_{1},\ldots ,a_{n}]'\), and \({\varvec{D}}\in \mathbb {R}^{(T-2)\times T}\) be the second order difference matrix such that \({\varvec{D}}{\varvec{x}}_{i}=[\Delta ^{2}x_{i,3},\ldots ,\Delta ^{2}x_{i,T}]'\). Explicitly, \({\varvec{D}}\) is the \((T-2)\times T\) Toeplitz matrix of which the first and last rows are \([1,-2,1,0,\ldots ,0]\) and \([0,\ldots ,0,1,-2,1]\), respectively. In addition, let
Finally, for a vector \({\varvec{\gamma }}=[\gamma _{1},\ldots ,\gamma _{m}]'\), \(\Vert {\varvec{\gamma }}\Vert _{2}^{2}={\varvec{\gamma }}'{\varvec{\gamma }}=\sum _{t=1}^{m}\gamma _{t}^{2}\), \(\Vert {\varvec{\gamma }}\Vert _{1}=\sum _{t=1}^{m}|\gamma _{t}|\), \(\Vert {\varvec{\gamma }}\Vert _{\infty }=\max \{|\gamma _{1}|,\ldots ,|\gamma _{m}|\}\), and, for a matrix \({\varvec{\Gamma }}\in \mathbb {R}^{r\times s}\) whose (i, j) entry is denoted by \(\gamma _{ij}\), \(\Vert {\varvec{\Gamma }}\Vert _{\mathrm {F}}^{2}=\sum _{i=1}^{r}\sum _{j=1}^{s}\gamma _{ij}^{2}\).
A small note (i) The null space of \({\varvec{D}}\) is identical to the column space of \({\varvec{\Pi }}\), and accordingly \({\varvec{D}}{\varvec{\Pi }}={\varvec{0}}\), (ii) \({\varvec{\Psi }}\) is a right inverse of \({\varvec{D}}\), i.e., \({\varvec{D}}{\varvec{\Psi }}={\varvec{I}}_{T-2}\) (Paige and Trindade 2010), (iii) \({\mathsf {det}}({\varvec{X}})=1\) and thus \({\varvec{X}}\) is nonsingular, and (iv) given (1), we have \([{\varvec{x}}_{1},\ldots ,{\varvec{x}}_{n}]=[a_{1}{\varvec{x}},\ldots ,a_{n}{\varvec{x}}]={\varvec{x}}{\varvec{a}}'\in \mathbb {R}^{T\times n}\).
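The identities in (i)–(iii) are easy to check numerically. The Python sketch below builds \({\varvec{D}}\) together with one concrete choice of \({\varvec{\Pi }}\) (constant and linear columns) and \({\varvec{\Psi }}\) (entries \((t-s+1)_{+}\)); this choice satisfies all of the stated identities, but whether it coincides with the exact matrices of Paige and Trindade (2010) used in the paper is our assumption for illustration.

```python
import numpy as np

def second_diff_matrix(T):
    """(T-2) x T Toeplitz matrix whose rows are [..., 1, -2, 1, ...]."""
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

T = 10
D = second_diff_matrix(T)

# One concrete choice: Pi spans the null space of D (constants and linear
# trends); Psi has entries (t - s + 1)_+ so that D @ Psi = I_{T-2}.
t = np.arange(1, T + 1, dtype=float)
Pi = np.column_stack([np.ones(T), t - 1.0])            # T x 2
s = np.arange(3, T + 1, dtype=float)                   # columns s = 3, ..., T
Psi = np.maximum(t[:, None] - s[None, :] + 1.0, 0.0)   # T x (T-2)
X = np.hstack([Pi, Psi])                               # T x T, unit lower triangular

J = np.hstack([np.zeros((T - 2, 2)), np.eye(T - 2)])   # J = [0, I_{T-2}]

assert np.allclose(D @ Pi, 0.0)            # (i)   D Pi = 0
assert np.allclose(D @ Psi, np.eye(T - 2)) # (ii)  Psi is a right inverse of D
assert np.isclose(np.linalg.det(X), 1.0)   # (iii) det(X) = 1
assert np.allclose(D @ X, J)               # D X = J, used in Sect. 2.3
```

Because this \({\varvec{X}}\) is unit lower triangular, its determinant is 1 by inspection, which is how (iii) can also be seen analytically.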
2 \(\ell _{1}\) Common Trend Filtering
2.1 \(\ell _{1}\) Trend Filtering
The \(\ell _{1}\) trend filtering is defined by
$$\begin{aligned} \min _{x_{1},\ldots ,x_{T}}\ \sum _{t=1}^{T}(y_{t}-x_{t})^{2}+\psi \sum _{t=3}^{T}|\Delta ^{2}x_{t}|, \end{aligned}$$
(6)
where \(\psi >0\) is a tuning parameter. In matrix notation, it is expressed as
$$\begin{aligned} \min _{{\varvec{x}}\in \mathbb {R}^{T}}\ \Vert {\varvec{y}}-{\varvec{x}}\Vert _{2}^{2}+\psi \Vert {\varvec{D}}{\varvec{x}}\Vert _{1}, \end{aligned}$$
(7)
where \({\varvec{y}}=[y_{1},\ldots ,y_{T}]'\).
2.2 \(\ell _{1}\) Common Trend Filtering
In this paper, we extend the \(\ell _{1}\) trend filtering so that we may estimate a common continuous piecewise linear trend of multiple time series, \(y_{1,t},\ldots ,y_{n,t}\). The filtering method we introduce in this paper is:
$$\begin{aligned} \min _{x_{1},\ldots ,x_{T},\,a_{1},\ldots ,a_{n}}\ \sum _{i=1}^{n}\sum _{t=1}^{T}(y_{i,t}-a_{i}x_{t})^{2}+\lambda \sum _{t=3}^{T}|\Delta ^{2}x_{t}|\quad \text {subject to}\quad \sum _{i=1}^{n}a_{i}^{2}=1, \end{aligned}$$
(8)
where \(\lambda >0\) is a tuning parameter. We refer to the filtering method described by (8) as \(\ell _{1}\) common trend filtering. In matrix notation, the filtering is expressed as
$$\begin{aligned} \min _{{\varvec{x}},{\varvec{a}}}\ \sum _{i=1}^{n}\Vert {\varvec{y}}_{i}-a_{i}{\varvec{x}}\Vert _{2}^{2}+\lambda \Vert {\varvec{D}}{\varvec{x}}\Vert _{1}\quad \text {subject to}\quad \sum _{i=1}^{n}a_{i}^{2}=1. \end{aligned}$$
(9)
Furthermore, given that \(\sum _{i=1}^{n}\Vert {\varvec{y}}_{i}-a_{i}{\varvec{x}}\Vert _{2}^{2}=\Vert {\varvec{Y}}-{\varvec{x}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}\) and \(\sum _{i=1}^{n}a_{i}^{2}=\Vert {\varvec{a}}\Vert _{2}^{2}\), (9) can be represented by
$$\begin{aligned} \min _{{\varvec{x}},{\varvec{a}}}\ \Vert {\varvec{Y}}-{\varvec{x}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}+\lambda \Vert {\varvec{D}}{\varvec{x}}\Vert _{1}\quad \text {subject to}\quad \Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(10)
This is an \(\ell _{1}\)-norm penalized RRR.
2.3 Another Representation
Let \({\varvec{b}}\in \mathbb {R}^{T}\) be a column vector such that \({\varvec{x}}={\varvec{X}}{\varvec{b}}\). Given that (i) \({\varvec{X}}\) is nonsingular and (ii) \({\varvec{D}}{\varvec{X}}={\varvec{J}}\) (Paige and Trindade 2010), we obtain another RRR representation of (10):
$$\begin{aligned} \min _{{\varvec{a}},{\varvec{b}}}\ \Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}+\lambda \Vert {\varvec{J}}{\varvec{b}}\Vert _{1}\quad \text {subject to}\quad \Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(11)
We remark that when \({\varvec{J}}{\varvec{b}}\) is sparse, \({\varvec{x}}={\varvec{X}}{\varvec{b}}\) represents a continuous piecewise linear trend.Footnote 4 In addition, interestingly, (11) is similar to Eq. (8) in Chen and Huang (2012).Footnote 5
3 Numerical Solution
3.1 Two Key Results
Given \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\), the objective function in (11) can be represented as
$$\begin{aligned} f({\varvec{a}},{\varvec{b}})={\mathsf {tr}}({\varvec{Y}}'{\varvec{Y}})-2\,{\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} +{\mathsf {tr}}({\varvec{X}}{\varvec{b}}{\varvec{b}}'{\varvec{X}}')+\lambda \Vert {\varvec{J}}{\varvec{b}}\Vert _{1}. \end{aligned}$$
(12)
3.1.1 The Case Where \({\varvec{b}}\in \mathbb {R}^{T}\) is Given
Suppose that \({\varvec{b}}\in \mathbb {R}^{T}\) is given. Then, \({\varvec{X}}{\varvec{b}}\,(={\varvec{x}})\in \mathbb {R}^{T}\) is a known column vector. Because neither \({\mathsf {tr}}({\varvec{Y}}'{\varvec{Y}})\) nor \({\mathsf {tr}}({\varvec{X}}{\varvec{b}}{\varvec{b}}'{\varvec{X}}')\) in (12) depends on \({\varvec{a}}\), when \({\varvec{b}}\in \mathbb {R}^{T}\) is given, (11) reduces to the following maximization problem:
$$\begin{aligned} \max _{{\varvec{a}}}\ {\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} \quad \text {subject to}\quad \Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(13)
We remark that, given \({\varvec{x}}={\varvec{X}}{\varvec{b}}\), it follows that \({\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} =\sum _{i=1}^{n}{\varvec{y}}_{i}'(a_{i}{\varvec{x}})\), which is quite reasonable as an objective function for estimating \(a_{1},\ldots ,a_{n}\). Moreover, letting \({\varvec{\phi }}={\varvec{Y}}'{\varvec{X}}{\varvec{b}}\in \mathbb {R}^{n}\), it follows that \({\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} ={\varvec{\phi }}'{\varvec{a}}\). Therefore, instead of (13), we may consider the following constrained maximization problem:
$$\begin{aligned} \max _{{\varvec{a}}}\ g({\varvec{a}})={\varvec{\phi }}'{\varvec{a}}\quad \text {subject to}\quad \Vert {\varvec{a}}\Vert _{2}=1. \end{aligned}$$
(14)
Given \(\Vert {\varvec{a}}\Vert _{2}=1\), by the Cauchy–Schwarz inequality, we obtain \(g({\varvec{a}})={\varvec{\phi }}'{\varvec{a}}\le |{\varvec{\phi }}'{\varvec{a}}|\le \Vert {\varvec{\phi }}\Vert _{2}\Vert {\varvec{a}}\Vert _{2}=\Vert {\varvec{\phi }}\Vert _{2}\), from which we have \(g({\varvec{a}})<g(\widehat{{\varvec{a}}})\) if \({\varvec{a}}\ne \widehat{{\varvec{a}}}\), where \(\widehat{{\varvec{a}}} =({\varvec{\phi }}'{\varvec{\phi }})^{-1/2}{\varvec{\phi }} =({\varvec{b}}'{\varvec{X}}'{\varvec{Y}}{\varvec{Y}}'{\varvec{X}}{\varvec{b}})^{-1/2}{\varvec{Y}}'{\varvec{X}}{\varvec{b}}\). Consequently, given \({\varvec{b}}\in \mathbb {R}^{T}\), we have the following inequality:
$$\begin{aligned} f(\widehat{{\varvec{a}}},{\varvec{b}})<f({\varvec{a}},{\varvec{b}}) \end{aligned}$$
(15)
if \({\varvec{a}}\ne \widehat{{\varvec{a}}}\).
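The closed-form update \(\widehat{{\varvec{a}}}=({\varvec{\phi }}'{\varvec{\phi }})^{-1/2}{\varvec{\phi }}\) is simply \({\varvec{\phi }}\) normalized to unit length. A quick numerical check (Python, with randomly generated \({\varvec{Y}}\) and a stand-in for \({\varvec{X}}{\varvec{b}}\), our illustrative data) confirms that it dominates arbitrary unit vectors, as the Cauchy–Schwarz argument asserts:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 50, 4
Y = rng.standard_normal((T, n))     # illustrative data matrix
Xb = rng.standard_normal(T)         # stands in for X b with b given

phi = Y.T @ Xb                       # phi = Y' X b
a_hat = phi / np.linalg.norm(phi)    # closed-form maximizer of phi' a on the unit sphere

assert np.isclose(np.linalg.norm(a_hat), 1.0)
# g(a_hat) = ||phi||_2 dominates g(a) for any other unit vector a:
for _ in range(1000):
    a = rng.standard_normal(n)
    a /= np.linalg.norm(a)
    assert phi @ a <= phi @ a_hat + 1e-12
```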
3.1.2 The Case Where \({\varvec{a}}\in \mathbb {R}^{n}\) is Given
Next, suppose that \({\varvec{a}}\in \mathbb {R}^{n}\) such that \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\) is given. Then, (11) reduces to
$$\begin{aligned} \min _{{\varvec{b}}}\ \Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}+\lambda \Vert {\varvec{J}}{\varvec{b}}\Vert _{1}. \end{aligned}$$
(16)
Let \({\varvec{A}}_{\perp }\in \mathbb {R}^{n\times (n-1)}\) be a matrix whose columns form an orthonormal basis of the orthogonal complement of the space spanned by \({\varvec{a}}\). Then, given \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\), \([{\varvec{a}},{\varvec{A}}_{\perp }]\in \mathbb {R}^{n\times n}\) is an orthogonal matrix. Thus, given that \( ({\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}')[{\varvec{a}},{\varvec{A}}_{\perp }] =[{\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}},{\varvec{Y}}{\varvec{A}}_{\perp }], \) it follows that \(\Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2} =\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}+\Vert {\varvec{Y}}{\varvec{A}}_{\perp }\Vert _{\mathrm {F}}^{2}\). Given that \(\Vert {\varvec{Y}}{\varvec{A}}_{\perp }\Vert _{\mathrm {F}}^{2}\) does not depend on \({\varvec{b}}\), (16) becomes
$$\begin{aligned} \min _{{\varvec{b}}}\ \Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}+\lambda \Vert {\varvec{J}}{\varvec{b}}\Vert _{1}. \end{aligned}$$
(17)
We remark here that (17) can be represented as
$$\begin{aligned} \min _{{\varvec{x}}}\ \Vert {\varvec{Y}}{\varvec{a}}-{\varvec{x}}\Vert _{2}^{2}+\lambda \Vert {\varvec{D}}{\varvec{x}}\Vert _{1}, \end{aligned}$$
(18)
which is an \(\ell _{1}\) trend filtering of \({\varvec{Y}}{\varvec{a}}\). See, e.g., Eq. (9) in Kim et al. (2009).Footnote 6
As (17) is a problem whose objective function is coercive and strictly convex over \(\mathbb {R}^{T}\), it has a unique global minimizer. Thus, denoting the solution by \(\widehat{{\varvec{b}}}\), we have the following result. Given \({\varvec{a}}\in \mathbb {R}^{n}\) such that \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\), we have the following inequality:
$$\begin{aligned} f({\varvec{a}},\widehat{{\varvec{b}}})<f({\varvec{a}},{\varvec{b}}) \end{aligned}$$
(19)
if \({\varvec{b}}\ne \widehat{{\varvec{b}}}\).
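The orthogonal-decomposition identity used in this subsection, \(\Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2} =\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}+\Vert {\varvec{Y}}{\varvec{A}}_{\perp }\Vert _{\mathrm {F}}^{2}\), can be verified numerically. In the Python sketch below (random illustrative data, our own check), \({\varvec{A}}_{\perp }\) is obtained from a complete QR decomposition of \({\varvec{a}}\):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 30, 5
Y = rng.standard_normal((T, n))      # illustrative data
Xb = rng.standard_normal(T)          # stands in for X b
a = rng.standard_normal(n)
a /= np.linalg.norm(a)               # ||a||_2 = 1

# Orthonormal basis of the orthogonal complement of span{a}: the last n-1
# columns of the orthogonal factor of a complete QR decomposition of a.
Q, _ = np.linalg.qr(a[:, None], mode='complete')   # first column is +/- a
A_perp = Q[:, 1:]

lhs = np.linalg.norm(Y - np.outer(Xb, a), 'fro') ** 2
rhs = np.linalg.norm(Y @ a - Xb) ** 2 + np.linalg.norm(Y @ A_perp, 'fro') ** 2
assert np.isclose(lhs, rhs)          # Frobenius norm is invariant to [a, A_perp]
```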
3.2 A Numerical Algorithm
Based on the above two inequalities, (15) and (19), we introduce a numerical algorithm. Given \(\widehat{{\varvec{a}}}_{1}\in \mathbb {R}^{n}\) and \(\widehat{{\varvec{b}}}_{1}\in \mathbb {R}^{T}\), for \(i\in \mathbb {N}\), we define \(\widehat{{\varvec{a}}}_{i+1}\) and \(\widehat{{\varvec{b}}}_{i+1}\) by
$$\begin{aligned} \widehat{{\varvec{a}}}_{i+1}=\mathop {\mathrm {arg\,min}}\limits _{{\varvec{a}}:\Vert {\varvec{a}}\Vert _{2}^{2}=1}f({\varvec{a}},\widehat{{\varvec{b}}}_{i}) =(\widehat{{\varvec{b}}}_{i}'{\varvec{X}}'{\varvec{Y}}{\varvec{Y}}'{\varvec{X}}\widehat{{\varvec{b}}}_{i})^{-1/2}{\varvec{Y}}'{\varvec{X}}\widehat{{\varvec{b}}}_{i}, \end{aligned}$$
(20)
$$\begin{aligned} \widehat{{\varvec{b}}}_{i+1}=\mathop {\mathrm {arg\,min}}\limits _{{\varvec{b}}\in \mathbb {R}^{T}}f(\widehat{{\varvec{a}}}_{i+1},{\varvec{b}}). \end{aligned}$$
(21)
Then, we have the following result.
Lemma 3.1
For \(i\in \mathbb {N}\), it follows that \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{a}}}_{i}=\widehat{{\varvec{a}}}_{i+1}\) and \(\widehat{{\varvec{b}}}_{i}=\widehat{{\varvec{b}}}_{i+1}\).
Proof
(i) From (15) and (20), we have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i})\le f({\varvec{a}},\widehat{{\varvec{b}}}_{i})\) for any \({\varvec{a}}\in \mathbb {R}^{n}\) and we thus have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i})\le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{a}}}_{i}=\widehat{{\varvec{a}}}_{i+1}\). (ii) Likewise, from (19) and (21), we have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i+1},{\varvec{b}})\) for any \({\varvec{b}}\in \mathbb {R}^{T}\) and we thus have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{b}}}_{i}=\widehat{{\varvec{b}}}_{i+1}\). (iii) Combining these inequalities, we have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i}) \le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), which leads to \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{a}}}_{i}=\widehat{{\varvec{a}}}_{i+1}\) and \(\widehat{{\varvec{b}}}_{i}=\widehat{{\varvec{b}}}_{i+1}\). \(\square \)
Given Lemma 3.1, we have the following result.
Proposition 3.2
Let \(f_{i}=f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\) for \(i\in \mathbb {N}\). A sequence of real numbers, \((f_{i})_{i\in \mathbb {N}}\), has a finite limit.
Proof
From Lemma 3.1, \((f_{i})_{i\in \mathbb {N}}\) is a nonincreasing sequence. In addition, as \(f_{i}\ge 0\) for \(i\in \mathbb {N}\), it is bounded below. Consequently, it has a finite limit. \(\square \)
Proposition 3.2 implies that the objective function in (11) converges by alternately minimizing it over \({\varvec{a}}\) and \({\varvec{b}}\). Denote by \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) the values of \({\varvec{a}}\) and \({\varvec{b}}\) at which the objective function in (11) has converged. A Matlab user-defined function for estimating \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) from \({\varvec{Y}}\) and \(\lambda \), l1_common_trend_filter, is provided in the supplementary material.
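The alternating scheme can be sketched compactly. The code below is our own minimal Python re-implementation, not the authors' Matlab function: the \({\varvec{a}}\)-update uses the closed form from Sect. 3.1.1, and the \({\varvec{b}}\)-update solves the \(\ell _{1}\) trend filtering subproblem (17)/(18) with a basic ADMM. The penalty parameter rho, the iteration counts, and the initialization are illustrative choices, not the authors'.

```python
import numpy as np

def second_diff_matrix(T):
    """(T-2) x T second-order difference matrix D."""
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def l1_trend_filter(z, lam, rho=10.0, n_iter=500):
    """Basic ADMM for min_x ||z - x||_2^2 + lam * ||D x||_1 (subproblem (18))."""
    T = len(z)
    D = second_diff_matrix(T)
    # x-update solves (2 I + rho D'D) x = 2 z + rho D'(w - u); prefactor once.
    Minv = np.linalg.inv(2.0 * np.eye(T) + rho * D.T @ D)
    w = np.zeros(T - 2)   # w stands for D x
    u = np.zeros(T - 2)   # scaled dual variable
    for _ in range(n_iter):
        x = Minv @ (2.0 * z + rho * D.T @ (w - u))
        Dx = D @ x
        w = np.sign(Dx + u) * np.maximum(np.abs(Dx + u) - lam / rho, 0.0)
        u += Dx - w
    return x

def l1_common_trend_filter_sketch(Y, lam, n_outer=10):
    """Alternate the closed-form a-update and the l1-trend-filter x-update."""
    T, n = Y.shape
    a = np.ones(n) / np.sqrt(n)            # initial unit-norm loading vector
    for _ in range(n_outer):
        x = l1_trend_filter(Y @ a, lam)    # x = X b given a, via (18)
        phi = Y.T @ x
        a = phi / np.linalg.norm(phi)      # closed-form a-update given b
    return x, a

# Illustrative data: a common trend with one kink, three loading coefficients.
rng = np.random.default_rng(0)
T, n = 60, 3
t = np.arange(T, dtype=float)
trend = np.minimum(t, 30.0)                # slope 1, then flat after t = 30
Y = np.outer(trend, [1.0, 0.6, 0.3]) + 0.1 * rng.standard_normal((T, n))
x_hat, a_hat = l1_common_trend_filter_sketch(Y, lam=5.0)
```

In this synthetic example the estimated common trend `x_hat` tracks the true kinked trend closely and `a_hat` has unit norm, in line with Lemma 3.1 and Proposition 3.2.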
4 A Clue for Specifying the Tuning Parameter
Applying the \(\ell _{1}\) common trend filtering requires the specification of its tuning parameter. In this section, we provide a clue for specifying it.
Consider the following convex problem:
$$\begin{aligned} \min _{{\varvec{b}}}\ h({\varvec{b}})=\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}\quad \text {subject to}\quad \Vert {\varvec{J}}{\varvec{b}}\Vert _{1}\le c, \end{aligned}$$
(22)
where \(c>0\) and \({\varvec{a}}\in \mathbb {R}^{n}\) is a given vector. Given that \({\varvec{X}}\) is nonsingular and \(h({\varvec{b}})\ge 0\), if \({\varvec{b}}={\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) is feasible, i.e., \(\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\le c\), then \({\varvec{b}}={\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) is the solution of the above convex problem. Here, recall that \({\varvec{J}}{\varvec{X}}^{-1}={\varvec{D}}\). In this case, \(h({\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}})=0\). If it is not feasible, i.e., \(\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}>c\), then \({\varvec{b}}={\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) cannot be the solution. In this case, the solution, denoted by \(\widetilde{{\varvec{b}}}\), lies on the boundary of the feasible region. Thus, we have \(\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}=c\).
More precisely, concerning \(\widetilde{{\varvec{b}}}\), we have the following results.
Proposition 4.1
If \(0<c<\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\), then (i) \(\widetilde{{\varvec{b}}}\) equals \(\widehat{{\varvec{b}}}\) estimated by (16)/(17) with
and (ii) \(\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}=c\) holds.
Proof
See Section A.4 in the Appendix. \(\square \)
Proposition 4.1 implies that we can obtain \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) by specifying c in (22) instead of \(\lambda \) in (17). A Matlab user-defined function for calculating \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) from \({\varvec{Y}}\) and c, l1_common_trend_filter_c, is provided in the supplementary material. Here, we point out that specifying c is much easier than specifying \(\lambda \): we do not have any useful information for specifying \(\lambda \), whereas we do have such information for specifying c. As stated in Proposition 4.1, we may estimate \(\widehat{{\varvec{b}}}\) such that
$$\begin{aligned} \Vert {\varvec{J}}\widehat{{\varvec{b}}}\Vert _{1}=\sum _{t=3}^{T}|\widehat{b}_{t}|=c. \end{aligned}$$
We may utilize this relation for specifying c. See, e.g., (A.1). In this case, \(\widehat{b}_{t}\) for \(t=3,\ldots ,T\) represent the changes in the slope of the estimated common trend. Thus, we may specify a rough range of c from the plots of the first differences of the multiple time series.Footnote 7
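The following small Python sketch (hypothetical aggregated series, our own illustration) shows why slope changes visible in first-difference plots give a rough range for c: \(\Vert {\varvec{D}}{\varvec{z}}\Vert _{1}\) totals the absolute slope changes of a series \({\varvec{z}}\), so it is zero for a straight line and equals the kink size for a single kink.

```python
import numpy as np

# Hypothetical aggregated series z (standing in for Y a), T = 50:
T = 50
t = np.arange(T, dtype=float)
z_line = 0.5 * t                                      # single straight line
z_kink = 0.5 * t + 0.8 * np.maximum(t - 25.0, 0.0)    # slope shifts by 0.8 at t = 25

# Second-order difference matrix D:
D = np.zeros((T - 2, T))
for i in range(T - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]

# ||D z||_1 totals absolute slope changes: 0 for the line, 0.8 for the kink.
tv_line = np.abs(D @ z_line).sum()
tv_kink = np.abs(D @ z_kink).sum()
assert np.isclose(tv_line, 0.0)
assert np.isclose(tv_kink, 0.8)
```

Accordingly, only values \(0<c<\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\) produce genuine filtering; for larger c the interpolating solution \({\varvec{b}}={\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) is feasible with \(h({\varvec{b}})=0\).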
Finally, given \({\varvec{X}}{\varvec{b}}={\varvec{x}}\) and \({\varvec{J}}{\varvec{b}}={\varvec{D}}{\varvec{x}}\), we remark that the convex problem (22) may be replaced with the following convex problem:
$$\begin{aligned} \min _{{\varvec{x}}}\ \Vert {\varvec{Y}}{\varvec{a}}-{\varvec{x}}\Vert _{2}^{2}\quad \text {subject to}\quad \Vert {\varvec{D}}{\varvec{x}}\Vert _{1}\le c, \end{aligned}$$
where \(c>0\).
5 Numerical Illustrations
In this section, we numerically illustrate how well the algorithm described in the preceding sections works. Figure 1 plots the generated \({\varvec{X}}{\varvec{b}}\), where \(T=100\), \(b_{1}=10\), \(b_{2}=0.5\), and \(\sum _{t=3}^{T}|b_{t}|=2.7243\). The ten bullets in the figure depict the kink points, and accordingly, 10 entries of \({\varvec{J}}{\varvec{b}}=[b_{3},\ldots ,b_{T}]'\) are not equal to 0. Using the \({\varvec{X}}{\varvec{b}}\) shown in Fig. 1, we generated \({\varvec{y}}_{1}\), \({\varvec{y}}_{2}\), and \({\varvec{y}}_{3}\) by
$$\begin{aligned} {\varvec{Y}}=[{\varvec{y}}_{1},{\varvec{y}}_{2},{\varvec{y}}_{3}]={\varvec{X}}{\varvec{b}}{\varvec{a}}'+{\varvec{E}}, \end{aligned}$$
where \({\varvec{a}}=[1,0.6,0.2]'\) and \(\mathrm {vec}({\varvec{E}})\sim \mathrm {N}({\varvec{0}},\sigma ^{2}{\varvec{I}}_{3T})\) with \(\sigma =1\). Figure 2 plots them.
For obtaining \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\), we used l1_common_trend_filter_c, which is presented in the supplementary material, with \(c=3\). This required three iterations for convergence. As a result, we obtained
and \(\widehat{{\varvec{b}}}\) such that \(\Vert {\varvec{J}}\widehat{{\varvec{b}}}\Vert _{1}=3\), the latter of which is consistent with Proposition 4.1(ii). The value of \(\lambda \) for obtaining \(\widehat{{\varvec{b}}}\) from \({\varvec{Y}}\widehat{{\varvec{a}}}\) is 10.3730.
Figure 3 illustrates the results. The solid line in Fig. 3 plots the estimated \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\). The dashed line in the figure plots \(a_{1}{\varvec{X}}{\varvec{b}}\). Note that, given \(a_{1}=1\), it is identical to the solid line depicted in Fig. 1. From the figure, we can see that \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\) looks very much like \(a_{1}{\varvec{X}}{\varvec{b}}\). Figure 4 also illustrates the results. The solid lines in Fig. 4 are identical to those in Fig. 2. The dashed lines on y\(_{1}\), y\(_{2}\) and y\(_{3}\) respectively plot \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\), \(\widehat{a}_{2}{\varvec{X}}\widehat{{\varvec{b}}}\), and \(\widehat{a}_{3}{\varvec{X}}\widehat{{\varvec{b}}}\). Again, the figure shows that our novel procedure works well.
As a supplementary examination, we generated an additional data set, repeated the same analysis, and obtained similar results. For example, we obtained
See also Figures B.1–B.4 in the supplementary material.
6 An Empirical Illustration
Figure 5, which is identical to Figure 1.2 of Hatanaka and Yamada (2003), plots two quarterly macroeconomic time series. More precisely, the upper [resp. lower] panel of the figure depicts the natural logarithm of Japanese M2\(+\)CD [resp. real gross domestic product (GDP)], from the first quarter of 1980 to the third quarter of 2001. From the figure, we can observe that these two time series seem to contain a common piecewise linear trend such that a major kink point is located at around 1991, which corresponds to the peak of the Japanese asset price bubble. Actually, the statistical procedure developed by Hatanaka and Yamada (2003) detected a common piecewise linear trend. [See Section 8.5 of Hatanaka and Yamada (2003).]
Figure 6 depicts the corresponding demeaned series. We estimated a common piecewise linear trend of these demeaned data. Denote it by \({\varvec{X}}\widehat{{\varvec{b}}}\). For the estimation, we used l1_common_trend_filter_c with \(c=0.018\). We specified the value of c by reference to the plots of time series in first differences shown in Fig. 7. The upper panel (resp. lower panel) of Fig. 8 depicts \({\varvec{X}}\widehat{{\varvec{b}}}\) (resp. \(\widehat{b}_{2},\ldots ,\widehat{b}_{T}\), where \(T=87\)). From the panels, we may observe that (i) a major kink point is located at around 1991 and (ii) \(\sum _{t=3}^{T}|\widehat{b}_{t}|\) equals the value of c. The solid lines in Fig. 9 are identical to those plotted in Fig. 6. The dashed line in the upper (resp. lower) panel plots \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\) (resp. \(\widehat{a}_{2}{\varvec{X}}\widehat{{\varvec{b}}}\)), where
Finally, the dashed lines in Fig. 10 are the mean-restored estimated piecewise linear trends. The solid lines in the figure are identical to those in Fig. 5.
7 A Generalization
In this section, we briefly describe a generalization of our method. Let \({\varvec{D}}_{p}\in \mathbb {R}^{(T-p)\times T}\) be the p-th order difference matrix such that \({\varvec{D}}_{p}{\varvec{x}}_{i}=[\Delta ^{p}x_{i,p+1},\ldots ,\Delta ^{p}x_{i,T}]'\). Explicitly, \({\varvec{D}}_{p}\) is the \((T-p)\times T\) Toeplitz matrix whose first and last rows are \([a_{0},\ldots ,a_{p},0,\ldots ,0]\) and \([0,\ldots ,0,a_{0},\ldots ,a_{p}]\), respectively, where \(a_{k}=(-1)^{p-k}\left( {\begin{array}{c}p\\ k\end{array}}\right) \) for \(k=0,\ldots ,p\). Accordingly, \({\varvec{D}}_{2}\) equals \({\varvec{D}}\). Then, without any difficulty, we may extend our procedure to
$$\begin{aligned} \min _{{\varvec{x}},{\varvec{a}}}\ \Vert {\varvec{Y}}-{\varvec{x}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}+\lambda _{p}\Vert {\varvec{D}}_{p}{\varvec{x}}\Vert _{1}\quad \text {subject to}\quad \Vert {\varvec{a}}\Vert _{2}^{2}=1, \end{aligned}$$
(31)
where \(p\in \mathbb {N}\) and \(\lambda _{p}>0\) is a tuning parameter. We refer to (31) as '\(\ell _{1}\) common polynomial trend filtering.' The solution of the problem represents a continuous piecewise \((p-1)\)-th order polynomial trend. Note that the corresponding \({\varvec{X}}_{p}=[{\varvec{\Pi }}_{p},{\varvec{\Psi }}_{p}]\) such that \({\varvec{D}}_{p}{\varvec{X}}_{p}=[{\varvec{D}}_{p}{\varvec{\Pi }}_{p},{\varvec{D}}_{p} {\varvec{\Psi }}_{p}]=[{\varvec{0}},{\varvec{I}}_{T-p}]\in \mathbb {R}^{(T-p)\times T}\) is given in Yamada (2015).
8 Concluding Remarks
In this paper, we developed an extension of the \(\ell _{1}\) trend filtering. The \(\ell _{1}\) trend filtering can estimate a continuous piecewise linear trend of univariate time series. However, it cannot estimate a common continuous piecewise linear trend of multiple time series, which the novel statistical procedure developed in this paper enables. We provided an algorithm and a clue to specify the tuning parameter of the procedure, both of which are required for its application. We also numerically illustrated how well the algorithm works, provided an empirical illustration, and introduced a generalization of our novel method.
Change history
24 November 2021
A Correction to this paper has been published: https://doi.org/10.1007/s10614-021-10217-3
Notes
‘\(\ell _{1}\) trend filtering’ is the terminology used by Kim et al. (2009). The approach is a form of \(\ell _{1}\)-norm penalized least squares and may also be regarded as a type of generalized lasso regression (Tibshirani 1996; Kim et al. 2009; Tibshirani and Taylor 2011) and as a generalization of one-dimensional total variation denoising (Rudin et al. 1992; Steidl et al. 2006; Guntuboyina et al. 2020).
In econometrics, we have observed growing interest in the HP filtering. Recent studies about it include those by de Jong and Sakarya (2016), Cornea-Madeira (2017), Hamilton (2018), Phillips and Jin (2020), Phillips and Shi (2020), Sakarya and de Jong (2020), and Yamada (2020). HP filtering has been used to extract the cyclical component of a univariate time series. For other such methods, see, e.g., Pollock (2016) and Michaelides et al. (2018). Also, it is a type of the Whittaker–Henderson (WH) method of graduation. For a historical survey of the WH method of graduation, see, e.g., Weinert (2007) and Phillips (2010).
See Section A.1 in the Appendix.
See Section A.2 in the Appendix.
The dual problem of (18) and its implication are given in Section A.3 in the Appendix.
See Sect. 6 for an empirical illustration.
References
Bertsekas, D. P. (1999). Nonlinear programming (2nd ed.). Belmont: Athena Scientific.
Brantley, H. L., Guinness, J., & Chi, E. C. (2019). Baseline drift estimation for air quality data using quantile trend filtering. Annals of Applied Statistics, forthcoming.
Chen, L., & Huang, J. Z. (2012). Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. Journal of the American Statistical Association, 107, 1533–1545.
Cornea-Madeira, A. (2017). The explicit formula for the Hodrick–Prescott filter in a finite sample. Review of Economics and Statistics, 99(2), 314–318.
de Jong, R. M., & Sakarya, N. (2016). The econometrics of the Hodrick–Prescott filter. Review of Economics and Statistics, 98(2), 310–317.
Engle, R. F., & Granger, C. W. J. (1987). Co-integration and error correction: Representation, estimation, and testing. Econometrica, 55(2), 251–276.
Guntuboyina, A., Lieu, D., Chatterjee, S., & Sen, B. (2020). Adaptive risk bounds in univariate total variation denoising and trend filtering. Annals of Statistics, 48(1), 205–229.
Hamilton, J. D. (1994). Time series analysis. Princeton: Princeton University Press.
Hamilton, J. D. (2018). Why you should never use the Hodrick–Prescott filter. Review of Economics and Statistics, 100, 831–843.
Hatanaka, M. (1996). Time-series-based econometrics: Unit roots and co-integrations. New York: Oxford University Press.
Hatanaka, M., & Yamada, H. (2003). Co-trending: A statistical system analysis of economic trends. Tokyo: Springer.
Hodrick, R. J., & Prescott, E. C. (1997). Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit and Banking, 29(1), 1–16.
Kim, S., Koh, K., Boyd, S., & Gorinevsky, D. (2009). \(\ell _{1}\) trend filtering. SIAM Review, 51(2), 339–360.
Klein, T. (2018). Trends and contagion in WTI and Brent crude oil spot and futures markets–the role of OPEC in the last decade. Energy Economics, 75, 636–646.
Khodadadi, A., & McDonald, D. J. (2019). Algorithms for estimating trends in global temperature volatility. (arXiv: 1805.07376v2).
Michaelides, P. G., Tsionas, E. G., Vouldis, A. T., Konstantakis, K. N., & Patrinos, P. (2018). A semi-parametric non-linear neural network filter: Theory and empirical evidence. Computational Economics, 51, 637–675.
Mitra, S., & Rohit, A. (2018). Momentum trading with the \(\ell _{1}\)-filter: Are the markets efficient? International Review of Finance.
Osborne, M. R., Presnell, B., & Turlach, B. A. (2000). On the LASSO and its dual. Journal of Computational and Graphical Statistics, 9(2), 319–337.
Paige, R. L., & Trindade, A. A. (2010). The Hodrick–Prescott filter: A special case of penalized spline smoothing. Electronic Journal of Statistics, 4, 856–874.
Perron, P. (1989). The great crash, the oil price shock, and the unit root hypothesis. Econometrica, 57(6), 1361–1401.
Phillips, P. C. B. (2010). Two New Zealand pioneer econometricians. New Zealand Economic Papers, 44(1), 1–26.
Phillips, P. C. B., & Jin, S. (2020). Business cycles, trend elimination, and the HP filter. International Economic Review, first online 02 December 2020. https://doi.org/10.1111/iere.12494.
Phillips, P. C. B., & Shi, Z. (2020). Boosting: Why you can use the HP filter, International Economic Review, first online 01 December 2020. https://doi.org/10.1111/iere.12495.
Politsch, C. A., Cisewski-Kehe, J., Croft, R. A. C., & Wasserman, L. (2020). Trend filtering - I. A modern statistical tool for time-domain astronomy and astronomical spectroscopy. Monthly Notices of the Royal Astronomical Society, 492(3), 4005–4018.
Pollock, D. S. G. (2016). Econometric filters. Computational Economics, 48, 669–691.
Rappoport, P., & Reichlin, L. (1989). Segmented trends and non-stationary time series. Economic Journal, 99, 168–177.
Rudin, L. I., Osher, S., & Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D, 60, 259–268.
Sakarya, N., & de Jong, R. M. (2020). A property of the Hodrick–Prescott filter and its application. Econometric Theory, 36(5), 840–870.
Schechter, M. (1977). A subgradient duality theorem. Journal of Mathematical Analysis and Applications, 61(3), 850–855.
Steidl, G. (2006). A note on the dual treatment of higher-order regularization functionals. Computing, 76, 135–148.
Steidl, G., Didas, S., & Neumann, J. (2006). Splines in higher order TV regularization. International Journal of Computer Vision, 70, 241–255.
Suo, C., Li, Z., Sun, Y., & Han, Y. (2019). Application of L1 trend filtering technology on the current time domain spectroscopy of dielectrics. Electronics, 8(9), 1046.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58(1), 267–288.
Tibshirani, R. J. (2014). Adaptive piecewise polynomial estimation via trend filtering. Annals of Statistics, 42(1), 285–323.
Tibshirani, R. J., & Taylor, J. (2011). The solution path of the generalized lasso. Annals of Statistics, 39(3), 1335–1371.
Weinert, H. L. (2007). Efficient computation for Whittaker–Henderson smoothing. Computational Statistics and Data Analysis, 52(2), 959–974.
Weir, T. (1988). Subgradient duality using Fritz John conditions. Journal of Information and Optimization Sciences, 9(2), 287–296.
Winkelried, D. (2016). Piecewise linear trends and cycles in primary commodity prices. Journal of International Money and Finance, 64, 196–213.
Wu, D., Yan, H., & Yuan, S. (2018). L1 regularization for detecting offsets and trend change points in GNSS time series. GPS Solutions, 22, 88.
Yamada, H. (2015). Ridge regression representations of the generalized Hodrick–Prescott filter. Journal of the Japan Statistical Society, 45(2), 121–128.
Yamada, H. (2017). Estimating the trend in US real GDP using the \(\ell _{1}\) trend filtering. Applied Economics Letters, 24(10), 713–716.
Yamada, H. (2020). A smoothing method that looks like the Hodrick–Prescott filter. Econometric Theory, 36(5), 961–981.
Yamada, H., & Jin, L. (2013). Japan’s output gap estimation and \(\ell _{1}\) trend filtering. Empirical Economics, 45(1), 81–88.
Yamada, H., & Yoon, G. (2014). When Grilli and Yang meet Prebisch and Singer: Piecewise linear trends in primary commodity prices. Journal of International Money and Finance, 42, 193–207.
The authors thank Eiji Kurozumi, Kazuhiko Hayakawa, Hiroaki Mukaidani, Heewon Park, Takashi Yamagata, and two anonymous referees for their valuable comments on an earlier version of this paper. The first author dedicates this research to Michio Hatanaka. The usual caveat applies. The Japan Society for the Promotion of Science supported this work through KAKENHI Grant Number 20H01484.
A Appendix
1.1 A.1 \({\varvec{x}}={\varvec{X}}{\varvec{b}}\) when \({\varvec{J}}{\varvec{b}}\) is sparse
We illustrate how \({\varvec{x}}={\varvec{X}}{\varvec{b}}\) represents a continuous piecewise linear trend when \({\varvec{J}}{\varvec{b}}=[b_{3},\ldots ,b_{T}]'\) is sparse. Consider the case where \(T=7\) and \({\varvec{b}}=[b_{1},b_{2},0,b_{4},0,b_{6},0]'\). In this case, \(x_{t}\) \((t=1,\ldots ,7)\) are expressed as follows:
\(x_{1}=b_{1}\), \(x_{2}=b_{1}+b_{2}\), \(x_{3}=b_{1}+2b_{2}\), \(x_{4}=b_{1}+3b_{2}+b_{4}\), \(x_{5}=b_{1}+4b_{2}+2b_{4}\), \(x_{6}=b_{1}+5b_{2}+3b_{4}+b_{6}\), and \(x_{7}=b_{1}+6b_{2}+4b_{4}+2b_{6}\).
Moreover, these three trends are connected at \(t=3\) and \(t=5\): the first and second pieces meet at \((3,x_{3})\), and the second and third pieces meet at \((5,x_{5})\).
Thus, \(x_{1},\ldots ,x_{7}\) are on the following continuous piecewise linear function of t:
\(\alpha (t)=b_{1}+b_{2}(t-1)+b_{4}(t-3)_{+}+b_{6}(t-5)_{+},\)
where \((\zeta )_{+}=\zeta \) if \(\zeta >0\) and \((\zeta )_{+}=0\) if \(\zeta \le 0\). In this function, \((3,\alpha (3))\) and \((5,\alpha (5))\) are kink points. Accordingly, \(\{x_{1},x_{2},x_{3}\}\) are on the same straight line whose slope is \(b_{2}\), \(\{x_{3},x_{4},x_{5}\}\) are on the same straight line whose slope is \(b_{2}+b_{4}\), and \(\{x_{5},x_{6},x_{7}\}\) are on the same straight line whose slope is \(b_{2}+b_{4}+b_{6}\).
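The representation can be checked numerically. The sketch below assumes the truncated-power form of \({\varvec{X}}\) implied by this appendix (an intercept column, a linear column \(t-1\), and kink columns \((t-j+1)_{+}\) for \(j\ge 3\)); this explicit form is our assumption, not a definition taken from the paper. It verifies that the three pieces have the stated slopes and that the second differences of \({\varvec{x}}\) recover the sparse vector \({\varvec{J}}{\varvec{b}}\).

```python
import numpy as np

T = 7
t = np.arange(1, T + 1)

# Assumed truncated-power basis: column 1 is the intercept, column 2 the
# linear term t - 1, and column j (j >= 3) the kink term (t - j + 1)_+.
X = np.zeros((T, T))
X[:, 0] = 1.0
X[:, 1] = t - 1
for j in range(3, T + 1):
    X[:, j - 1] = np.maximum(t - j + 1, 0)

# b with b3 = b5 = b7 = 0, as in the example above.
b = np.array([1.0, 0.5, 0.0, 2.0, 0.0, -1.5, 0.0])
x = X @ b

# Slopes of the three pieces: b2 = 0.5, b2 + b4 = 2.5, b2 + b4 + b6 = 1.0.
slopes = np.diff(x)
print(slopes)

# Second differences of x recover Jb = [b3, ..., bT]'.
D = np.diff(np.eye(T), n=2, axis=0)  # (T-2) x T second-difference matrix
print(D @ x)
```

Only three of the five second differences can be nonzero here, which is exactly the sparsity of \({\varvec{J}}{\varvec{b}}\) that the \(\ell _{1}\) penalty promotes.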
1.2 A.2 (11) and Chen and Huang (2012)
We point out a similarity between the \(\ell _{1}\)-norm penalized RRR (11) and Eq. (8) in Chen and Huang (2012):
where \({\varvec{Z}}\in \mathbb {R}^{T\times p}\) is a matrix of full column rank, \({\varvec{B}}^{(i)}\) denotes the ith row of \({\varvec{B}}\), and \(\lambda _{i}>0\) (\(i=1,\ldots ,p\)) are tuning parameters. To clarify the similarity, let \(r=1\), \(p=T\), and \(\lambda _{i}=\lambda \) for \(i=1,\ldots ,T\) in (A.3). Then, \({\varvec{A}}\in \mathbb {R}^{n}\), \({\varvec{B}}\in \mathbb {R}^{T}\), and \({\varvec{Z}}\in \mathbb {R}^{T\times T}\), and we denote them by \({\varvec{a}}\), \({\varvec{b}}\), and \({\varvec{X}}\), respectively. Given \(\Vert {\varvec{B}}^{(i)}\Vert _{2}=|b_{i}|\) in this setting, where \({\varvec{b}}=[b_{1},\ldots ,b_{T}]'\), (A.3) finally becomes
which is not identical to (11), but they are similar.
1.3 A.3 The dual problem of (18)
The dual problem of (18) is
\(\min _{{\varvec{\nu }}}\ \Vert {\varvec{Y}}{\varvec{a}}-{\varvec{D}}'{\varvec{\nu }}\Vert _{2}^{2}\quad \hbox {subject to}\quad \Vert {\varvec{\nu }}\Vert _{\infty }\le \lambda /2.\) (A.5)
See, e.g., Proposition 4.1 in Steidl et al. (2006) and Eq. (13) in Kim et al. (2009). The problem (A.5) is useful not only because it can be used to solve (18) [and accordingly (17)], but also because it shows that \(\widehat{{\varvec{\nu }}}=({\varvec{D}}{\varvec{D}}')^{-1}{\varvec{D}}{\varvec{Y}}{\varvec{a}}\) is feasible if \(2\Vert ({\varvec{D}}{\varvec{D}}')^{-1}{\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{\infty }\le \lambda \), in which case \(\widehat{{\varvec{x}}}={\varvec{Y}}{\varvec{a}}-{\varvec{D}}'\widehat{{\varvec{\nu }}}= {\varvec{\Pi }}({\varvec{\Pi }}'{\varvec{\Pi }})^{-1}{\varvec{\Pi }}'{\varvec{Y}}{\varvec{a}}\), which represents a linear trend estimated by ordinary least squares. Here, \(\widehat{{\varvec{\nu }}}\) [resp. \(\widehat{{\varvec{x}}}\)] denotes the unique global minimizer of (A.5) [resp. (18)].
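This feasibility observation can be checked numerically. The sketch below uses a single simulated series as a stand-in for \({\varvec{Y}}{\varvec{a}}\) (a simplifying assumption) and verifies that, for any \(\lambda \) large enough that \(\widehat{{\varvec{\nu }}}\) is feasible, \({\varvec{Y}}{\varvec{a}}-{\varvec{D}}'\widehat{{\varvec{\nu }}}\) coincides with the OLS linear trend \({\varvec{\Pi }}({\varvec{\Pi }}'{\varvec{\Pi }})^{-1}{\varvec{\Pi }}'{\varvec{Y}}{\varvec{a}}\).

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
t = np.arange(1, T + 1)
y = 0.3 * t + rng.normal(size=T)  # univariate stand-in for Ya

D = np.diff(np.eye(T), n=2, axis=0)    # second-difference matrix
nu = np.linalg.solve(D @ D.T, D @ y)   # nu_hat = (DD')^{-1} D Ya

# Any lambda with 2 * ||nu_hat||_inf <= lambda makes nu_hat feasible for (A.5).
lam = 2.0 * np.max(np.abs(nu)) + 1.0

# Then x_hat = Ya - D' nu_hat is the projection onto the null space of D,
# i.e., the OLS linear trend.
x_hat = y - D.T @ nu
Pi = np.column_stack([np.ones(T), t])  # Pi = [1, t]
x_ols = Pi @ np.linalg.lstsq(Pi, y, rcond=None)[0]
print(np.allclose(x_hat, x_ols))  # True
```

The identity holds because \({\varvec{I}}-{\varvec{D}}'({\varvec{D}}{\varvec{D}}')^{-1}{\varvec{D}}\) projects onto the null space of \({\varvec{D}}\), which is spanned by the columns of \({\varvec{\Pi }}\).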
1.4 A.4 Proof of Proposition 4.1
For proving Proposition 4.1, we give the following two lemmata:
Lemma A.1
There exists \(\mu \ge 0\) such that
\(-2{\varvec{X}}'({\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}})+\mu {\varvec{J}}'\widetilde{{\varvec{v}}}={\varvec{0}}\) (stationarity) and \(\mu (\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}-c)=0\) (complementary slackness),
where \({\varvec{J}}'\widetilde{{\varvec{v}}}\) in the stationarity condition denotes a subgradient vector of \(\Vert {\varvec{J}}{\varvec{b}}\Vert _{1}\) at \({\varvec{b}}=\widetilde{{\varvec{b}}}\). Explicitly, \(\widetilde{{\varvec{v}}}=[\widetilde{v}_{3},\ldots ,\widetilde{v}_{T}]'\) is defined by \(\widetilde{v}_{t}=1\) if \(\widetilde{b}_{t}>0\), \(\widetilde{v}_{t}=-1\) if \(\widetilde{b}_{t}<0\), and \(\widetilde{v}_{t}\in [-1,1]\) if \(\widetilde{b}_{t}=0\), for \(t=3,\ldots ,T\).
Proof
See Section A.5.1. \(\square \)
Lemma A.2
If \(c<\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\), then \({\varvec{X}}\widetilde{{\varvec{b}}}\ne {\varvec{Y}}{\varvec{a}}\) follows.
Proof
See Section A.5.2. \(\square \)
Now, we are ready to give a proof of Proposition 4.1. From Lemma A.2, if \(c<\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\), then \(({\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}})\) in the stationarity condition in Lemma A.1 is not equal to \({\varvec{0}}\). In addition, \({\varvec{X}}'\) is of full column rank. Hence, if \(c<\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\), then we have \(\mu {\varvec{J}}'\widetilde{{\varvec{v}}}=2{\varvec{X}}'({\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}})\ne {\varvec{0}}\), which leads to \(\mu \ne 0\). Thus, given \(\mu \ge 0\), it follows that \(\mu >0\). From the complementary slackness condition in Lemma A.1, it follows that \(\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}=c>0\). Given that \(({\varvec{J}}\widetilde{{\varvec{b}}})'\widetilde{{\varvec{v}}}=\sum _{t=3}^{T}|\widetilde{b}_{t}| =\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}\), premultiplying the stationarity condition by \(\widetilde{{\varvec{b}}}'\) yields \(-2({\varvec{X}}\widetilde{{\varvec{b}}})'({\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}}) +\mu \Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}=0\). Then, given \(\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}=c>0\), we obtain (23). In addition, given \(\mu >0\), from the stationarity condition, it follows that \(\widetilde{{\varvec{b}}}\) equals \(\widehat{{\varvec{b}}}\) estimated with \(\lambda =\mu \).
1.5 A.5 Miscellaneous proofs
1.5.1 A.5.1 Proof of Lemma A.1
As \(\Vert {\varvec{J}}{\varvec{b}}\Vert _{1}-c=-c<0\) for \({\varvec{b}}={\varvec{0}}\), Slater's condition is satisfied. Then, from Schechter (1977) [and Theorem 1.2 in Weir (1988)], there exists \(\mu \ge 0\) such that
\({\varvec{0}}\in \partial \Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}}\Vert _{2}^{2}+\mu \,\partial (\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}-c),\) (A.6)
where \(\partial \Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}}\Vert _{2}^{2}\) [resp. \(\partial (\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}-c)\)] in (A.6) denotes the subdifferential of \(\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}\) [resp. \((\Vert {\varvec{J}}{\varvec{b}}\Vert _{1}-c)\)] at \({\varvec{b}}=\widetilde{{\varvec{b}}}(=[\widetilde{b}_{1},\ldots ,\widetilde{b}_{T}]')\). More precisely, (i) \(\partial \Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}}\Vert _{2}^{2} =\{ -2{\varvec{X}}'({\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}}) \}\), which is a singleton because \(\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}\) is a differentiable function of \({\varvec{b}}\in \mathbb {R}^{T}\), and (ii) by Proposition B.24(e) in Bertsekas (1999) and Osborne et al. (2000), \(\partial \Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}={\varvec{J}}'\widetilde{{\varvec{\xi }}}\), where \(\widetilde{{\varvec{\xi }}}=[\widetilde{\xi }_{3},\ldots ,\widetilde{\xi }_{T}]'\). Here, \(\widetilde{\xi }_{t}=\{1\}\) if \(\widetilde{b}_{t}>0\), \(\widetilde{\xi }_{t}=\{-1\}\) if \(\widetilde{b}_{t}<0\), and \(\widetilde{\xi }_{t}=[-1,1]\) if \(\widetilde{b}_{t}=0\), for \(t=3,\ldots ,T\). Accordingly, (A.6) can be represented as \(-2{\varvec{X}}'({\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}})+\mu {\varvec{J}}'\widetilde{{\varvec{v}}}={\varvec{0}}\), where \(\widetilde{{\varvec{v}}}=[\widetilde{v}_{3},\ldots ,\widetilde{v}_{T}]'\) and \(\widetilde{v}_{t}\) is an element of \(\widetilde{\xi }_{t}\) for \(t=3,\ldots ,T\).
1.5.2 A.5.2 Proof of Lemma A.2
Let \({\varvec{\kappa }}=\widetilde{{\varvec{b}}}-{\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\). Given \(c<\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\), \({\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) is infeasible because \(\Vert {\varvec{J}}{\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\Vert _{1}=\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}>c\), which follows from \({\varvec{D}}{\varvec{X}}={\varvec{J}}\). Accordingly, if \(c<\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\), then \({\varvec{\kappa }}\ne {\varvec{0}}\), and, since \({\varvec{X}}\) is nonsingular, \({\varvec{X}}{\varvec{\kappa }}={\varvec{X}}\widetilde{{\varvec{b}}}-{\varvec{Y}}{\varvec{a}}\ne {\varvec{0}}\).
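The infeasibility argument rests on the identity \({\varvec{D}}{\varvec{X}}={\varvec{J}}\), so that \(\Vert {\varvec{J}}{\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\Vert _{1}=\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\). Under the truncated-power form of \({\varvec{X}}\) assumed in Section A.1 (our assumption about the basis), this identity can be checked numerically:

```python
import numpy as np

T = 7
t = np.arange(1, T + 1)

# Assumed truncated-power form of X, as in Section A.1.
X = np.zeros((T, T))
X[:, 0] = 1.0
X[:, 1] = t - 1
for j in range(3, T + 1):
    X[:, j - 1] = np.maximum(t - j + 1, 0)

D = np.diff(np.eye(T), n=2, axis=0)                   # second-difference matrix
J = np.hstack([np.zeros((T - 2, 2)), np.eye(T - 2)])  # Jb = [b3, ..., bT]'
print(np.allclose(D @ X, J))  # True
```

Intuitively, the intercept and linear columns of \({\varvec{X}}\) are annihilated by second differencing, while each kink column differences to a standard basis vector.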
Cite this article
Yamada, H., Bao, R. \(\ell _{1}\) Common Trend Filtering. Comput Econ 59, 1005–1025 (2022). https://doi.org/10.1007/s10614-021-10114-9
Keywords
- \(\ell _{1}\) trend filtering
- Common trend
- Lasso regression
- Reduced rank regression
- Total variation denoising
- Hodrick–Prescott filtering