1 Introduction

The \(\ell _{1}\) trend filtering, which was developed by Steidl et al. (2006), Steidl (2006), Kim et al. (2009), Tibshirani (2014), and Guntuboyina et al. (2020), enables us to extract a continuous piecewise linear trend from a univariate time series. Figure 1 illustrates such a continuous piecewise linear trend. The filter and its variants have subsequently been applied in various fields, including astronomy (Politsch et al. 2020), climatology (Khodadadi and McDonald 2019), economics (Yamada and Jin 2013; Yamada and Yoon 2014; Winkelried 2016; Yamada 2017; Klein 2018), electronics (Suo et al. 2019), environmental science (Brantley et al. 2019), finance (Mitra and Rohit 2018), and geophysics (Wu et al. 2018).

The \(\ell _{1}\) trend filtering is defined by replacing the squared \(\ell _{2}\)-norm penalty of the Hodrick–Prescott (HP) filtering (Hodrick and Prescott 1997) with an \(\ell _{1}\)-norm penalty. It is notable that, even though this modification seems minor, the \(\ell _{1}\) trend filtering provides a continuous piecewise linear trend, whereas the HP filtering provides a smooth trend. In econometrics, such a continuous piecewise linear trend was dealt with by Perron (1989) and Rappoport and Reichlin (1989), and it reflects the idea that ‘economic events that have large permanent effects are relatively rare’ (Hamilton 1994). Thus, it is possible to say that the \(\ell _{1}\) trend filtering is a method to obtain the trend considered by Perron (1989) and Rappoport and Reichlin (1989).

Although the \(\ell _{1}\) trend filtering can estimate a continuous piecewise linear trend of a univariate time series, it cannot estimate a common continuous piecewise linear trend of multiple time series. In this paper, we develop a statistical procedure, a multivariate extension of the \(\ell _{1}\) trend filtering, that enables us to estimate such a common trend. More precisely, let \(y_{i,t}\) be an observation of a univariate time series i at t, where \(i=1,\ldots ,n\) and \(t=1,\ldots ,T\), and suppose that it has a continuous piecewise linear trend \(x_{i,t}\). As stated, the \(\ell _{1}\) trend filtering can be applied to estimate \(x_{i,t}\) from \(y_{i,t}\). In this paper, we consider the situation in which \(x_{i,t}\) can be expressed as

$$\begin{aligned} x_{i,t}=a_{i}x_{t},\quad i=1,\ldots ,n,\quad t=1,\ldots ,T, \end{aligned}$$
(1)

where \(x_{t}\) is a continuous piecewise linear trend and \(a_{i}\) is a loading coefficient. Given that (1) can be represented as

$$\begin{aligned} \begin{bmatrix} x_{1,t}\\ \vdots \\ x_{n,t}\\ \end{bmatrix} =\begin{bmatrix} a_{1}x_{t}\\ \vdots \\ a_{n}x_{t}\\ \end{bmatrix} =\begin{bmatrix} a_{1}\\ \vdots \\ a_{n}\\ \end{bmatrix} x_{t},\quad t=1,\ldots ,T, \end{aligned}$$
(2)

even though \(y_{1,t},\ldots ,y_{n,t}\) commonly contain \(x_{t}\), their linear combination \(\beta _{1}y_{1,t}+\cdots +\beta _{n}y_{n,t}\) no longer contains \(x_{t}\) if \([\beta _{1},\ldots ,\beta _{n}]'\) belongs to the orthogonal complement of the space spanned by \([a_{1},\ldots ,a_{n}]'\). Hatanaka and Yamada (2003) referred to this property as ‘co-trending.’

In this paper, by extending the \(\ell _{1}\) trend filtering, we develop a novel method to estimate \(x_{t}\) and \(a_{i}\) from \(y_{i,t}\). Recall that \(i=1,\ldots ,n\) and \(t=1,\ldots ,T\), where n (resp. T) denotes the number of univariate time series (resp. observations). We refer to the novel filtering method as ‘\(\ell _{1}\) common trend filtering.’ We provide an algorithm for computing the estimates and a clue for specifying the tuning parameter of the procedure, both of which are required for its application. We also (i) numerically illustrate how well the algorithm works, (ii) provide an empirical illustration, and (iii) introduce a generalization of our novel method.

Here, we remark that (2) is not an implausible model of trends in macroeconomic time series; on the contrary, it has strong economic relevance. To explain more precisely, let \(y_{1,t},\ldots ,y_{n,t}\) be macroeconomic time series in natural logarithms and \(e_{1,t},\ldots ,e_{n,t}\) be such that

$$\begin{aligned} y_{i,t}=x_{i,t}+e_{i,t},\quad i=1,\ldots ,n,\quad t=1,\ldots ,T. \end{aligned}$$
(3)

Let \(g_{i,t}=\Delta y_{i,t}(=y_{i,t}-y_{i,t-1})\), so that \(g_{i,t}\) for \(i=1,\ldots ,n\) are the growth rates of the original time series. Then, (2) and (3) are equivalent to

$$\begin{aligned} \Delta g_{i,t}=a_{i}b_{t}+v_{i,t},\quad i=1,\ldots ,n,\quad t=3,\ldots ,T, \end{aligned}$$
(4)

with initial conditions \(y_{i,1}=a_{i}x_{1}+e_{i,1}\) and \(\Delta y_{i,2}=a_{i}\Delta x_{2}+\Delta e_{i,2}\), where \(b_{t}=\Delta x_{t}-\Delta x_{t-1}\) and \(v_{i,t}=\Delta e_{i,t}-\Delta e_{i,t-1}\). Recall that \(\Delta g_{i,t}\) in (4) denotes the change in the growth rate of variable i at t. Given that \(x_{t}\) in (2) is a continuous piecewise linear trend, only a few of \(b_{3},\ldots ,b_{T}\) are nonzero. We may regard such nonzero \(b_{t}\)s in (4) as occasional permanent shocks that shift the growth rates of multiple time series simultaneously, and \(a_{i}\) for \(i=1,\ldots ,n\) in (4) as the individual reaction coefficients of the time series. A typical example of such an occasional permanent shock is the oil price shock of 1973. It is natural to consider that, at that time, the growth rates of macroeconomic time series changed simultaneously, each at its own reaction rate.
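For concreteness, the equivalence between (2)–(3) and (4) can be checked numerically. The following Python sketch (the particular trend, loading, and noise level are arbitrary choices for illustration) verifies that the second difference of \(y_{i,t}\) equals \(a_{i}b_{t}+v_{i,t}\), and that \(b_{t}\) is nonzero only where the slope of \(x_{t}\) changes:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 20
# A piecewise linear trend: slope 0.5 up to t = 10, then slope -0.2.
x = np.cumsum(np.concatenate([[0.0], np.full(9, 0.5), np.full(10, -0.2)]))
a_i = 0.7
e = 0.1 * rng.standard_normal(T)
y = a_i * x + e                       # model (2)-(3) for one series

g = np.diff(y)                        # growth rates g_{i,t}, t = 2,...,T
lhs = np.diff(g)                      # Delta g_{i,t}, t = 3,...,T
b = np.diff(np.diff(x))               # b_t = Delta x_t - Delta x_{t-1}
v = np.diff(np.diff(e))               # v_{i,t} = Delta e_{i,t} - Delta e_{i,t-1}
assert np.allclose(lhs, a_i * b + v)  # equation (4)
assert np.count_nonzero(np.abs(b) > 1e-12) == 1   # a single permanent shock
```

Here the single nonzero \(b_{t}\) occurs exactly at the kink of the trend, matching the interpretation of occasional permanent shocks.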

This paper is organized as follows. Section 2 introduces the novel filtering method and provides its reduced-rank-regression (RRR) representations. Section 3 discusses a numerical computation method for \(x_{t}\) and \(a_{i}\) in (1). Section 4 provides a clue to specify the tuning parameter of the procedure required for its application. Section 5 numerically illustrates how well the novel statistical procedure works. Section 6 includes an empirical illustration. Section 7 mentions a generalization of our method. Section 8 concludes the paper.

Notations Let \({\varvec{y}}_{i}=[y_{i,1},\ldots ,y_{i,T}]'\), \({\varvec{x}}_{i}=[x_{i,1},\ldots ,x_{i,T}]'\), \({\varvec{x}}=[x_{1},\ldots ,x_{T}]'\), \({\varvec{Y}}=[{\varvec{y}}_{1},\ldots ,{\varvec{y}}_{n}]\in \mathbb {R}^{T\times n}\), and \({\varvec{a}}=[a_{1},\ldots ,a_{n}]'\); let \({\varvec{I}}_{m}\) denote the identity matrix of order m and \({\varvec{J}}=[{\varvec{0}},{\varvec{I}}_{T-2}]\in \mathbb {R}^{(T-2)\times T}\); and let \({\varvec{D}}\in \mathbb {R}^{(T-2)\times T}\) be the second-order difference matrix such that \({\varvec{D}}{\varvec{x}}_{i}=[\Delta ^{2}x_{i,3},\ldots ,\Delta ^{2}x_{i,T}]'\). Explicitly, \({\varvec{D}}\) is the \((T-2)\times T\) Toeplitz matrix whose first and last rows are \([1,-2,1,0,\ldots ,0]\) and \([0,\ldots ,0,1,-2,1]\), respectively. In addition, let \({\varvec{\Pi }}\in \mathbb {R}^{T\times 2}\), \({\varvec{\Psi }}\in \mathbb {R}^{T\times (T-2)}\), and \({\varvec{X}}=[{\varvec{\Pi }},{\varvec{\Psi }}]\in \mathbb {R}^{T\times T}\) be defined by

(5)

Finally, for a vector \({\varvec{\gamma }}=[\gamma _{1},\ldots ,\gamma _{m}]'\), \(\Vert {\varvec{\gamma }}\Vert _{2}^{2}={\varvec{\gamma }}'{\varvec{\gamma }}=\sum _{t=1}^{m}\gamma _{t}^{2}\), \(\Vert {\varvec{\gamma }}\Vert _{1}=\sum _{t=1}^{m}|\gamma _{t}|\), \(\Vert {\varvec{\gamma }}\Vert _{\infty }=\max \{|\gamma _{1}|,\ldots ,|\gamma _{m}|\}\), and, for a matrix \({\varvec{\Gamma }}\in \mathbb {R}^{r\times s}\) whose (ij) entry is denoted by \(\gamma _{ij}\), \(\Vert {\varvec{\Gamma }}\Vert _{\mathrm {F}}^{2}=\sum _{i=1}^{r}\sum _{j=1}^{s}\gamma _{ij}^{2}\).

A small note (i) The null space of \({\varvec{D}}\) is identical to the column space of \({\varvec{\Pi }}\), and accordingly \({\varvec{D}}{\varvec{\Pi }}={\varvec{0}}\); (ii) \({\varvec{\Psi }}\) is a right inverse of \({\varvec{D}}\), i.e., \({\varvec{D}}{\varvec{\Psi }}={\varvec{I}}_{T-2}\) (Paige and Trindade 2010); (iii) \({\mathsf {det}}({\varvec{X}})=1\) and thus \({\varvec{X}}\) is nonsingular; and (iv) given (1), we have \([{\varvec{x}}_{1},\ldots ,{\varvec{x}}_{n}]=[a_{1}{\varvec{x}},\ldots ,a_{n}{\varvec{x}}]={\varvec{x}}{\varvec{a}}'\in \mathbb {R}^{T\times n}\).
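Properties (i)–(iii) can be verified numerically. In the following Python sketch, \({\varvec{\Pi }}\) and \({\varvec{\Psi }}\) are one standard construction consistent with these properties (an assumption; the exact form in (5) may differ), and the final assertion checks the identity \({\varvec{J}}{\varvec{X}}^{-1}={\varvec{D}}\) invoked in Sect. 4:

```python
import numpy as np

T = 8
# Second-order difference matrix D: (T-2) x T Toeplitz with rows [1, -2, 1].
D = np.zeros((T - 2, T))
for r in range(T - 2):
    D[r, r:r + 3] = [1.0, -2.0, 1.0]

# An assumed standard choice of Pi (linear trends) and Psi (ramp functions).
t = np.arange(1, T + 1)
Pi = np.column_stack([np.ones(T), t])                 # columns span null(D)
Psi = np.maximum(t[:, None] - np.arange(1, T - 1)[None, :] - 1, 0.0)
X = np.hstack([Pi, Psi])                              # X = [Pi, Psi]
J = np.hstack([np.zeros((T - 2, 2)), np.eye(T - 2)])

assert np.allclose(D @ Pi, 0.0)                       # (i)   D Pi = 0
assert np.allclose(D @ Psi, np.eye(T - 2))            # (ii)  D Psi = I_{T-2}
assert np.isclose(np.linalg.det(X), 1.0)              # (iii) det(X) = 1
assert np.allclose(D @ X, J)                          # hence D X = J
assert np.allclose(J @ np.linalg.inv(X), D)           # and J X^{-1} = D
```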

2 \(\ell _{1}\) Common Trend Filtering

2.1 \(\ell _{1}\) Trend Filtering

The \(\ell _{1}\) trend filtering is defined by

$$\begin{aligned} \min _{x_{i,1},\ldots ,x_{i,T}\in \mathbb {R}}\, \sum _{t=1}^{T}(y_{i,t}-x_{i,t})^{2}+\psi \sum _{t=3}^{T}|\Delta ^{2} x_{i,t}|, \end{aligned}$$
(6)

where \(\psi >0\) is a tuning parameter. In matrix notation, it is expressed as

$$\begin{aligned} \min _{{\varvec{x}}_{i}\in \mathbb {R}^{T}}\, \Vert {\varvec{y}}_{i}-{\varvec{x}}_{i}\Vert _{2}^{2}+\psi \Vert {\varvec{D}}{\varvec{x}}_{i}\Vert _{1}. \end{aligned}$$
(7)
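Problem (7) is convex, and many solvers are available; Kim et al. (2009) propose a specialized primal-dual interior-point method. The following is a minimal ADMM sketch (an independent illustration under an alternating-direction scheme, not the solver used in the paper):

```python
import numpy as np

def l1_trend_filter(y, psi, rho=1.0, n_iter=500):
    """Solve min_x ||y - x||_2^2 + psi * ||D x||_1 by ADMM (a sketch)."""
    T = len(y)
    D = np.zeros((T - 2, T))
    for r in range(T - 2):
        D[r, r:r + 3] = [1.0, -2.0, 1.0]
    M = 2.0 * np.eye(T) + rho * D.T @ D   # x-update system matrix
    z = np.zeros(T - 2)
    u = np.zeros(T - 2)
    for _ in range(n_iter):
        x = np.linalg.solve(M, 2.0 * y + rho * D.T @ (z - u))
        w = D @ x + u
        z = np.sign(w) * np.maximum(np.abs(w) - psi / rho, 0.0)  # soft-threshold
        u += D @ x - z
    return x

# A noisy piecewise linear series; the fitted trend should track the kinks.
rng = np.random.default_rng(0)
T = 60
trend = np.concatenate([np.linspace(0.0, 5.0, 30), np.linspace(5.0, 2.0, 30)])
y = trend + 0.1 * rng.standard_normal(T)
x_hat = l1_trend_filter(y, psi=5.0)
```

The \(\ell _{1}\) penalty forces most second differences of the fit to be exactly zero, which is what produces the piecewise linear shape.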

2.2 \(\ell _{1}\) Common Trend Filtering

In this paper, we extend the \(\ell _{1}\) trend filtering so that we may estimate a common continuous piecewise linear trend of multiple time series, \(y_{1,t},\ldots ,y_{n,t}\). The filtering method we introduce in this paper is:

$$\begin{aligned} \min _{\begin{array}{c} x_{1},\ldots ,x_{T}\in \mathbb {R}\\ a_{1},\ldots ,a_{n}\in \mathbb {R} \end{array}}\, \sum _{i=1}^{n}\sum _{t=1}^{T}(y_{i,t}-a_{i}x_{t})^{2}+\lambda \sum _{t=3}^{T}|\Delta ^{2} x_{t}|,\quad {\text {s.t.}}\,\sum _{i=1}^{n}a_{i}^{2}=1, \end{aligned}$$
(8)

where \(\lambda >0\) is a tuning parameter. We refer to the filtering method described by (8) as \(\ell _{1}\) common trend filtering. In matrix notation, the filtering is expressed as

$$\begin{aligned} \min _{\begin{array}{c} {\varvec{x}}\in \mathbb {R}^{T}\\ a_{1},\ldots ,a_{n}\in \mathbb {R} \end{array}}\, \sum _{i=1}^{n}\Vert {\varvec{y}}_{i}-a_{i}{\varvec{x}}\Vert _{2}^{2}+\lambda \Vert {\varvec{D}}{\varvec{x}}\Vert _{1},\quad {\text {s.t.}}\,\sum _{i=1}^{n}a_{i}^{2}=1. \end{aligned}$$
(9)

Furthermore, given that \(\sum _{i=1}^{n}\Vert {\varvec{y}}_{i}-a_{i}{\varvec{x}}\Vert _{2}^{2}=\Vert {\varvec{Y}}-{\varvec{x}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}\) and \(\sum _{i=1}^{n}a_{i}^{2}=\Vert {\varvec{a}}\Vert _{2}^{2}\), (9) can be represented by

$$\begin{aligned} \min _{\begin{array}{c} {\varvec{x}}\in \mathbb {R}^{T}\\ {\varvec{a}}\in \mathbb {R}^{n} \end{array}}\, \Vert {\varvec{Y}}-{\varvec{x}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}+\lambda \Vert {\varvec{D}}{\varvec{x}}\Vert _{1},\quad {\text {s.t.}}\,\Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(10)

This is an \(\ell _{1}\)-norm penalized RRR.
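The identity used to pass from (9) to (10) is elementary; a quick numerical check with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 20, 4
Y = rng.standard_normal((T, n))
x = rng.standard_normal(T)
a = rng.standard_normal(n)

# sum_i ||y_i - a_i x||_2^2 equals ||Y - x a'||_F^2.
lhs = sum(np.sum((Y[:, i] - a[i] * x) ** 2) for i in range(n))
rhs = np.sum((Y - np.outer(x, a)) ** 2)
assert np.isclose(lhs, rhs)
```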

2.3 Another Representation

Let \({\varvec{b}}\in \mathbb {R}^{T}\) be a column vector such that \({\varvec{x}}={\varvec{X}}{\varvec{b}}\). Given that (i) \({\varvec{X}}\) is nonsingular and (ii) \({\varvec{D}}{\varvec{X}}={\varvec{J}}\) (Paige and Trindade 2010), we obtain another RRR representation of (10):

$$\begin{aligned} \min _{\begin{array}{c} {\varvec{a}}\in \mathbb {R}^{n}\\ {\varvec{b}}\in \mathbb {R}^{T} \end{array}}\, f({\varvec{a}},{\varvec{b}})=\Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}+\lambda \Vert {\varvec{J}}{\varvec{b}}\Vert _{1},\quad {\text {s.t.}}\,\Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(11)

We remark that when \({\varvec{J}}{\varvec{b}}\) is sparse, \({\varvec{x}}={\varvec{X}}{\varvec{b}}\) represents a continuous piecewise linear trend. In addition, interestingly, (11) is similar to Eq. (8) in Chen and Huang (2012).

3 Numerical Solution

3.1 Two Key Results

Given \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\), the first term of the objective function in (11) can be expanded as

$$\begin{aligned} \Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}&={\mathsf {tr}}({\varvec{Y}}'{\varvec{Y}})+{\mathsf {tr}}({\varvec{X}}{\varvec{b}}{\varvec{b}}'{\varvec{X}}') -2{\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} . \end{aligned}$$
(12)

3.1.1 The Case Where \({\varvec{b}}\in \mathbb {R}^{T}\) is Given

Suppose that \({\varvec{b}}\in \mathbb {R}^{T}\) is given. Then, \({\varvec{X}}{\varvec{b}}(={\varvec{x}})\in \mathbb {R}^{T}\) is a known column vector. Because neither \({\mathsf {tr}}({\varvec{Y}}'{\varvec{Y}})\) nor \({\mathsf {tr}}({\varvec{X}}{\varvec{b}}{\varvec{b}}'{\varvec{X}}')\) in (12) depends on \({\varvec{a}}\), when \({\varvec{b}}\in \mathbb {R}^{T}\) is given, (11) reduces to

$$\begin{aligned} \max _{{\varvec{a}}\in \mathbb {R}^{n}}\,{\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} ,\quad {\text {s.t.}}\,\Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(13)

We remark that, given \({\varvec{x}}={\varvec{X}}{\varvec{b}}\), it follows that \({\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} =\sum _{i=1}^{n}{\varvec{y}}_{i}'(a_{i}{\varvec{x}})\), which is quite reasonable as an objective function for estimating \(a_{1},\ldots ,a_{n}\). Moreover, letting \({\varvec{\phi }}={\varvec{Y}}'{\varvec{X}}{\varvec{b}}\in \mathbb {R}^{n}\), it follows that \({\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} ={\varvec{\phi }}'{\varvec{a}}\). Therefore, instead of (13), we may consider the following constrained maximization problem:

$$\begin{aligned} \max _{{\varvec{a}}\in \mathbb {R}^{n}}\,g({\varvec{a}})={\varvec{\phi }}'{\varvec{a}},\quad {\text {s.t.}}\,\Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(14)

Given \(\Vert {\varvec{a}}\Vert _{2}=1\), by the Cauchy–Schwarz inequality, we obtain \(g({\varvec{a}})={\varvec{\phi }}'{\varvec{a}}\le |{\varvec{\phi }}'{\varvec{a}}|\le \Vert {\varvec{\phi }}\Vert _{2}\Vert {\varvec{a}}\Vert _{2}=\Vert {\varvec{\phi }}\Vert _{2}\), from which we have \(g({\varvec{a}})<g(\widehat{{\varvec{a}}})\) if \({\varvec{a}}\ne \widehat{{\varvec{a}}}\), where \(\widehat{{\varvec{a}}} =({\varvec{\phi }}'{\varvec{\phi }})^{-1/2}{\varvec{\phi }} =({\varvec{b}}'{\varvec{X}}'{\varvec{Y}}{\varvec{Y}}'{\varvec{X}}{\varvec{b}})^{-1/2}{\varvec{Y}}'{\varvec{X}}{\varvec{b}}\). Consequently, given \({\varvec{b}}\in \mathbb {R}^{T}\), we have the following inequality:

$$\begin{aligned} f(\widehat{{\varvec{a}}},{\varvec{b}})< f({\varvec{a}},{\varvec{b}}) \end{aligned}$$
(15)

if \({\varvec{a}}\ne \widehat{{\varvec{a}}}\).
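The closed-form update can be checked numerically: no unit-norm vector attains a larger value of \(g\) than \(\widehat{{\varvec{a}}}\). A small Python sketch with random data, for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 30, 5
Y = rng.standard_normal((T, n))
x = rng.standard_normal(T)            # stands in for X b (known, given b)

phi = Y.T @ x                         # phi = Y' X b
a_hat = phi / np.linalg.norm(phi)     # closed-form maximizer of (14)

# No random unit vector should attain a larger value of g(a) = phi'a.
for _ in range(100):
    a = rng.standard_normal(n)
    a /= np.linalg.norm(a)
    assert phi @ a <= phi @ a_hat + 1e-12
```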

3.1.2 The Case Where \({\varvec{a}}\in \mathbb {R}^{n}\) is Given

Next, suppose that \({\varvec{a}}\in \mathbb {R}^{n}\) such that \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\) is given. Then, (11) reduces to

$$\begin{aligned} \min _{{\varvec{b}}\in \mathbb {R}^{T}}\, \Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}+\lambda \Vert {\varvec{J}}{\varvec{b}}\Vert _{1}. \end{aligned}$$
(16)

Let \({\varvec{A}}_{\perp }\in \mathbb {R}^{n\times (n-1)}\) be a matrix of an orthonormal basis of the orthogonal complement of the space spanned by \({\varvec{a}}\). Then, given \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\), \([{\varvec{a}},{\varvec{A}}_{\perp }]\in \mathbb {R}^{n\times n}\) is an orthogonal matrix. Thus, given that \( ({\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}')[{\varvec{a}},{\varvec{A}}_{\perp }] =[{\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}},{\varvec{Y}}{\varvec{A}}_{\perp }], \) it follows that \(\Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2} =\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}+\Vert {\varvec{Y}}{\varvec{A}}_{\perp }\Vert _{\mathrm {F}}^{2}\). Given that \(\Vert {\varvec{Y}}{\varvec{A}}_{\perp }\Vert _{\mathrm {F}}^{2}\) does not depend on \({\varvec{b}}\), (16) becomes

$$\begin{aligned} \min _{{\varvec{b}}\in \mathbb {R}^{T}}\,\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}+\lambda \Vert {\varvec{J}}{\varvec{b}}\Vert _{1}. \end{aligned}$$
(17)

We remark here that (17) can be represented as

$$\begin{aligned} \min _{{\varvec{x}}\in \mathbb {R}^{T}}\,\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{x}}\Vert _{2}^{2}+\lambda \Vert {\varvec{D}}{\varvec{x}}\Vert _{1}. \end{aligned}$$
(18)

See, e.g., Eq. (9) in Kim et al. (2009).

As (17) is a problem whose objective function is coercive and strictly convex over \(\mathbb {R}^{T}\), it has a unique global minimizer. Thus, denoting the solution by \(\widehat{{\varvec{b}}}\), we have the following result. Given \({\varvec{a}}\in \mathbb {R}^{n}\) such that \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\), we have the following inequality:

$$\begin{aligned} f({\varvec{a}},\widehat{{\varvec{b}}})< f({\varvec{a}},{\varvec{b}}) \end{aligned}$$
(19)

if \({\varvec{b}}\ne \widehat{{\varvec{b}}}\).
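The decomposition underlying the reduction from (16) to (17), \(\Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}=\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}+\Vert {\varvec{Y}}{\varvec{A}}_{\perp }\Vert _{\mathrm {F}}^{2}\), can also be checked numerically with a random orthonormal completion of \({\varvec{a}}\):

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 15, 4
Y = rng.standard_normal((T, n))
xb = rng.standard_normal(T)                  # stands in for X b
a = rng.standard_normal(n)
a /= np.linalg.norm(a)                       # ||a||_2 = 1

# Complete a to an orthogonal matrix [a, A_perp] via QR (Q's first column
# equals a up to sign, which does not affect the identity below).
Q, _ = np.linalg.qr(np.column_stack([a, rng.standard_normal((n, n - 1))]))
A_perp = Q[:, 1:]

lhs = np.sum((Y - np.outer(xb, a)) ** 2)                     # ||Y - Xba'||_F^2
rhs = np.sum((Y @ a - xb) ** 2) + np.sum((Y @ A_perp) ** 2)  # decomposition
assert np.isclose(lhs, rhs)
```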

3.2 A Numerical Algorithm

Based on the above two inequalities, (15) and (19), we introduce a numerical algorithm. Given \(\widehat{{\varvec{a}}}_{1}\in \mathbb {R}^{n}\) and \(\widehat{{\varvec{b}}}_{1}\in \mathbb {R}^{T}\), for \(i\in \mathbb {N}\), we define \(\widehat{{\varvec{a}}}_{i+1}\) and \(\widehat{{\varvec{b}}}_{i+1}\) by

$$\begin{aligned}&\widehat{{\varvec{a}}}_{i+1} =(\widehat{{\varvec{b}}}_{i}'{\varvec{X}}'{\varvec{Y}}{\varvec{Y}}'{\varvec{X}} \widehat{{\varvec{b}}}_{i})^{-1/2}{\varvec{Y}}'{\varvec{X}}\widehat{{\varvec{b}}}_{i}, \end{aligned}$$
(20)
$$\begin{aligned}&\widehat{{\varvec{b}}}_{i+1}={\mathrm {arg\,}}\min _{{\varvec{b}}_{i+1}\in \mathbb {R}^{T}}\, \Vert {\varvec{Y}}\widehat{{\varvec{a}}}_{i+1}-{\varvec{X}}{\varvec{b}}_{i+1}\Vert _{2}^{2} +\lambda \Vert {\varvec{J}}{\varvec{b}}_{i+1}\Vert _{1}. \end{aligned}$$
(21)

Then, we have the following result.

Lemma 3.1

For \(i\in \mathbb {N}\), it follows that \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{a}}}_{i}=\widehat{{\varvec{a}}}_{i+1}\) and \(\widehat{{\varvec{b}}}_{i}=\widehat{{\varvec{b}}}_{i+1}\).

Proof

(i) From (15) and (20), we have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i})\le f({\varvec{a}},\widehat{{\varvec{b}}}_{i})\) for any \({\varvec{a}}\in \mathbb {R}^{n}\) and we thus have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i})\le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{a}}}_{i}=\widehat{{\varvec{a}}}_{i+1}\). (ii) Likewise, from (19) and (21), we have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i+1},{\varvec{b}})\) for any \({\varvec{b}}\in \mathbb {R}^{T}\) and we thus have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{b}}}_{i}=\widehat{{\varvec{b}}}_{i+1}\). (iii) Combining these inequalities, we have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i}) \le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), which leads to \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{a}}}_{i}=\widehat{{\varvec{a}}}_{i+1}\) and \(\widehat{{\varvec{b}}}_{i}=\widehat{{\varvec{b}}}_{i+1}\). \(\square \)

Given Lemma 3.1, we have the following result.

Proposition 3.2

Let \(f_{i}=f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\) for \(i\in \mathbb {N}\). Then the sequence of real numbers \((f_{i})_{i\in \mathbb {N}}\) has a finite limit.

Proof

From Lemma 3.1, \((f_{i})_{i\in \mathbb {N}}\) is a nonincreasing sequence. In addition, as \(f_{i}\ge 0\) for \(i\in \mathbb {N}\), it is bounded below. Consequently, it has a finite limit. \(\square \)

Proposition 3.2 implies that the objective function in (11) converges when it is alternately minimized over \({\varvec{a}}\) and \({\varvec{b}}\). Denote by \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) the values of \({\varvec{a}}\) and \({\varvec{b}}\) at convergence. A Matlab user-defined function for estimating \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) from \({\varvec{Y}}\) and \(\lambda \), l1_common_trend_filter, is provided in the supplementary material.
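For readers without Matlab, the alternating scheme (20)–(21) can be sketched as follows. This is an independent Python illustration, not the supplementary code: the inner update (21) is solved inexactly by ADMM, and \({\varvec{X}}\) is an assumed standard construction consistent with \({\varvec{D}}{\varvec{X}}={\varvec{J}}\). The recorded objective path illustrates (approximately, owing to the inexact inner solver) the monotone decrease of Lemma 3.1 and Proposition 3.2:

```python
import numpy as np

def solve_b(Ya, X, D, lam, rho=1.0, n_iter=300):
    """Update (21): min_b ||Ya - Xb||_2^2 + lam ||Jb||_1, solved in the
    x = Xb coordinates (problem (18)) by ADMM, then mapped back to b."""
    T = X.shape[0]
    M = 2.0 * np.eye(T) + rho * D.T @ D
    z = np.zeros(T - 2)
    u = np.zeros(T - 2)
    for _ in range(n_iter):
        x = np.linalg.solve(M, 2.0 * Ya + rho * D.T @ (z - u))
        w = D @ x + u
        z = np.sign(w) * np.maximum(np.abs(w) - lam / rho, 0.0)
        u += D @ x - z
    return np.linalg.solve(X, x)            # b = X^{-1} x

def common_trend_filter(Y, lam, n_outer=20):
    """Alternate between updates (20) and (21); return (a, b, objective path)."""
    T, n = Y.shape
    D = np.zeros((T - 2, T))
    for r in range(T - 2):
        D[r, r:r + 3] = [1.0, -2.0, 1.0]
    tt = np.arange(1, T + 1)                # assumed standard X = [Pi, Psi]
    X = np.hstack([np.column_stack([np.ones(T), tt]),
                   np.maximum(tt[:, None] - np.arange(1, T - 1)[None, :] - 1, 0.0)])
    J = np.hstack([np.zeros((T - 2, 2)), np.eye(T - 2)])

    def f(a, b):                            # objective of (11)
        return np.sum((Y - np.outer(X @ b, a)) ** 2) + lam * np.sum(np.abs(J @ b))

    b = np.linalg.solve(X, Y.mean(axis=1))  # crude initialization
    history = []
    for _ in range(n_outer):
        phi = Y.T @ (X @ b)
        a = phi / np.linalg.norm(phi)       # update (20), closed form
        b = solve_b(Y @ a, X, D, lam)       # update (21)
        history.append(f(a, b))
    return a, b, history

# Data in the spirit of Sect. 5: one piecewise linear trend, three noisy series.
rng = np.random.default_rng(0)
T = 60
trend = np.cumsum(np.concatenate([[0.0], np.full(29, 0.2), np.full(30, -0.1)]))
a_true = np.array([1.0, 0.6, 0.2])
Y = np.outer(trend, a_true) + 0.3 * rng.standard_normal((T, 3))
a_hat, b_hat, hist = common_trend_filter(Y, lam=5.0)
```

The estimated \(\widehat{{\varvec{a}}}\) should be close to the normalized true loading vector, and the objective path should be (nearly) nonincreasing.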

4 A Clue for Specifying the Tuning Parameter

Applying the \(\ell _{1}\) common trend filtering requires the specification of its tuning parameter. In this section, we provide a clue for specifying it.

Consider the following convex problem:

$$\begin{aligned} \min _{{\varvec{b}}\in \mathbb {R}^{T}}\,h({\varvec{b}})=\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2},\quad {\text {s.t.}}\,\Vert {\varvec{J}}{\varvec{b}}\Vert _{1}\le c, \end{aligned}$$
(22)

where \(c>0\) and \({\varvec{a}}\in \mathbb {R}^{n}\) is a given vector. Given that \({\varvec{X}}\) is nonsingular and \(h({\varvec{b}})\ge 0\), if \({\varvec{b}}={\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) is feasible, i.e., \(\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\le c\) (recall that \({\varvec{J}}{\varvec{X}}^{-1}={\varvec{D}}\)), then \({\varvec{b}}={\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) is the solution of the above convex problem, and in that case \(h({\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}})=0\). If it is not feasible, i.e., \(\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}>c\), then \({\varvec{b}}={\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) cannot be the solution. In that case, the solution, denoted by \(\widetilde{{\varvec{b}}}\), lies on the boundary, and thus \(\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}=c\).

More precisely, concerning \(\widetilde{{\varvec{b}}}\), we have the following results.

Proposition 4.1

If \(0<c<\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\), then (i) \(\widetilde{{\varvec{b}}}\) equals \(\widehat{{\varvec{b}}}\) estimated by (16)/(17) with

$$\begin{aligned} \lambda =\frac{2({\varvec{X}}\widetilde{{\varvec{b}}})'({\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}})}{\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}} \end{aligned}$$
(23)

and (ii) \(\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}=c\) holds.

Proof

See Section A.4 in the Appendix. \(\square \)

Proposition 4.1 implies that we can obtain \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) by specifying c in (22) instead of \(\lambda \) in (17). A Matlab user-defined function for calculating \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) from \({\varvec{Y}}\) and c, l1_common_trend_filter_c, is provided in the supplementary material. Here, we point out that specifying c is much easier than specifying \(\lambda \): we do not have any useful information for specifying \(\lambda \), whereas we do for specifying c. As stated in Proposition 4.1, we may estimate \(\widehat{{\varvec{b}}}\) such that

$$\begin{aligned} c=\Vert {\varvec{J}}\widehat{{\varvec{b}}}\Vert _{1}=\sum _{t=3}^{T}|\widehat{b}_{t}|. \end{aligned}$$
(24)

We may utilize this relation for specifying c. See, e.g., (A.1). In that case,

$$\begin{aligned} x_{t}-x_{t-1} = {\left\{ \begin{array}{ll} b_{2}, &{} \text {if }t=2,3,\\ b_{2}+b_{4}, &{} \text {if }t=4,5,\\ b_{2}+b_{4}+b_{6}, &{} \text {if }t=6,7. \end{array}\right. } \end{aligned}$$
(25)

Thus, we may specify a rough range for c from plots of the first differences of the multiple time series.
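Relation (25) can be reproduced numerically. In the following sketch, \({\varvec{X}}\) is an assumed standard construction (consistent with the properties stated in Sect. 1), and the kink values \(b_{4}\) and \(b_{6}\) are arbitrary illustrative choices:

```python
import numpy as np

T = 7
t = np.arange(1, T + 1)
# An assumed standard construction of X = [Pi, Psi].
X = np.hstack([np.column_stack([np.ones(T), t]),
               np.maximum(t[:, None] - np.arange(1, T - 1)[None, :] - 1, 0.0)])

b = np.zeros(T)
b[0], b[1] = 10.0, 0.5                        # b_1 (level), b_2 (initial slope)
b[3], b[5] = 0.3, -0.2                        # kinks: b_4 and b_6 nonzero
x = X @ b
slopes = np.diff(x)                           # x_t - x_{t-1}, t = 2,...,T
assert np.allclose(slopes[:2], b[1])          # t = 2, 3: b_2
assert np.allclose(slopes[2:4], b[1] + b[3])  # t = 4, 5: b_2 + b_4
assert np.allclose(slopes[4:], b[1] + b[3] + b[5])  # t = 6, 7: b_2 + b_4 + b_6
```

Since the first differences of \(x_{t}\) accumulate the nonzero \(b_{t}\)s, the overall variation of the slopes visible in a first-difference plot bounds \(\sum _{t=3}^{T}|b_{t}|\) from below, which is what makes c easy to gauge.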

Finally, given \({\varvec{X}}{\varvec{b}}={\varvec{x}}\) and \({\varvec{J}}{\varvec{b}}={\varvec{D}}{\varvec{x}}\), we remark that the convex problem (22) may be replaced with the following convex problem:

$$\begin{aligned} \min _{{\varvec{x}}\in \mathbb {R}^{T}}\,\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{x}}\Vert _{2}^{2},\quad {\text {s.t.}}\,\Vert {\varvec{D}}{\varvec{x}}\Vert _{1}\le c, \end{aligned}$$
(26)

where \(c>0\).

5 Numerical Illustrations

In this section, we numerically illustrate how well the algorithm described in the last section works. Figure 1 plots the generated \({\varvec{X}}{\varvec{b}}\), where \(T=100\), \(b_{1}=10\), \(b_{2}=0.5\), and \(\sum _{t=3}^{T}|b_{t}|=2.7243\). The ten bullets in the figure depict the kink points; accordingly, 10 entries of \({\varvec{J}}{\varvec{b}}=[b_{3},\ldots ,b_{T}]'\) are nonzero. Using the \({\varvec{X}}{\varvec{b}}\) shown in Fig. 1, we generated \({\varvec{y}}_{1}\), \({\varvec{y}}_{2}\), and \({\varvec{y}}_{3}\) by

$$\begin{aligned} {[}{\varvec{y}}_{1},{\varvec{y}}_{2},{\varvec{y}}_{3}]={\varvec{X}}{\varvec{b}}{\varvec{a}}'+{\varvec{E}}, \end{aligned}$$
(27)

where \({\varvec{a}}=[1,0.6,0.2]'\) and \(\mathrm {vec}({\varvec{E}})\sim \mathrm {N}({\varvec{0}},\sigma ^{2}{\varvec{I}}_{3T})\) with \(\sigma =1\). Figure 2 plots them.

To obtain \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\), we used l1_common_trend_filter_c, which is presented in the supplementary material, with \(c=3\). Convergence required three iterations. As a result, we obtained

$$\begin{aligned} \widehat{{\varvec{a}}}=\begin{bmatrix} \widehat{a}_{1}\\ \widehat{a}_{2}\\ \widehat{a}_{3}\\ \end{bmatrix} =\begin{bmatrix} 0.8455\\ 0.5066\\ 0.1688\\ \end{bmatrix},\quad \widehat{{\varvec{a}}}/\widehat{a}_{1}= \begin{bmatrix} \widehat{a}_{1}/\widehat{a}_{1}\\ \widehat{a}_{2}/\widehat{a}_{1}\\ \widehat{a}_{3}/\widehat{a}_{1}\\ \end{bmatrix} =\begin{bmatrix} 1\\ 0.5991\\ 0.1996\\ \end{bmatrix} \approx \begin{bmatrix} 1\\ 0.6\\ 0.2\\ \end{bmatrix}, \end{aligned}$$
(28)

and \(\widehat{{\varvec{b}}}\) such that \(\Vert {\varvec{J}}\widehat{{\varvec{b}}}\Vert _{1}=3\), the latter of which is consistent with Proposition 4.1(ii). The value of \(\lambda \) for obtaining \(\widehat{{\varvec{b}}}\) from \({\varvec{Y}}\widehat{{\varvec{a}}}\) is 10.3730.

Figure 3 illustrates the results. The solid line in Fig. 3 plots the estimated \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\). The dashed line in the figure plots \(a_{1}{\varvec{X}}{\varvec{b}}\). Note that, given \(a_{1}=1\), it is identical to the solid line depicted in Fig. 1. From the figure, we can see that \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\) closely resembles \(a_{1}{\varvec{X}}{\varvec{b}}\). Figure 4 also illustrates the results. The solid lines in Fig. 4 are identical to those in Fig. 2. The dashed lines on y\(_{1}\), y\(_{2}\), and y\(_{3}\) respectively plot \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\), \(\widehat{a}_{2}{\varvec{X}}\widehat{{\varvec{b}}}\), and \(\widehat{a}_{3}{\varvec{X}}\widehat{{\varvec{b}}}\). Again, the figure shows that our novel procedure works well.

As a supplementary examination, we generated an additional data set and repeated the same analysis, which yielded similar results. For example, we obtained

$$\begin{aligned} \widehat{{\varvec{a}}}=\begin{bmatrix} \widehat{a}_{1}\\ \widehat{a}_{2}\\ \widehat{a}_{3}\\ \end{bmatrix} =\begin{bmatrix} 0.8462\\ 0.5060\\ 0.1670\\ \end{bmatrix},\quad \widehat{{\varvec{a}}}/\widehat{a}_{1}= \begin{bmatrix} \widehat{a}_{1}/\widehat{a}_{1}\\ \widehat{a}_{2}/\widehat{a}_{1}\\ \widehat{a}_{3}/\widehat{a}_{1}\\ \end{bmatrix} =\begin{bmatrix} 1\\ 0.5980\\ 0.1974\\ \end{bmatrix} \approx \begin{bmatrix} 1\\ 0.6\\ 0.2\\ \end{bmatrix}. \end{aligned}$$
(29)

See also Figures B.1–B.4 in the supplementary material.

6 An Empirical Illustration

Figure 5, which is identical to Figure 1.2 of Hatanaka and Yamada (2003), plots two quarterly macroeconomic time series. More precisely, the upper [resp. lower] panel of the figure depicts the natural logarithm of Japanese M2\(+\)CD [resp. real gross domestic product (GDP)], from the first quarter of 1980 to the third quarter of 2001. From the figure, we can observe that these two time series seem to contain a common piecewise linear trend such that a major kink point is located at around 1991, which corresponds to the peak of the Japanese asset price bubble. Actually, the statistical procedure developed by Hatanaka and Yamada (2003) detected a common piecewise linear trend. [See Section 8.5 of Hatanaka and Yamada (2003).]

Figure 6 depicts the corresponding demeaned series. We estimated a common piecewise linear trend of these demeaned data. Denote it by \({\varvec{X}}\widehat{{\varvec{b}}}\). For the estimation, we used l1_common_trend_filter_c with \(c=0.018\). We specified the value of c by reference to the plots of time series in first differences shown in Fig. 7. The upper panel (resp. lower panel) of Fig. 8 depicts \({\varvec{X}}\widehat{{\varvec{b}}}\) (resp. \(\widehat{b}_{2},\ldots ,\widehat{b}_{T}\), where \(T=87\)). From the panels, we may observe that (i) a major kink point is located at around 1991 and (ii) \(\sum _{t=3}^{T}|\widehat{b}_{t}|\) equals the value of c. The solid lines in Fig. 9 are identical to those plotted in Fig. 6. The dashed line in the upper (resp. lower) panel plots \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\) (resp. \(\widehat{a}_{2}{\varvec{X}}\widehat{{\varvec{b}}}\)), where

$$\begin{aligned} \widehat{{\varvec{a}}} =\begin{bmatrix} \widehat{a}_{1}\\ \widehat{a}_{2} \end{bmatrix} =\begin{bmatrix} 0.8962\\ 0.4437 \end{bmatrix},\quad \widehat{{\varvec{a}}}/\widehat{a}_{1}= \begin{bmatrix} \widehat{a}_{1}/\widehat{a}_{1}\\ \widehat{a}_{2}/\widehat{a}_{1}\\ \end{bmatrix} =\begin{bmatrix} 1\\ 0.4951 \end{bmatrix}. \end{aligned}$$

Finally, the dashed lines in Fig. 10 are the mean-restored estimated piecewise linear trends. The solid lines in the figure are identical to those in Fig. 5.

7 A Generalization

In this section, we briefly describe a generalization of our method. Let \({\varvec{D}}_{p}\in \mathbb {R}^{(T-p)\times T}\) be the p-th order difference matrix such that \({\varvec{D}}_{p}{\varvec{x}}_{i}=[\Delta ^{p}x_{i,p+1},\ldots ,\Delta ^{p}x_{i,T}]'\). Explicitly, \({\varvec{D}}_{p}\) is the following Toeplitz matrix:

$$\begin{aligned} {\varvec{D}}_{p}=\begin{bmatrix} a_{0} &{} \cdots &{} a_{p} &{} 0 &{} \cdots &{} 0 \\ 0 &{} \ddots &{} &{}\ddots &{} \ddots &{} \vdots \\ \vdots &{} \ddots &{} \ddots &{} &{} \ddots &{} 0 \\ 0 &{} \cdots &{} 0 &{} a_{0} &{} \cdots &{} a_{p} \\ \end{bmatrix}, \end{aligned}$$
(30)

where \(a_{k}=(-1)^{p-k}\left( {\begin{array}{c}p\\ k\end{array}}\right) \) for \(k=0,\ldots ,p\). Accordingly, \({\varvec{D}}_{2}\) equals \({\varvec{D}}\). Then, without any difficulty, we may extend our procedure to

$$\begin{aligned} \min _{\begin{array}{c} {\varvec{x}}\in \mathbb {R}^{T}\\ a_{1},\ldots ,a_{n}\in \mathbb {R} \end{array}}\, \sum _{i=1}^{n}\Vert {\varvec{y}}_{i}-a_{i}{\varvec{x}}\Vert _{2}^{2}+\lambda _{p}\Vert {\varvec{D}}_{p}{\varvec{x}}\Vert _{1}, \quad {\text {s.t.}}\,&\sum _{i=1}^{n}a_{i}^{2}=1, \end{aligned}$$
(31)

where \(p\in \mathbb {N}\) and \(\lambda _{p}>0\) is a tuning parameter. We refer to (31) as ‘\(\ell _{1}\) common polynomial trend filtering.’ The solution of the problem represents a continuous piecewise \((p-1)\)-th order polynomial trend. Note that the corresponding \({\varvec{X}}_{p}=[{\varvec{\Pi }}_{p},{\varvec{\Psi }}_{p}]\) such that \({\varvec{D}}_{p}{\varvec{X}}_{p}=[{\varvec{D}}_{p}{\varvec{\Pi }}_{p},{\varvec{D}}_{p} {\varvec{\Psi }}_{p}]=[{\varvec{0}},{\varvec{I}}_{T-p}]\in \mathbb {R}^{(T-p)\times T}\) is given in Yamada (2015).
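A construction of \({\varvec{D}}_{p}\) from the binomial stencil \(a_{k}=(-1)^{p-k}\binom{p}{k}\), together with checks that \({\varvec{D}}_{2}\) equals \({\varvec{D}}\) and that \({\varvec{D}}_{p}\) annihilates polynomials of degree \(p-1\), can be sketched as follows:

```python
import numpy as np
from math import comb

def diff_matrix(T, p):
    """(T-p) x T Toeplitz matrix D_p with stencil a_k = (-1)^(p-k) C(p, k)."""
    a = np.array([(-1) ** (p - k) * comb(p, k) for k in range(p + 1)], float)
    Dp = np.zeros((T - p, T))
    for r in range(T - p):
        Dp[r, r:r + p + 1] = a
    return Dp

T = 10
D2 = diff_matrix(T, 2)
assert np.allclose(D2[0, :3], [1, -2, 1])            # D_2 equals D
# D_p annihilates polynomials of degree p - 1:
t = np.arange(1, T + 1, dtype=float)
assert np.allclose(diff_matrix(T, 3) @ t ** 2, 0.0)  # D_3 kills quadratics
```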

8 Concluding Remarks

In this paper, we developed an extension of the \(\ell _{1}\) trend filtering. The \(\ell _{1}\) trend filtering can estimate a continuous piecewise linear trend of a univariate time series; however, it cannot estimate a common continuous piecewise linear trend of multiple time series. The novel statistical procedure developed in this paper makes the latter possible. We provided an algorithm and a clue for specifying the tuning parameter of the procedure, both of which are required for its application. We also numerically illustrated how well the algorithm works, provided an empirical illustration, and introduced a generalization of our novel method.