1 Introduction

The \(\ell _{1}\) trend filtering, which was developed by Steidl et al. (2006), Steidl (2006), Kim et al. (2009), Tibshirani (2014), and Guntuboyina et al. (2020), enables us to extract a continuous piecewise linear trend from a univariate time series. Figure 1 illustrates such a continuous piecewise linear trend. The filter and its variants have subsequently been applied in various fields, including astronomy (Politsch et al. 2020), climatology (Khodadadi and McDonald 2019), economics (Yamada and Jin 2013; Yamada and Yoon 2014; Winkelried 2016; Yamada 2017; Klein 2018), electronics (Suo et al. 2019), environmental science (Brantley et al. 2019), finance (Mitra and Rohit 2018), and geophysics (Wu et al. 2018).

The \(\ell _{1}\) trend filtering is defined by replacing the squared \(\ell _{2}\)-norm penalty of the Hodrick–Prescott (HP) filtering (Hodrick and Prescott 1997) with an \(\ell _{1}\)-norm penalty. It is notable that, even though this modification seems minor, the \(\ell _{1}\) trend filtering provides a continuous piecewise linear trend, whereas the HP filtering provides a smooth trend. In econometrics, such a continuous piecewise linear trend was dealt with by Perron (1989) and Rappoport and Reichlin (1989), and it reflects the idea that ‘economic events that have large permanent effects are relatively rare’ (Hamilton 1994). Thus, it is possible to say that the \(\ell _{1}\) trend filtering is a method to obtain the trend considered by Perron (1989) and Rappoport and Reichlin (1989).

Although the \(\ell _{1}\) trend filtering can estimate a continuous piecewise linear trend of a univariate time series, it cannot estimate a common continuous piecewise linear trend of multiple time series. In this paper, we develop a statistical procedure, a multivariate extension of the \(\ell _{1}\) trend filtering, that enables us to estimate such a common trend. More precisely, let \(y_{i,t}\) be an observation of a univariate time series i at t, where \(i=1,\ldots ,n\) and \(t=1,\ldots ,T\), and suppose that it has a continuous piecewise linear trend \(x_{i,t}\). As stated, the \(\ell _{1}\) trend filtering can be applied to estimate \(x_{i,t}\) from \(y_{i,t}\). In this paper, we consider the situation in which \(x_{i,t}\) can be expressed as

$$\begin{aligned} x_{i,t}=a_{i}x_{t},\quad i=1,\ldots ,n,\quad t=1,\ldots ,T, \end{aligned}$$
(1)

where \(x_{t}\) is a continuous piecewise linear trend and \(a_{i}\) is a loading coefficient. Given that (1) can be represented as

$$\begin{aligned} \begin{bmatrix} x_{1,t}\\ \vdots \\ x_{n,t}\\ \end{bmatrix} =\begin{bmatrix} a_{1}x_{t}\\ \vdots \\ a_{n}x_{t}\\ \end{bmatrix} =\begin{bmatrix} a_{1}\\ \vdots \\ a_{n}\\ \end{bmatrix} x_{t},\quad t=1,\ldots ,T, \end{aligned}$$
(2)

even though \(y_{1,t},\ldots ,y_{n,t}\) commonly contain \(x_{t}\), their linear combination \(\beta _{1}y_{1,t}+\cdots +\beta _{n}y_{n,t}\) no longer contains \(x_{t}\) if \([\beta _{1},\ldots ,\beta _{n}]'\) belongs to the orthogonal complement of the space spanned by \([a_{1},\ldots ,a_{n}]'\). Hatanaka and Yamada (2003) referred to this property as ‘co-trending.’

In this paper, by extending the \(\ell _{1}\) trend filtering, we develop a novel method to estimate \(x_{t}\) and \(a_{i}\) from \(y_{i,t}\). Recall that \(i=1,\ldots ,n\) and \(t=1,\ldots ,T\), where n (resp. T) denotes the number of univariate time series (resp. observations). We refer to the novel filtering method as ‘\(\ell _{1}\) common trend filtering.’ We provide an algorithm for computing the estimates and a clue for specifying the tuning parameter of the procedure, both of which are required for its application. We also (i) numerically illustrate how well the algorithm works, (ii) provide an empirical illustration, and (iii) introduce a generalization of our novel method.

Here, we remark that (2) is not an implausible model of trends in macroeconomic time series; on the contrary, it has strong economic relevance. To explain more precisely, let \(y_{1,t},\ldots ,y_{n,t}\) be macroeconomic time series in natural logarithms and \(e_{1,t},\ldots ,e_{n,t}\) be such that

$$\begin{aligned} y_{i,t}=x_{i,t}+e_{i,t},\quad i=1,\ldots ,n,\quad t=1,\ldots ,T. \end{aligned}$$
(3)

Let \(g_{i,t}=\Delta y_{i,t}(=y_{i,t}-y_{i,t-1})\), so that \(g_{i,t}\) for \(i=1,\ldots ,n\) are the growth rates of the original time series. Then, (2) and (3) are equivalent to

$$\begin{aligned} \Delta g_{i,t}=a_{i}b_{t}+v_{i,t},\quad i=1,\ldots ,n,\quad t=3,\ldots ,T, \end{aligned}$$
(4)

with initial conditions \(y_{i,1}=a_{i}x_{1}+e_{i,1}\) and \(\Delta y_{i,2}=a_{i}\Delta x_{2}+\Delta e_{i,2}\), where \(b_{t}=\Delta x_{t}-\Delta x_{t-1}\) and \(v_{i,t}=\Delta e_{i,t}-\Delta e_{i,t-1}\). Recall that \(\Delta g_{i,t}\) in (4) denotes the change in the growth rate of variable i at t. Given that \(x_{t}\) in (2) is a continuous piecewise linear trend, only a few of \(b_{3},\ldots ,b_{T}\) are nonzero. We may regard such nonzero \(b_{t}\)s in (4) as occasional permanent shocks that shift the growth rates of multiple time series simultaneously, and \(a_{i}\) for \(i=1,\ldots ,n\) in (4) as the individual reaction coefficients of the time series. A typical example of such an occasional permanent shock is the oil price shock of 1973. It is natural to consider that, at that time, the growth rates of macroeconomic time series changed simultaneously, each at its own reaction rate.
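For concreteness, the equivalence between (2)–(3) and (4) can be checked numerically. The following Python sketch (the particular trend, loading, and noise level are arbitrary choices for illustration) verifies that the second difference of \(y_{i,t}\) equals \(a_{i}b_{t}+v_{i,t}\), and that \(b_{t}\) is nonzero only where the slope of \(x_{t}\) changes:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 20
# A piecewise linear trend: slope 0.5 up to t = 10, then slope -0.2.
x = np.cumsum(np.concatenate([[0.0], np.full(9, 0.5), np.full(10, -0.2)]))
a_i = 0.7
e = 0.1 * rng.standard_normal(T)
y = a_i * x + e                       # model (2)-(3) for one series

g = np.diff(y)                        # growth rates g_{i,t}, t = 2,...,T
lhs = np.diff(g)                      # Delta g_{i,t}, t = 3,...,T
b = np.diff(np.diff(x))               # b_t = Delta x_t - Delta x_{t-1}
v = np.diff(np.diff(e))               # v_{i,t} = Delta e_{i,t} - Delta e_{i,t-1}
assert np.allclose(lhs, a_i * b + v)  # equation (4)
assert np.count_nonzero(np.abs(b) > 1e-12) == 1   # a single permanent shock
```

Here the single nonzero \(b_{t}\) occurs exactly at the kink of the trend, matching the interpretation of occasional permanent shocks.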

This paper is organized as follows. Section 2 introduces the novel filtering method and provides its reduced-rank-regression (RRR) representations. Section 3 discusses a numerical computation method for \(x_{t}\) and \(a_{i}\) in (1). Section 4 provides a clue to specify the tuning parameter of the procedure required for its application. Section 5 numerically illustrates how well the novel statistical procedure works. Section 6 includes an empirical illustration. Section 7 mentions a generalization of our method. Section 8 concludes the paper.

Notations Let \({\varvec{y}}_{i}=[y_{i,1},\ldots ,y_{i,T}]'\), \({\varvec{x}}_{i}=[x_{i,1},\ldots ,x_{i,T}]'\), \({\varvec{x}}=[x_{1},\ldots ,x_{T}]'\), \({\varvec{Y}}=[{\varvec{y}}_{1},\ldots ,{\varvec{y}}_{n}]\in \mathbb {R}^{T\times n}\), and \({\varvec{a}}=[a_{1},\ldots ,a_{n}]'\); let \({\varvec{I}}_{m}\) denote the identity matrix of order m and \({\varvec{J}}=[{\varvec{0}},{\varvec{I}}_{T-2}]\in \mathbb {R}^{(T-2)\times T}\); and let \({\varvec{D}}\in \mathbb {R}^{(T-2)\times T}\) be the second-order difference matrix such that \({\varvec{D}}{\varvec{x}}_{i}=[\Delta ^{2}x_{i,3},\ldots ,\Delta ^{2}x_{i,T}]'\). Explicitly, \({\varvec{D}}\) is the \((T-2)\times T\) Toeplitz matrix whose first and last rows are \([1,-2,1,0,\ldots ,0]\) and \([0,\ldots ,0,1,-2,1]\), respectively. In addition, let \({\varvec{\Pi }}\in \mathbb {R}^{T\times 2}\), \({\varvec{\Psi }}\in \mathbb {R}^{T\times (T-2)}\), and \({\varvec{X}}=[{\varvec{\Pi }},{\varvec{\Psi }}]\in \mathbb {R}^{T\times T}\) be defined by

(5)

Finally, for a vector \({\varvec{\gamma }}=[\gamma _{1},\ldots ,\gamma _{m}]'\), \(\Vert {\varvec{\gamma }}\Vert _{2}^{2}={\varvec{\gamma }}'{\varvec{\gamma }}=\sum _{t=1}^{m}\gamma _{t}^{2}\), \(\Vert {\varvec{\gamma }}\Vert _{1}=\sum _{t=1}^{m}|\gamma _{t}|\), \(\Vert {\varvec{\gamma }}\Vert _{\infty }=\max \{|\gamma _{1}|,\ldots ,|\gamma _{m}|\}\), and, for a matrix \({\varvec{\Gamma }}\in \mathbb {R}^{r\times s}\) whose (ij) entry is denoted by \(\gamma _{ij}\), \(\Vert {\varvec{\Gamma }}\Vert _{\mathrm {F}}^{2}=\sum _{i=1}^{r}\sum _{j=1}^{s}\gamma _{ij}^{2}\).

A small note (i) The null space of \({\varvec{D}}\) is identical to the column space of \({\varvec{\Pi }}\), and accordingly \({\varvec{D}}{\varvec{\Pi }}={\varvec{0}}\); (ii) \({\varvec{\Psi }}\) is a right inverse of \({\varvec{D}}\), i.e., \({\varvec{D}}{\varvec{\Psi }}={\varvec{I}}_{T-2}\) (Paige and Trindade 2010); (iii) \({\mathsf {det}}({\varvec{X}})=1\) and thus \({\varvec{X}}\) is nonsingular; and (iv) given (1), we have \([{\varvec{x}}_{1},\ldots ,{\varvec{x}}_{n}]=[a_{1}{\varvec{x}},\ldots ,a_{n}{\varvec{x}}]={\varvec{x}}{\varvec{a}}'\in \mathbb {R}^{T\times n}\).
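Properties (i)–(iii) can be verified numerically. In the following Python sketch, \({\varvec{\Pi }}\) and \({\varvec{\Psi }}\) are one standard construction consistent with these properties (an assumption; the exact form in (5) may differ), and the final assertion checks the identity \({\varvec{J}}{\varvec{X}}^{-1}={\varvec{D}}\) invoked in Sect. 4:

```python
import numpy as np

T = 8
# Second-order difference matrix D: (T-2) x T Toeplitz with rows [1, -2, 1].
D = np.zeros((T - 2, T))
for r in range(T - 2):
    D[r, r:r + 3] = [1.0, -2.0, 1.0]

# An assumed standard choice of Pi (linear trends) and Psi (ramp functions).
t = np.arange(1, T + 1)
Pi = np.column_stack([np.ones(T), t])                 # columns span null(D)
Psi = np.maximum(t[:, None] - np.arange(1, T - 1)[None, :] - 1, 0.0)
X = np.hstack([Pi, Psi])                              # X = [Pi, Psi]
J = np.hstack([np.zeros((T - 2, 2)), np.eye(T - 2)])

assert np.allclose(D @ Pi, 0.0)                       # (i)   D Pi = 0
assert np.allclose(D @ Psi, np.eye(T - 2))            # (ii)  D Psi = I_{T-2}
assert np.isclose(np.linalg.det(X), 1.0)              # (iii) det(X) = 1
assert np.allclose(D @ X, J)                          # hence D X = J
assert np.allclose(J @ np.linalg.inv(X), D)           # and J X^{-1} = D
```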

2 \(\ell _{1}\) Common Trend Filtering

2.1 \(\ell _{1}\) Trend Filtering

The \(\ell _{1}\) trend filtering is defined by

$$\begin{aligned} \min _{x_{i,1},\ldots ,x_{i,T}\in \mathbb {R}}\, \sum _{t=1}^{T}(y_{i,t}-x_{i,t})^{2}+\psi \sum _{t=3}^{T}|\Delta ^{2} x_{i,t}|, \end{aligned}$$
(6)

where \(\psi >0\) is a tuning parameter. In matrix notation, it is expressed as

$$\begin{aligned} \min _{{\varvec{x}}_{i}\in \mathbb {R}^{T}}\, \Vert {\varvec{y}}_{i}-{\varvec{x}}_{i}\Vert _{2}^{2}+\psi \Vert {\varvec{D}}{\varvec{x}}_{i}\Vert _{1}. \end{aligned}$$
(7)
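Problem (7) is convex, and many solvers are available; Kim et al. (2009) propose a specialized primal-dual interior-point method. The following is a minimal ADMM sketch (an independent illustration under an alternating-direction scheme, not the solver used in the paper):

```python
import numpy as np

def l1_trend_filter(y, psi, rho=1.0, n_iter=500):
    """Solve min_x ||y - x||_2^2 + psi * ||D x||_1 by ADMM (a sketch)."""
    T = len(y)
    D = np.zeros((T - 2, T))
    for r in range(T - 2):
        D[r, r:r + 3] = [1.0, -2.0, 1.0]
    M = 2.0 * np.eye(T) + rho * D.T @ D   # x-update system matrix
    z = np.zeros(T - 2)
    u = np.zeros(T - 2)
    for _ in range(n_iter):
        x = np.linalg.solve(M, 2.0 * y + rho * D.T @ (z - u))
        w = D @ x + u
        z = np.sign(w) * np.maximum(np.abs(w) - psi / rho, 0.0)  # soft-threshold
        u += D @ x - z
    return x

# A noisy piecewise linear series; the fitted trend should track the kinks.
rng = np.random.default_rng(0)
T = 60
trend = np.concatenate([np.linspace(0.0, 5.0, 30), np.linspace(5.0, 2.0, 30)])
y = trend + 0.1 * rng.standard_normal(T)
x_hat = l1_trend_filter(y, psi=5.0)
```

The \(\ell _{1}\) penalty forces most second differences of the fit to be exactly zero, which is what produces the piecewise linear shape.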

2.2 \(\ell _{1}\) Common Trend Filtering

In this paper, we extend the \(\ell _{1}\) trend filtering so that we may estimate a common continuous piecewise linear trend of multiple time series, \(y_{1,t},\ldots ,y_{n,t}\). The filtering method we introduce in this paper is:

$$\begin{aligned} \min _{\begin{array}{c} x_{1},\ldots ,x_{T}\in \mathbb {R}\\ a_{1},\ldots ,a_{n}\in \mathbb {R} \end{array}}\, \sum _{i=1}^{n}\sum _{t=1}^{T}(y_{i,t}-a_{i}x_{t})^{2}+\lambda \sum _{t=3}^{T}|\Delta ^{2} x_{t}|,\quad {\text {s.t.}}\,\sum _{i=1}^{n}a_{i}^{2}=1, \end{aligned}$$
(8)

where \(\lambda >0\) is a tuning parameter. We refer to the filtering method described by (8) as \(\ell _{1}\) common trend filtering. In matrix notation, the filtering is expressed as

$$\begin{aligned} \min _{\begin{array}{c} {\varvec{x}}\in \mathbb {R}^{T}\\ a_{1},\ldots ,a_{n}\in \mathbb {R} \end{array}}\, \sum _{i=1}^{n}\Vert {\varvec{y}}_{i}-a_{i}{\varvec{x}}\Vert _{2}^{2}+\lambda \Vert {\varvec{D}}{\varvec{x}}\Vert _{1},\quad {\text {s.t.}}\,\sum _{i=1}^{n}a_{i}^{2}=1. \end{aligned}$$
(9)

Furthermore, given that \(\sum _{i=1}^{n}\Vert {\varvec{y}}_{i}-a_{i}{\varvec{x}}\Vert _{2}^{2}=\Vert {\varvec{Y}}-{\varvec{x}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}\) and \(\sum _{i=1}^{n}a_{i}^{2}=\Vert {\varvec{a}}\Vert _{2}^{2}\), (9) can be represented by

$$\begin{aligned} \min _{\begin{array}{c} {\varvec{x}}\in \mathbb {R}^{T}\\ {\varvec{a}}\in \mathbb {R}^{n} \end{array}}\, \Vert {\varvec{Y}}-{\varvec{x}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}+\lambda \Vert {\varvec{D}}{\varvec{x}}\Vert _{1},\quad {\text {s.t.}}\,\Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(10)

This is an \(\ell _{1}\)-norm penalized RRR.
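The identity used to pass from (9) to (10) is elementary; a quick numerical check with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 20, 4
Y = rng.standard_normal((T, n))
x = rng.standard_normal(T)
a = rng.standard_normal(n)

# sum_i ||y_i - a_i x||_2^2 equals ||Y - x a'||_F^2.
lhs = sum(np.sum((Y[:, i] - a[i] * x) ** 2) for i in range(n))
rhs = np.sum((Y - np.outer(x, a)) ** 2)
assert np.isclose(lhs, rhs)
```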

2.3 Another Representation

Let \({\varvec{b}}\in \mathbb {R}^{T}\) be a column vector such that \({\varvec{x}}={\varvec{X}}{\varvec{b}}\). Given that (i) \({\varvec{X}}\) is nonsingular and (ii) \({\varvec{D}}{\varvec{X}}={\varvec{J}}\) (Paige and Trindade 2010), we obtain another RRR representation of (10):

$$\begin{aligned} \min _{\begin{array}{c} {\varvec{a}}\in \mathbb {R}^{n}\\ {\varvec{b}}\in \mathbb {R}^{T} \end{array}}\, f({\varvec{a}},{\varvec{b}})=\Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}+\lambda \Vert {\varvec{J}}{\varvec{b}}\Vert _{1},\quad {\text {s.t.}}\,\Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(11)

We remark that when \({\varvec{J}}{\varvec{b}}\) is sparse, \({\varvec{x}}={\varvec{X}}{\varvec{b}}\) represents a continuous piecewise linear trend. In addition, interestingly, (11) is similar to Eq. (8) in Chen and Huang (2012).

3 Numerical Solution

3.1 Two Key Results

Given \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\), the first term of the objective function in (11) can be expanded as

$$\begin{aligned} \Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}&={\mathsf {tr}}({\varvec{Y}}'{\varvec{Y}})+{\mathsf {tr}}({\varvec{X}}{\varvec{b}}{\varvec{b}}'{\varvec{X}}') -2{\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} . \end{aligned}$$
(12)

3.1.1 The Case Where \({\varvec{b}}\in \mathbb {R}^{T}\) is Given

Suppose that \({\varvec{b}}\in \mathbb {R}^{T}\) is given. Then, \({\varvec{X}}{\varvec{b}}(={\varvec{x}})\in \mathbb {R}^{T}\) is a known column vector. Because neither \({\mathsf {tr}}({\varvec{Y}}'{\varvec{Y}})\) nor \({\mathsf {tr}}({\varvec{X}}{\varvec{b}}{\varvec{b}}'{\varvec{X}}')\) in (12) depends on \({\varvec{a}}\), when \({\varvec{b}}\in \mathbb {R}^{T}\) is given, (11) reduces to

$$\begin{aligned} \max _{{\varvec{a}}\in \mathbb {R}^{n}}\,{\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} ,\quad {\text {s.t.}}\,\Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(13)

We remark that, given \({\varvec{x}}={\varvec{X}}{\varvec{b}}\), it follows that \({\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} =\sum _{i=1}^{n}{\varvec{y}}_{i}'(a_{i}{\varvec{x}})\), which is quite reasonable as an objective function for estimating \(a_{1},\ldots ,a_{n}\). Moreover, letting \({\varvec{\phi }}={\varvec{Y}}'{\varvec{X}}{\varvec{b}}\in \mathbb {R}^{n}\), it follows that \({\mathsf {tr}}\left\{ {\varvec{Y}}'({\varvec{X}}{\varvec{b}}{\varvec{a}}')\right\} ={\varvec{\phi }}'{\varvec{a}}\). Therefore, instead of (13), we may consider the following constrained maximization problem:

$$\begin{aligned} \max _{{\varvec{a}}\in \mathbb {R}^{n}}\,g({\varvec{a}})={\varvec{\phi }}'{\varvec{a}},\quad {\text {s.t.}}\,\Vert {\varvec{a}}\Vert _{2}^{2}=1. \end{aligned}$$
(14)

Given \(\Vert {\varvec{a}}\Vert _{2}=1\), by the Cauchy–Schwarz inequality, we obtain \(g({\varvec{a}})={\varvec{\phi }}'{\varvec{a}}\le |{\varvec{\phi }}'{\varvec{a}}|\le \Vert {\varvec{\phi }}\Vert _{2}\Vert {\varvec{a}}\Vert _{2}=\Vert {\varvec{\phi }}\Vert _{2}\), from which we have \(g({\varvec{a}})<g(\widehat{{\varvec{a}}})\) if \({\varvec{a}}\ne \widehat{{\varvec{a}}}\), where \(\widehat{{\varvec{a}}} =({\varvec{\phi }}'{\varvec{\phi }})^{-1/2}{\varvec{\phi }} =({\varvec{b}}'{\varvec{X}}'{\varvec{Y}}{\varvec{Y}}'{\varvec{X}}{\varvec{b}})^{-1/2}{\varvec{Y}}'{\varvec{X}}{\varvec{b}}\). Consequently, given \({\varvec{b}}\in \mathbb {R}^{T}\), we have the following inequality:

$$\begin{aligned} f(\widehat{{\varvec{a}}},{\varvec{b}})< f({\varvec{a}},{\varvec{b}}) \end{aligned}$$
(15)

if \({\varvec{a}}\ne \widehat{{\varvec{a}}}\).
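The closed-form update can be checked numerically: no unit-norm vector attains a larger value of \(g\) than \(\widehat{{\varvec{a}}}\). A small Python sketch with random data, for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 30, 5
Y = rng.standard_normal((T, n))
x = rng.standard_normal(T)            # stands in for X b (known, given b)

phi = Y.T @ x                         # phi = Y' X b
a_hat = phi / np.linalg.norm(phi)     # closed-form maximizer of (14)

# No random unit vector should attain a larger value of g(a) = phi'a.
for _ in range(100):
    a = rng.standard_normal(n)
    a /= np.linalg.norm(a)
    assert phi @ a <= phi @ a_hat + 1e-12
```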

3.1.2 The Case Where \({\varvec{a}}\in \mathbb {R}^{n}\) is Given

Next, suppose that \({\varvec{a}}\in \mathbb {R}^{n}\) such that \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\) is given. Then, (11) reduces to

$$\begin{aligned} \min _{{\varvec{b}}\in \mathbb {R}^{T}}\, \Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}+\lambda \Vert {\varvec{J}}{\varvec{b}}\Vert _{1}. \end{aligned}$$
(16)

Let \({\varvec{A}}_{\perp }\in \mathbb {R}^{n\times (n-1)}\) be a matrix of an orthonormal basis of the orthogonal complement of the space spanned by \({\varvec{a}}\). Then, given \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\), \([{\varvec{a}},{\varvec{A}}_{\perp }]\in \mathbb {R}^{n\times n}\) is an orthogonal matrix. Thus, given that \( ({\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}')[{\varvec{a}},{\varvec{A}}_{\perp }] =[{\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}},{\varvec{Y}}{\varvec{A}}_{\perp }], \) it follows that \(\Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2} =\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}+\Vert {\varvec{Y}}{\varvec{A}}_{\perp }\Vert _{\mathrm {F}}^{2}\). Given that \(\Vert {\varvec{Y}}{\varvec{A}}_{\perp }\Vert _{\mathrm {F}}^{2}\) does not depend on \({\varvec{b}}\), (16) becomes

$$\begin{aligned} \min _{{\varvec{b}}\in \mathbb {R}^{T}}\,\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}+\lambda \Vert {\varvec{J}}{\varvec{b}}\Vert _{1}. \end{aligned}$$
(17)

We remark here that (17) can be represented as

$$\begin{aligned} \min _{{\varvec{x}}\in \mathbb {R}^{T}}\,\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{x}}\Vert _{2}^{2}+\lambda \Vert {\varvec{D}}{\varvec{x}}\Vert _{1}. \end{aligned}$$
(18)

See, e.g., Eq. (9) in Kim et al. (2009).

As (17) is a problem whose objective function is coercive and strictly convex over \(\mathbb {R}^{T}\), it has a unique global minimizer. Thus, denoting the solution by \(\widehat{{\varvec{b}}}\), we have the following result. Given \({\varvec{a}}\in \mathbb {R}^{n}\) such that \(\Vert {\varvec{a}}\Vert _{2}^{2}=1\), we have the following inequality:

$$\begin{aligned} f({\varvec{a}},\widehat{{\varvec{b}}})< f({\varvec{a}},{\varvec{b}}) \end{aligned}$$
(19)

if \({\varvec{b}}\ne \widehat{{\varvec{b}}}\).
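The decomposition underlying the reduction from (16) to (17), \(\Vert {\varvec{Y}}-{\varvec{X}}{\varvec{b}}{\varvec{a}}'\Vert _{\mathrm {F}}^{2}=\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2}+\Vert {\varvec{Y}}{\varvec{A}}_{\perp }\Vert _{\mathrm {F}}^{2}\), can also be checked numerically with a random orthonormal completion of \({\varvec{a}}\):

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 15, 4
Y = rng.standard_normal((T, n))
xb = rng.standard_normal(T)                  # stands in for X b
a = rng.standard_normal(n)
a /= np.linalg.norm(a)                       # ||a||_2 = 1

# Complete a to an orthogonal matrix [a, A_perp] via QR (Q's first column
# equals a up to sign, which does not affect the identity below).
Q, _ = np.linalg.qr(np.column_stack([a, rng.standard_normal((n, n - 1))]))
A_perp = Q[:, 1:]

lhs = np.sum((Y - np.outer(xb, a)) ** 2)                     # ||Y - Xba'||_F^2
rhs = np.sum((Y @ a - xb) ** 2) + np.sum((Y @ A_perp) ** 2)  # decomposition
assert np.isclose(lhs, rhs)
```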

3.2 A Numerical Algorithm

Based on the above two inequalities, (15) and (19), we introduce a numerical algorithm. Given \(\widehat{{\varvec{a}}}_{1}\in \mathbb {R}^{n}\) and \(\widehat{{\varvec{b}}}_{1}\in \mathbb {R}^{T}\), for \(i\in \mathbb {N}\), we define \(\widehat{{\varvec{a}}}_{i+1}\) and \(\widehat{{\varvec{b}}}_{i+1}\) by

$$\begin{aligned}&\widehat{{\varvec{a}}}_{i+1} =(\widehat{{\varvec{b}}}_{i}'{\varvec{X}}'{\varvec{Y}}{\varvec{Y}}'{\varvec{X}} \widehat{{\varvec{b}}}_{i})^{-1/2}{\varvec{Y}}'{\varvec{X}}\widehat{{\varvec{b}}}_{i}, \end{aligned}$$
(20)
$$\begin{aligned}&\widehat{{\varvec{b}}}_{i+1}={\mathrm {arg\,}}\min _{{\varvec{b}}_{i+1}\in \mathbb {R}^{T}}\, \Vert {\varvec{Y}}\widehat{{\varvec{a}}}_{i+1}-{\varvec{X}}{\varvec{b}}_{i+1}\Vert _{2}^{2} +\lambda \Vert {\varvec{J}}{\varvec{b}}_{i+1}\Vert _{1}. \end{aligned}$$
(21)

Then, we have the following result.

Lemma 3.1

For \(i\in \mathbb {N}\), it follows that \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{a}}}_{i}=\widehat{{\varvec{a}}}_{i+1}\) and \(\widehat{{\varvec{b}}}_{i}=\widehat{{\varvec{b}}}_{i+1}\).

Proof

(i) From (15) and (20), we have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i})\le f({\varvec{a}},\widehat{{\varvec{b}}}_{i})\) for any \({\varvec{a}}\in \mathbb {R}^{n}\) and we thus have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i})\le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{a}}}_{i}=\widehat{{\varvec{a}}}_{i+1}\). (ii) Likewise, from (19) and (21), we have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i+1},{\varvec{b}})\) for any \({\varvec{b}}\in \mathbb {R}^{T}\) and we thus have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{b}}}_{i}=\widehat{{\varvec{b}}}_{i+1}\). (iii) Combining these inequalities, we have \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i}) \le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), which leads to \(f(\widehat{{\varvec{a}}}_{i+1},\widehat{{\varvec{b}}}_{i+1})\le f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\), where the equality holds only if \(\widehat{{\varvec{a}}}_{i}=\widehat{{\varvec{a}}}_{i+1}\) and \(\widehat{{\varvec{b}}}_{i}=\widehat{{\varvec{b}}}_{i+1}\). \(\square \)

Given Lemma 3.1, we have the following result.

Proposition 3.2

Let \(f_{i}=f(\widehat{{\varvec{a}}}_{i},\widehat{{\varvec{b}}}_{i})\) for \(i\in \mathbb {N}\). Then the sequence of real numbers \((f_{i})_{i\in \mathbb {N}}\) has a finite limit.

Proof

From Lemma 3.1, \((f_{i})_{i\in \mathbb {N}}\) is a nonincreasing sequence. In addition, as \(f_{i}\ge 0\) for \(i\in \mathbb {N}\), it is bounded below. Consequently, it has a finite limit. \(\square \)

Proposition 3.2 implies that the objective function in (11) converges when it is alternately minimized over \({\varvec{a}}\) and \({\varvec{b}}\). Denote by \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) the values of \({\varvec{a}}\) and \({\varvec{b}}\) at convergence. A Matlab user-defined function for estimating \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) from \({\varvec{Y}}\) and \(\lambda \), l1_common_trend_filter, is provided in the supplementary material.
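For readers without Matlab, the alternating scheme (20)–(21) can be sketched as follows. This is an independent Python illustration, not the supplementary code: the inner update (21) is solved inexactly by ADMM, and \({\varvec{X}}\) is an assumed standard construction consistent with \({\varvec{D}}{\varvec{X}}={\varvec{J}}\). The recorded objective path illustrates (approximately, owing to the inexact inner solver) the monotone decrease of Lemma 3.1 and Proposition 3.2:

```python
import numpy as np

def solve_b(Ya, X, D, lam, rho=1.0, n_iter=300):
    """Update (21): min_b ||Ya - Xb||_2^2 + lam ||Jb||_1, solved in the
    x = Xb coordinates (problem (18)) by ADMM, then mapped back to b."""
    T = X.shape[0]
    M = 2.0 * np.eye(T) + rho * D.T @ D
    z = np.zeros(T - 2)
    u = np.zeros(T - 2)
    for _ in range(n_iter):
        x = np.linalg.solve(M, 2.0 * Ya + rho * D.T @ (z - u))
        w = D @ x + u
        z = np.sign(w) * np.maximum(np.abs(w) - lam / rho, 0.0)
        u += D @ x - z
    return np.linalg.solve(X, x)            # b = X^{-1} x

def common_trend_filter(Y, lam, n_outer=20):
    """Alternate between updates (20) and (21); return (a, b, objective path)."""
    T, n = Y.shape
    D = np.zeros((T - 2, T))
    for r in range(T - 2):
        D[r, r:r + 3] = [1.0, -2.0, 1.0]
    tt = np.arange(1, T + 1)                # assumed standard X = [Pi, Psi]
    X = np.hstack([np.column_stack([np.ones(T), tt]),
                   np.maximum(tt[:, None] - np.arange(1, T - 1)[None, :] - 1, 0.0)])
    J = np.hstack([np.zeros((T - 2, 2)), np.eye(T - 2)])

    def f(a, b):                            # objective of (11)
        return np.sum((Y - np.outer(X @ b, a)) ** 2) + lam * np.sum(np.abs(J @ b))

    b = np.linalg.solve(X, Y.mean(axis=1))  # crude initialization
    history = []
    for _ in range(n_outer):
        phi = Y.T @ (X @ b)
        a = phi / np.linalg.norm(phi)       # update (20), closed form
        b = solve_b(Y @ a, X, D, lam)       # update (21)
        history.append(f(a, b))
    return a, b, history

# Data in the spirit of Sect. 5: one piecewise linear trend, three noisy series.
rng = np.random.default_rng(0)
T = 60
trend = np.cumsum(np.concatenate([[0.0], np.full(29, 0.2), np.full(30, -0.1)]))
a_true = np.array([1.0, 0.6, 0.2])
Y = np.outer(trend, a_true) + 0.3 * rng.standard_normal((T, 3))
a_hat, b_hat, hist = common_trend_filter(Y, lam=5.0)
```

The estimated \(\widehat{{\varvec{a}}}\) should be close to the normalized true loading vector, and the objective path should be (nearly) nonincreasing.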

4 A Clue for Specifying the Tuning Parameter

Applying the \(\ell _{1}\) common trend filtering requires the specification of its tuning parameter. In this section, we provide a clue for specifying it.

Consider the following convex problem:

$$\begin{aligned} \min _{{\varvec{b}}\in \mathbb {R}^{T}}\,h({\varvec{b}})=\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{X}}{\varvec{b}}\Vert _{2}^{2},\quad {\text {s.t.}}\,\Vert {\varvec{J}}{\varvec{b}}\Vert _{1}\le c, \end{aligned}$$
(22)

where \(c>0\) and \({\varvec{a}}\in \mathbb {R}^{n}\) is a given vector. Given that \({\varvec{X}}\) is nonsingular and \(h({\varvec{b}})\ge 0\), if \({\varvec{b}}={\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) is feasible, i.e., \(\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\le c\) (recall that \({\varvec{J}}{\varvec{X}}^{-1}={\varvec{D}}\)), then \({\varvec{b}}={\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) is the solution of the above convex problem, and in that case \(h({\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}})=0\). If it is not feasible, i.e., \(\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}>c\), then \({\varvec{b}}={\varvec{X}}^{-1}{\varvec{Y}}{\varvec{a}}\) cannot be the solution. In that case, the solution, denoted by \(\widetilde{{\varvec{b}}}\), lies on the boundary, and thus \(\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}=c\).

More precisely, concerning \(\widetilde{{\varvec{b}}}\), we have the following results.

Proposition 4.1

If \(0<c<\Vert {\varvec{D}}{\varvec{Y}}{\varvec{a}}\Vert _{1}\), then (i) \(\widetilde{{\varvec{b}}}\) equals \(\widehat{{\varvec{b}}}\) estimated by (16)/(17) with

$$\begin{aligned} \lambda =\frac{2({\varvec{X}}\widetilde{{\varvec{b}}})'({\varvec{Y}}{\varvec{a}}-{\varvec{X}}\widetilde{{\varvec{b}}})}{\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}} \end{aligned}$$
(23)

and (ii) \(\Vert {\varvec{J}}\widetilde{{\varvec{b}}}\Vert _{1}=c\) holds.

Proof

See Section A.4 in the Appendix. \(\square \)

Proposition 4.1 implies that we can obtain \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) by specifying c in (22) instead of \(\lambda \) in (17). A Matlab user-defined function for calculating \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\) from \({\varvec{Y}}\) and c, l1_common_trend_filter_c, is provided in the supplementary material. Here, we point out that specifying c is much easier than specifying \(\lambda \): we do not have any useful information for specifying \(\lambda \), whereas we do for specifying c. As stated in Proposition 4.1, we may estimate \(\widehat{{\varvec{b}}}\) such that

$$\begin{aligned} c=\Vert {\varvec{J}}\widehat{{\varvec{b}}}\Vert _{1}=\sum _{t=3}^{T}|\widehat{b}_{t}|. \end{aligned}$$
(24)

We may utilize this relation for specifying c. See, e.g., (A.1). In that case,

$$\begin{aligned} x_{t}-x_{t-1} = {\left\{ \begin{array}{ll} b_{2}, &{} \text {if }t=2,3,\\ b_{2}+b_{4}, &{} \text {if }t=4,5,\\ b_{2}+b_{4}+b_{6}, &{} \text {if }t=6,7. \end{array}\right. } \end{aligned}$$
(25)

Thus, we may specify a rough range for c from plots of the first differences of the multiple time series.
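Relation (25) can be reproduced numerically. In the following sketch, \({\varvec{X}}\) is an assumed standard construction (consistent with the properties stated in Sect. 1), and the kink values \(b_{4}\) and \(b_{6}\) are arbitrary illustrative choices:

```python
import numpy as np

T = 7
t = np.arange(1, T + 1)
# An assumed standard construction of X = [Pi, Psi].
X = np.hstack([np.column_stack([np.ones(T), t]),
               np.maximum(t[:, None] - np.arange(1, T - 1)[None, :] - 1, 0.0)])

b = np.zeros(T)
b[0], b[1] = 10.0, 0.5                        # b_1 (level), b_2 (initial slope)
b[3], b[5] = 0.3, -0.2                        # kinks: b_4 and b_6 nonzero
x = X @ b
slopes = np.diff(x)                           # x_t - x_{t-1}, t = 2,...,T
assert np.allclose(slopes[:2], b[1])          # t = 2, 3: b_2
assert np.allclose(slopes[2:4], b[1] + b[3])  # t = 4, 5: b_2 + b_4
assert np.allclose(slopes[4:], b[1] + b[3] + b[5])  # t = 6, 7: b_2 + b_4 + b_6
```

Since the first differences of \(x_{t}\) accumulate the nonzero \(b_{t}\)s, the overall variation of the slopes visible in a first-difference plot bounds \(\sum _{t=3}^{T}|b_{t}|\) from below, which is what makes c easy to gauge.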

Finally, given \({\varvec{X}}{\varvec{b}}={\varvec{x}}\) and \({\varvec{J}}{\varvec{b}}={\varvec{D}}{\varvec{x}}\), we remark that the convex problem (22) may be replaced with the following convex problem:

$$\begin{aligned} \min _{{\varvec{x}}\in \mathbb {R}^{T}}\,\Vert {\varvec{Y}}{\varvec{a}}-{\varvec{x}}\Vert _{2}^{2},\quad {\text {s.t.}}\,\Vert {\varvec{D}}{\varvec{x}}\Vert _{1}\le c, \end{aligned}$$
(26)

where \(c>0\).

5 Numerical Illustrations

In this section, we numerically illustrate how well the algorithm described in the last section works. Figure 1 plots the generated \({\varvec{X}}{\varvec{b}}\), where \(T=100\), \(b_{1}=10\), \(b_{2}=0.5\), and \(\sum _{t=3}^{T}|b_{t}|=2.7243\). The ten bullets in the figure depict the kink points; accordingly, 10 entries of \({\varvec{J}}{\varvec{b}}=[b_{3},\ldots ,b_{T}]'\) are nonzero. Using the \({\varvec{X}}{\varvec{b}}\) shown in Fig. 1, we generated \({\varvec{y}}_{1}\), \({\varvec{y}}_{2}\), and \({\varvec{y}}_{3}\) by

$$\begin{aligned} {[}{\varvec{y}}_{1},{\varvec{y}}_{2},{\varvec{y}}_{3}]={\varvec{X}}{\varvec{b}}{\varvec{a}}'+{\varvec{E}}, \end{aligned}$$
(27)

where \({\varvec{a}}=[1,0.6,0.2]'\) and \(\mathrm {vec}({\varvec{E}})\sim \mathrm {N}({\varvec{0}},\sigma ^{2}{\varvec{I}}_{3T})\) with \(\sigma =1\). Figure 2 plots them.

To obtain \(\widehat{{\varvec{a}}}\) and \(\widehat{{\varvec{b}}}\), we used l1_common_trend_filter_c, which is presented in the supplementary material, with \(c=3\). Convergence required three iterations. As a result, we obtained

$$\begin{aligned} \widehat{{\varvec{a}}}=\begin{bmatrix} \widehat{a}_{1}\\ \widehat{a}_{2}\\ \widehat{a}_{3}\\ \end{bmatrix} =\begin{bmatrix} 0.8455\\ 0.5066\\ 0.1688\\ \end{bmatrix},\quad \widehat{{\varvec{a}}}/\widehat{a}_{1}= \begin{bmatrix} \widehat{a}_{1}/\widehat{a}_{1}\\ \widehat{a}_{2}/\widehat{a}_{1}\\ \widehat{a}_{3}/\widehat{a}_{1}\\ \end{bmatrix} =\begin{bmatrix} 1\\ 0.5991\\ 0.1996\\ \end{bmatrix} \approx \begin{bmatrix} 1\\ 0.6\\ 0.2\\ \end{bmatrix}, \end{aligned}$$
(28)

and \(\widehat{{\varvec{b}}}\) such that \(\Vert {\varvec{J}}\widehat{{\varvec{b}}}\Vert _{1}=3\), the latter of which is consistent with Proposition 4.1(ii). The value of \(\lambda \) for obtaining \(\widehat{{\varvec{b}}}\) from \({\varvec{Y}}\widehat{{\varvec{a}}}\) is 10.3730.

Figure 3 illustrates the results. The solid line in Fig. 3 plots the estimated \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\). The dashed line in the figure plots \(a_{1}{\varvec{X}}{\varvec{b}}\). Note that, given \(a_{1}=1\), it is identical to the solid line depicted in Fig. 1. From the figure, we can see that \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\) closely resembles \(a_{1}{\varvec{X}}{\varvec{b}}\). Figure 4 also illustrates the results. The solid lines in Fig. 4 are identical to those in Fig. 2. The dashed lines on y\(_{1}\), y\(_{2}\), and y\(_{3}\) respectively plot \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\), \(\widehat{a}_{2}{\varvec{X}}\widehat{{\varvec{b}}}\), and \(\widehat{a}_{3}{\varvec{X}}\widehat{{\varvec{b}}}\). Again, the figure shows that our novel procedure works well.

As a supplementary examination, we generated an additional data set and repeated the same analysis, which yielded similar results. For example, we obtained

$$\begin{aligned} \widehat{{\varvec{a}}}=\begin{bmatrix} \widehat{a}_{1}\\ \widehat{a}_{2}\\ \widehat{a}_{3}\\ \end{bmatrix} =\begin{bmatrix} 0.8462\\ 0.5060\\ 0.1670\\ \end{bmatrix},\quad \widehat{{\varvec{a}}}/\widehat{a}_{1}= \begin{bmatrix} \widehat{a}_{1}/\widehat{a}_{1}\\ \widehat{a}_{2}/\widehat{a}_{1}\\ \widehat{a}_{3}/\widehat{a}_{1}\\ \end{bmatrix} =\begin{bmatrix} 1\\ 0.5980\\ 0.1974\\ \end{bmatrix} \approx \begin{bmatrix} 1\\ 0.6\\ 0.2\\ \end{bmatrix}. \end{aligned}$$
(29)

See also Figures B.1–B.4 in the supplementary material.

6 An Empirical Illustration

Figure 5, which is identical to Figure 1.2 of Hatanaka and Yamada (2003), plots two quarterly macroeconomic time series. More precisely, the upper [resp. lower] panel of the figure depicts the natural logarithm of Japanese M2\(+\)CD [resp. real gross domestic product (GDP)], from the first quarter of 1980 to the third quarter of 2001. From the figure, we can observe that these two time series seem to contain a common piecewise linear trend such that a major kink point is located at around 1991, which corresponds to the peak of the Japanese asset price bubble. Actually, the statistical procedure developed by Hatanaka and Yamada (2003) detected a common piecewise linear trend. [See Section 8.5 of Hatanaka and Yamada (2003).]

Figure 6 depicts the corresponding demeaned series. We estimated a common piecewise linear trend of these demeaned data. Denote it by \({\varvec{X}}\widehat{{\varvec{b}}}\). For the estimation, we used l1_common_trend_filter_c with \(c=0.018\). We specified the value of c by reference to the plots of time series in first differences shown in Fig. 7. The upper panel (resp. lower panel) of Fig. 8 depicts \({\varvec{X}}\widehat{{\varvec{b}}}\) (resp. \(\widehat{b}_{2},\ldots ,\widehat{b}_{T}\), where \(T=87\)). From the panels, we may observe that (i) a major kink point is located at around 1991 and (ii) \(\sum _{t=3}^{T}|\widehat{b}_{t}|\) equals the value of c. The solid lines in Fig. 9 are identical to those plotted in Fig. 6. The dashed line in the upper (resp. lower) panel plots \(\widehat{a}_{1}{\varvec{X}}\widehat{{\varvec{b}}}\) (resp. \(\widehat{a}_{2}{\varvec{X}}\widehat{{\varvec{b}}}\)), where

$$\begin{aligned} \widehat{{\varvec{a}}} =\begin{bmatrix} \widehat{a}_{1}\\ \widehat{a}_{2} \end{bmatrix} =\begin{bmatrix} 0.8962\\ 0.4437 \end{bmatrix},\quad \widehat{{\varvec{a}}}/\widehat{a}_{1}= \begin{bmatrix} \widehat{a}_{1}/\widehat{a}_{1}\\ \widehat{a}_{2}/\widehat{a}_{1}\\ \end{bmatrix} =\begin{bmatrix} 1\\ 0.4951 \end{bmatrix}. \end{aligned}$$

Finally, the dashed lines in Fig. 10 are the mean-restored estimated piecewise linear trends. The solid lines in the figure are identical to those in Fig. 5.

7 A Generalization

In this section, we briefly describe a generalization of our method. Let \({\varvec{D}}_{p}\in \mathbb {R}^{(T-p)\times T}\) be the p-th order difference matrix such that \({\varvec{D}}_{p}{\varvec{x}}_{i}=[\Delta ^{p}x_{i,p+1},\ldots ,\Delta ^{p}x_{i,T}]'\). Explicitly, \({\varvec{D}}_{p}\) is the following Toeplitz matrix:

$$\begin{aligned} {\varvec{D}}_{p}=\begin{bmatrix} a_{0} &{} \cdots &{} a_{p} &{} 0 &{} \cdots &{} 0 \\ 0 &{} \ddots &{} &{}\ddots &{} \ddots &{} \vdots \\ \vdots &{} \ddots &{} \ddots &{} &{} \ddots &{} 0 \\ 0 &{} \cdots &{} 0 &{} a_{0} &{} \cdots &{} a_{p} \\ \end{bmatrix}, \end{aligned}$$
(30)

where \(a_{k}=(-1)^{p-k}\left( {\begin{array}{c}p\\ k\end{array}}\right) \) for \(k=0,\ldots ,p\). Accordingly, \({\varvec{D}}_{2}\) equals \({\varvec{D}}\). Then, without any difficulty, we may extend our procedure to

$$\begin{aligned} \min _{\begin{array}{c} {\varvec{x}}\in \mathbb {R}^{T}\\ a_{1},\ldots ,a_{n}\in \mathbb {R} \end{array}}\, \sum _{i=1}^{n}\Vert {\varvec{y}}_{i}-a_{i}{\varvec{x}}\Vert _{2}^{2}+\lambda _{p}\Vert {\varvec{D}}_{p}{\varvec{x}}\Vert _{1}, \quad {\text {s.t.}}\,&\sum _{i=1}^{n}a_{i}^{2}=1, \end{aligned}$$
(31)

where \(p\in \mathbb {N}\) and \(\lambda _{p}>0\) is a tuning parameter. We refer to (31) as ‘\(\ell _{1}\) common polynomial trend filtering.’ The solution of the problem represents a continuous piecewise \((p-1)\)-th order polynomial trend. Note that the corresponding \({\varvec{X}}_{p}=[{\varvec{\Pi }}_{p},{\varvec{\Psi }}_{p}]\) such that \({\varvec{D}}_{p}{\varvec{X}}_{p}=[{\varvec{D}}_{p}{\varvec{\Pi }}_{p},{\varvec{D}}_{p} {\varvec{\Psi }}_{p}]=[{\varvec{0}},{\varvec{I}}_{T-p}]\in \mathbb {R}^{(T-p)\times T}\) is given in Yamada (2015).
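A construction of \({\varvec{D}}_{p}\) from the binomial stencil \(a_{k}=(-1)^{p-k}\binom{p}{k}\), together with checks that \({\varvec{D}}_{2}\) equals \({\varvec{D}}\) and that \({\varvec{D}}_{p}\) annihilates polynomials of degree \(p-1\), can be sketched as follows:

```python
import numpy as np
from math import comb

def diff_matrix(T, p):
    """(T-p) x T Toeplitz matrix D_p with stencil a_k = (-1)^(p-k) C(p, k)."""
    a = np.array([(-1) ** (p - k) * comb(p, k) for k in range(p + 1)], float)
    Dp = np.zeros((T - p, T))
    for r in range(T - p):
        Dp[r, r:r + p + 1] = a
    return Dp

T = 10
D2 = diff_matrix(T, 2)
assert np.allclose(D2[0, :3], [1, -2, 1])            # D_2 equals D
# D_p annihilates polynomials of degree p - 1:
t = np.arange(1, T + 1, dtype=float)
assert np.allclose(diff_matrix(T, 3) @ t ** 2, 0.0)  # D_3 kills quadratics
```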

8 Concluding Remarks

In this paper, we developed an extension of the \(\ell _{1}\) trend filtering. The \(\ell _{1}\) trend filtering can estimate a continuous piecewise linear trend of a univariate time series; however, it cannot estimate a common continuous piecewise linear trend of multiple time series. The novel statistical procedure developed in this paper makes the latter possible. We provided an algorithm and a clue for specifying the tuning parameter of the procedure, both of which are required for its application. We also numerically illustrated how well the algorithm works, provided an empirical illustration, and introduced a generalization of our novel method.