1 Introduction

This work is motivated by recent papers on nearly-constrained estimation in several dimensions and by papers on generalised penalised least squares regression. The subject of penalised estimators starts with \(L_{1}\)-penalisation, cf. Tibshirani (1996), which is called the lasso signal approximation, and \(L_{2}\)-penalisation, which is usually addressed as ridge regression (Hoerl and Kennard 1970) or sometimes as Tikhonov–Philips regularization (Phillips 1962; Tikhonov et al. 1995). A first generalisation of the lasso is \(L_{1}\)-penalisation imposed on the successive differences of the coefficients. For a given sequence of data points \({\varvec{y}} \in {\mathbb {R}}^{n}\), the fusion approximator (cf. Rinaldo 2009) is given by

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F}) = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} +\lambda _{F} \sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}|. \end{aligned}$$
(1)

The combination of the fusion approximator and the lasso is called the fused lasso estimator and is given by:

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FL}({\varvec{y}}, \lambda _{F}, \lambda _{L}) = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} +\lambda _{F}\sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}| + \lambda _{L} ||{\varvec{\beta }}||_{1}. \end{aligned}$$
(2)

The fused lasso was introduced in Tibshirani et al. (2005), and its asymptotic properties were studied in detail in Rinaldo (2009). It is also worth noting that in the paper Tibshirani and Taylor (2011) the estimator in (1) is called the fused lasso, while the estimator in (2) is referred to as the sparse fused lasso.

In the area of constrained inference the basic and simplest problem is isotonic regression in one dimension. For a given sequence of data points \({\varvec{y}} \in {\mathbb {R}}^{n}\), the isotonic regression is the following approximation

$$\begin{aligned} \hat{{\varvec{\beta }}}^{I} = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min }||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2}, \quad \text {subject to} \quad \beta _{1} \le \beta _{2} \le \dots \le \beta _{n}, \end{aligned}$$
(3)

i.e. it is the \(\ell ^{2}\)-projection of the vector \({\varvec{y}}\) onto the set of non-decreasing vectors in \({\mathbb {R}}^{n}\). The notion of isotonic “regression” in this context might be confusing. Nevertheless, it is a standard term in this subject, cf., for example, the papers Best and Chakravarti (1990); Stout (2013), where the term “isotonic regression” is used for the isotonic projection of a general vector. Also, in this paper we use the terms “regression”, “estimator” and “approximator” interchangeably. A general introduction to isotonic regression can be found, for example, in Robertson et al. (1988).

The nearly-isotonic regression, introduced in Tibshirani et al. (2011) and studied in detail in Minami (2020), is a less restrictive version of isotonic regression, and it is given by the following optimization problem

$$\begin{aligned} \hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}) = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} + \lambda _{NI}\sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}|_{+}, \end{aligned}$$
(4)

where \(x_{+} = x \cdot 1\{x > 0 \}\).

In this paper we combine the fused lasso estimator with nearly-isotonic regression and call the resulting estimator the fused lasso nearly-isotonic signal approximator; for a given sequence of data points \({\varvec{y}} \in {\mathbb {R}}^{n}\), the problem in the one-dimensional case is the following optimization:

$$\begin{aligned} \begin{aligned}&\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})\\&\quad = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2}+ \lambda _{F} \sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}| \\&\qquad + \lambda _{L} ||{\varvec{\beta }}||_{1} + \lambda _{NI}\sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}|_{+}. \end{aligned} \end{aligned}$$
(5)
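To make the one-dimensional definition in (5) concrete, the following minimal sketch solves the problem with a generic convex solver (assuming the cvxpy and numpy packages; the function name flni_1d and the toy data are ours and purely illustrative):

```python
import numpy as np
import cvxpy as cp

def flni_1d(y, lam_f, lam_l, lam_ni):
    """Minimal sketch of the one-dimensional FLNI problem (5)."""
    n = len(y)
    beta = cp.Variable(n)
    diffs = beta[:-1] - beta[1:]                 # beta_i - beta_{i+1}
    obj = (0.5 * cp.sum_squares(y - beta)
           + lam_f * cp.sum(cp.abs(diffs))       # fusion penalty
           + lam_l * cp.norm1(beta)              # lasso penalty
           + lam_ni * cp.sum(cp.pos(diffs)))     # nearly-isotonic penalty
    cp.Problem(cp.Minimize(obj)).solve()
    return beta.value

y = np.cumsum(np.random.randn(50))               # toy signal
beta_hat = flni_1d(y, lam_f=1.0, lam_l=0.1, lam_ni=2.0)
```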

Also, in the case of \(\lambda _{F} \ne 0\) and \(\lambda _{NI} \ne 0\) with \(\lambda _{L} = 0\) we call the estimator the fused nearly-isotonic regression, i.e.

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) \equiv \hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI}) = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} + \lambda _{F} \sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}| +\lambda _{NI}\sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}|_{+}. \end{aligned}$$
(6)

This generalisation of nearly-isotonic regression in (6) was proposed in the conclusion of the paper Tibshirani et al. (2011). Later, the one-dimensional fused nearly-isotonic regression was considered and numerically solved in Yu et al. (2022) with time complexity \({\mathcal {O}}(n)\). In contrast, in this paper we consider and solve the problem in general dimensions. Moreover, in the one-dimensional case with fixed penalisation parameters we also provide a solution with linear complexity, as well as an exact partial path solution (when one of the parameters is fixed and the path is taken with respect to the other) with complexity \({\mathcal {O}}(n\log (n))\).

It is also worth mentioning the paper Gómez et al. (2022), where the authors studied the nearly-isotonic approximator with the extra penalisation term

$$\begin{aligned} (\beta _{i} - \beta _{i+1})^{2}\cdot 1\{(\beta _{i} - \beta _{i+1}) > 0 \} \end{aligned}$$

with an additional lasso penalty. Also, in the paper Gaines et al. (2018) the authors compared algorithms for solving the lasso with linear constraints, which is called the constrained lasso.

Next, we state the problem defined in (5) for the general case of isotonic constraints with respect to a general partial order. First, we introduce the necessary notation.

1.1 Notation

We start with basic definitions of partial order and isotonic regression. Let \({\mathcal {I}} = \{{\varvec{i}}_{1}, \dots , {\varvec{i}}_{n}\}\) be some index set. Next, we define the following binary relation \(\preceq \) on \({\mathcal {I}}\).

A binary relation \(\preceq \) on \({\mathcal {I}}\) is called partial order if

  • it is reflexive, i.e. \({\varvec{j}}\preceq {\varvec{j}}\) for all \({\varvec{j}} \in {\mathcal {I}}\);

  • it is transitive, i.e. \({\varvec{j}}_{1}, {\varvec{j}}_{2}, {\varvec{j}}_{3} \in {\mathcal {I}}\), \({\varvec{j}}_{1} \preceq {\varvec{j}}_{2}\) and \({\varvec{j}}_{2} \preceq {\varvec{j}}_{3}\) imply \({\varvec{j}}_{1} \preceq {\varvec{j}}_{3}\);

  • it is antisymmetric, i.e. \({\varvec{j}}_{1}, {\varvec{j}}_{2} \in {\mathcal {I}}\), \({\varvec{j}}_{1} \preceq {\varvec{j}}_{2}\) and \({\varvec{j}}_{2} \preceq {\varvec{j}}_{1}\) imply \({\varvec{j}}_{1} = {\varvec{j}}_{2}\).

Further, a vector \({\varvec{\beta }}\in {\mathbb {R}}^{n}\) indexed by \({\mathcal {I}}\) is called isotonic with respect to the partial order \(\preceq \) on \({\mathcal {I}}\) if \({\varvec{j}}_{1} \preceq {\varvec{j}}_{2}\) implies \(\beta _{{\varvec{j}}_{1}} \le \beta _{{\varvec{j}}_{2}}\). We denote the set of all isotonic vectors in \({\mathbb {R}}^{n}\) with respect to the partial order \(\preceq \) on \({\mathcal {I}}\) by \({\varvec{{\mathcal {B}}}}^{is}\), which is a closed convex cone in \({\mathbb {R}}^{n}\) and is also called the isotonic cone. Next, a vector \({\varvec{\beta }}^{I}\in {\mathbb {R}}^{n}\) is the isotonic regression of an arbitrary vector \({\varvec{y}} \in {\mathbb {R}}^{n}\) over the partially ordered index set \({\mathcal {I}}\) if

$$\begin{aligned} {\varvec{\beta }}^{I} = \underset{{\varvec{\beta }} \in {\varvec{{\mathcal {B}}}}^{is}}{\arg \min } \sum _{{\varvec{j}} \in {\mathcal {I}}}(\beta _{{\varvec{j}}} - y_{{\varvec{j}}})^{2}. \end{aligned}$$
(7)

For any partial order relation \(\preceq \) on \({\mathcal {I}}\) there exists a directed graph \(G = (V,E)\), with \(V = {\mathcal {I}}\) and E the minimal set of edges such that

$$\begin{aligned} E = \{({\varvec{j}}_{1}, {\varvec{j}}_{2}): \, ({\varvec{j}}_{1},{\varvec{j}}_{2}) \ \text {is an ordered pair of vertices from} \ {\mathcal {I}}\}, \end{aligned}$$
(8)

and an arbitrary vector \({\varvec{\beta }} \in {\mathbb {R}}^{n}\) is isotonic with respect to \(\preceq \) iff \(\beta _{{\varvec{l_{1}}}} \le \beta _{{\varvec{l_{2}}}}\) whenever E contains a chain of edges from \({\varvec{l}}_{1} \in V\) to \({\varvec{l}}_{2} \in V\).

Now we can generalise the estimators discussed above. First, equivalently to the definition in (7), a vector \({\varvec{\beta }}^{I}\in {\mathbb {R}}^{n}\) is the isotonic regression of an arbitrary vector \({\varvec{y}} \in {\mathbb {R}}^{n}\) indexed by the partially ordered index set \({\mathcal {I}}\) if

$$\begin{aligned} {\varvec{\beta }}^{I} = \underset{{\varvec{\beta }}}{\arg \min } \sum _{{\varvec{j}} \in {\mathcal {I}}}(\beta _{{\varvec{j}}} - y_{{\varvec{j}}})^{2}, \end{aligned}$$
(9)

subject to \(\beta _{{\varvec{l_{1}}}} \le \beta _{{\varvec{l_{2}}}}\) whenever E contains a chain of edges from \({\varvec{l}}_{1} \in V\) to \({\varvec{l}}_{2} \in V\).

Second, for the directed graph \(G = (V, E)\), which corresponds to the partial order \(\preceq \) on \({\mathcal {I}}\), the nearly-isotonic regression of \({\varvec{y}}\in {\mathbb {R}}^{n}\) indexed by \({\mathcal {I}}\) is given by

$$\begin{aligned} \hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}) = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} + \lambda _{NI}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}|_{+}. \end{aligned}$$
(10)

This generalisation of nearly-isotonic regression was introduced and studied in Minami (2020).

Next, fused and fused lasso approximators for a general directed graph \(G = (V, E)\) are given by

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F}) = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} +\lambda _{F}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}|, \end{aligned}$$
(11)

and

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FL}({\varvec{y}}, \lambda _{F}, \lambda _{L}) = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} +\lambda _{F}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}| + \lambda _{L} ||{\varvec{\beta }}||_{1}. \end{aligned}$$
(12)

These optimization problems were introduced and solved for a general graph in Friedman et al. (2007); Hoefling (2010); Tibshirani and Taylor (2011).

Further, let D denote the oriented incidence matrix of the directed graph \(G = (V, E)\) corresponding to \(\preceq \) on \({\mathcal {I}}\). We choose the orientation of D in the following way. Assume that the graph G with n vertices has m edges, and that we label the vertices by \(\{1, \dots , n\}\) and the edges by \(\{1, \dots , m\}\). Then D is the \(m\times n\) matrix with

$$\begin{aligned} D_{i,j} = {\left\{ \begin{array}{ll} 1, &{} \quad \hbox { if vertex } j \hbox { is the source of edge } i, \\ -1, &{} \quad \hbox { if vertex } j \hbox { is the target of edge } i,\\ 0, &{} \quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(13)
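As an illustration of the convention in (13), the following sketch builds the incidence matrix of the one-dimensional chain graph used in the first example below (a dense NumPy construction for readability; a sparse format would be preferable for large n):

```python
import numpy as np

def chain_incidence(n):
    """Oriented incidence matrix (13) for the chain graph with
    edges (i, i+1), i = 1, ..., n-1; edge i has source i and target i+1."""
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i] = 1.0        # vertex i is the source of edge i
        D[i, i + 1] = -1.0   # vertex i+1 is the target of edge i
    return D

print(chain_incidence(4))
```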

In order to clarify the notation we consider the following examples of partial order relations. First, let us consider the monotonic order relation in the one-dimensional case. Let \({\mathcal {I}} = \{1, \dots , n\}\), and for \(j_{1} \in {\mathcal {I}}\) and \(j_{2} \in {\mathcal {I}}\) we naturally define \(j_{1}\preceq j_{2}\) if \(j_{1} \le j_{2}\). Further, if we let \(V = {\mathcal {I}}\) and \(E = \{(i, i+1): i = 1, \dots , n-1 \}\), then \(G = (V, E)\) is the directed graph which corresponds to the one-dimensional order relation on \({\mathcal {I}}\). Figure 1 displays the graph and its oriented incidence matrix.

Fig. 1 Graph for monotonic constraints and oriented incidence matrix

Next, we consider the two-dimensional case with bimonotone constraints. The notion of bimonotonicity was first introduced in Beran and Dümbgen (2010) and means the following. Let us consider the index set

$$\begin{aligned} {\mathcal {I}} = \{ {\varvec{i}}= (i^{(1)},i^{(2)}): \, i^{(1)}=1,2,\dots , n_{1}, \ i^{(2)}=1,2,\dots , n_{2}\} \end{aligned}$$

with the following order relation \(\preceq \) on it: for \({\varvec{j}}_{1}, {\varvec{j}}_{2}\in {\mathcal {I}}\) we have \({\varvec{j}}_{1} \preceq {\varvec{j}}_{2}\) iff \(j^{(1)}_{1} \le j^{(1)}_{2}\) and \(j^{(2)}_{1} \le j^{(2)}_{2}\). Then, a vector \({\varvec{\beta }}\in {\mathbb {R}}^{n}\), with \(n=n_{1}n_{2}\), indexed by \({\mathcal {I}}\) is called bimonotone if it is isotonic with respect to the bimonotone order \(\preceq \) defined on its index set \({\mathcal {I}}\). Further, we define the directed graph \(G = (V, E)\) with vertices \(V = {\mathcal {I}}\) and the edges

$$\begin{aligned} \begin{aligned} E ={}&\{((l, k),(l, k+1) ): \, 1 \le l \le n_{1}, 1 \le k \le n_{2} - 1\}\\ \cup \,&\{((l, k),(l+1, k) ): \, 1 \le l \le n_{1}-1, 1 \le k \le n_{2} \}. \end{aligned} \end{aligned}$$
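This edge set and the corresponding sparse incidence matrix for the grid graph can be sketched as follows (assuming scipy; the row-major flattening of the vertices is our own, purely illustrative labelling):

```python
import numpy as np
from scipy.sparse import coo_matrix

def grid_incidence(n1, n2):
    """Sparse oriented incidence matrix for the bimonotone grid graph."""
    idx = lambda l, k: l * n2 + k                    # flatten vertex (l, k)
    edges = [(idx(l, k), idx(l, k + 1))              # edges within a row
             for l in range(n1) for k in range(n2 - 1)]
    edges += [(idx(l, k), idx(l + 1, k))             # edges between rows
              for l in range(n1 - 1) for k in range(n2)]
    m = len(edges)
    rows = np.repeat(np.arange(m), 2)
    cols = np.asarray(edges).ravel()
    vals = np.tile([1.0, -1.0], m)                   # +1 at source, -1 at target
    return coo_matrix((vals, (rows, cols)), shape=(m, n1 * n2)).tocsr()
```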

The labeled directed graph for bimonotone constraints and its incidence matrix are displayed in Fig. 2.

Fig. 2 Graph for bimonotone constraints and oriented incidence matrix

1.2 General statement of the problem

Now we can state the general problem studied in this paper. Let \({\varvec{y}} \in {\mathbb {R}}^{n}\) be a signal indexed by the index set \({\mathcal {I}}\) with the partial order relation \(\preceq \) defined on \({\mathcal {I}}\). Next, let \(G=(V,E)\) be the directed graph corresponding to \(\preceq \) on \({\mathcal {I}}\). The fused lasso nearly-isotonic signal approximation with respect to \(\preceq \) on \({\mathcal {I}}\) (or, equivalently, to the directed graph \(G = (V, E)\), corresponding to \(\preceq \)) is given by

$$\begin{aligned} \begin{aligned}&\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})\\&\quad = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} + \lambda _{F} \sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}| \\&\qquad + \lambda _{L} ||{\varvec{\beta }}||_{1} + \lambda _{NI}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}|_{+}. \end{aligned} \end{aligned}$$
(14)

Therefore, the estimator in (14) is a combination of the estimators in (10) and (12).

Equivalently, we can rewrite the problem in the following way:

$$\begin{aligned} \begin{aligned} \hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})&= \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} \\&\quad + \lambda _{F} ||D{\varvec{\beta }}||_{1}\\&\quad + \lambda _{L} ||{\varvec{\beta }}||_{1} + \lambda _{NI}||D{\varvec{\beta }}||_{+}, \end{aligned} \end{aligned}$$
(15)

where D is the oriented incidence matrix of the graph \(G =(V, E)\). Here we note that, when penalising with the incidence matrix D, we assume that \({\varvec{\beta }}\) is indexed consistently with the labelling of the vertices and edges of the graph \(G =(V, E)\) used to construct D. Analogously to the definition in the one-dimensional case, if \(\lambda _{L} = 0\) we call the estimator the fused nearly-isotonic approximator and denote it by \(\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})\).
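The matrix form (15) suggests a direct, solver-agnostic sketch for a general graph (cvxpy assumed again; flni_graph is a hypothetical helper name, and D can be, for instance, the output of the grid_incidence sketch above):

```python
import cvxpy as cp

def flni_graph(y, D, lam_f, lam_l, lam_ni):
    """Minimal sketch of the graph FLNI problem (15); D is the oriented
    incidence matrix of G = (V, E) defined in (13)."""
    beta = cp.Variable(len(y))
    diffs = D @ beta                                 # one entry per edge
    obj = (0.5 * cp.sum_squares(y - beta)
           + lam_f * cp.norm1(diffs)
           + lam_l * cp.norm1(beta)
           + lam_ni * cp.sum(cp.pos(diffs)))
    cp.Problem(cp.Minimize(obj)).solve()
    return beta.value
```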

Here, it is worth mentioning recent papers on constrained estimation (Deng and Zhang 2020; Han et al. 2019; Han and Zhang 2020), where the authors studied the asymptotic properties of the isotonic regression in general dimensions. Also, in the paper Wang et al. (2015) \(\ell _{1}\)-trend filtering was generalised to the case of a general graph.

1.3 Organisation of the paper

The rest of the paper is organized as follows. In Sect. 2 we provide the numerical solution to the fused lasso nearly-isotonic signal approximator. Section 3 is dedicated to the theoretical properties of the estimator. We show how the solutions to the fused lasso nearly-isotonic regression, the fused lasso, and the nearly-isotonic regression are related to each other. Also, we prove that in the one-dimensional case the new estimator has the agglomerative property and that the procedures of near-isotonisation and fusion can be swapped while still providing the solution to the original problem. Next, in Sect. 4 we derive an unbiased estimator of the degrees of freedom of the estimator. Furthermore, in Sect. 5 we discuss the computational aspects, perform a simulation study and show that the estimator is computationally feasible for moderately large data sets. Also, we illustrate the usage of the estimator on a real data set. The article closes with a conclusion and a discussion of possible generalisations in Sect. 6. The proofs of all results are given in the Appendix. The R and Python implementations of the estimator are available upon request.

2 Solution to the fused lasso nearly-isotonic signal approximator

First, we consider fused nearly-isotonic regression, i.e. in (15) we assume that \(\lambda _{L} = 0\).

Theorem 1

For a fixed data vector \({\varvec{y}} \in {\mathbb {R}}^{n}\) indexed by the index set \({\mathcal {I}}\) with the partial order relation \(\preceq \) defined on \({\mathcal {I}}\), the solution to the fused nearly-isotonic problem in (15) is given by

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = {\varvec{y}} - D^{T} \hat{{\varvec{\nu }}}(\lambda _{F}, \lambda _{NI}) \end{aligned}$$
(16)

with

$$\begin{aligned} \hat{{\varvec{\nu }}}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \underset{{\varvec{\nu }} \in {\mathbb {R}}^{m}}{\arg \min } \, \frac{1}{2}||{\varvec{y}} - D^{T}{\varvec{\nu }}||_{2}^{2} \quad \text {s.t.} \quad - \lambda _{F} {\varvec{1}} \le {\varvec{\nu }} \le (\lambda _{F} + \lambda _{NI}){\varvec{1}}, \end{aligned}$$
(17)

where D is the incidence matrix of the directed graph \(G = (V, E)\) with n vertices and m edges corresponding to \(\preceq \) on \({\mathcal {I}}\), \({\varvec{1}}\in {\mathbb {R}}^{m}\) is the vector with all elements equal to 1, and the notation \({\varvec{a}} \le {\varvec{b}}\) for vectors \({\varvec{a}},{\varvec{b}} \in {\mathbb {R}}^{m}\) means \(a_{i} \le b_{i}\) for all \(i = 1, \dots , m\).
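A direct transcription of (16)-(17) as a box-constrained quadratic program is sketched below (cvxpy again; fni_dual is an illustrative name, not part of the paper's implementation):

```python
import cvxpy as cp

def fni_dual(y, D, lam_f, lam_ni):
    """Solve the dual (17) and recover the FNI fit via (16)."""
    m = D.shape[0]
    nu = cp.Variable(m)
    objective = 0.5 * cp.sum_squares(y - D.T @ nu)
    box = [nu >= -lam_f, nu <= lam_f + lam_ni]       # box constraints of (17)
    cp.Problem(cp.Minimize(objective), box).solve()
    return y - D.T @ nu.value                        # primal fit (16)
```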

Next, we provide the solution to the fused lasso nearly-isotonic regression.

Theorem 2

For a given vector \({\varvec{y}}\) indexed by \({\mathcal {I}}\), the solution to the fused lasso nearly-isotonic signal approximator \(\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI})\) is given by soft thresholding the fused nearly-isotonic regression \(\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}},\lambda _{F}, \lambda _{NI})\), i.e.

$$\begin{aligned} {\hat{\beta }}^{FLNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI}) = {\left\{ \begin{array}{ll} {\hat{\beta }}^{FNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{NI}) - \lambda _{L}, &{} \text { if } {\hat{\beta }}^{FNI}_{{\varvec{i}}} \ge \lambda _{L}, \\ 0, &{} \text { if } |{\hat{\beta }}^{FNI}_{{\varvec{i}}}| \le \lambda _{L},\\ {\hat{\beta }}^{FNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{NI}) + \lambda _{L}, &{} \text { if } {\hat{\beta }}^{FNI}_{{\varvec{i}}} \le -\lambda _{L}, \end{array}\right. } \end{aligned}$$
(18)

for \({{\varvec{i}}}\in {\mathcal {I}}\).
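The soft-thresholding step in (18) is elementwise and can be sketched in a single vectorised line (NumPy assumed):

```python
import numpy as np

def soft_threshold(beta_fni, lam_l):
    """Componentwise soft thresholding (18) applied to the FNI fit."""
    return np.sign(beta_fni) * np.maximum(np.abs(beta_fni) - lam_l, 0.0)
```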

From this result we can conclude that adding the lasso penalisation does not add much to the computational complexity of the solution. The computational aspects of the fused nearly-isotonic approximator will be discussed in Sect. 5 below. In the next section we discuss properties of the fused lasso nearly-isotonic regression.

3 Properties of the fused lasso nearly-isotonic signal approximator

We start with a proposition which shows how the solutions to the optimization problems (11), (10) and (15) are related to each other. This result will be used in the next section to derive degrees of freedom of the fused lasso nearly-isotonic signal approximator.

Proposition 3

For a fixed data vector \({\varvec{y}}\) indexed by \({\mathcal {I}}\) and penalisation parameters \(\lambda _{NI}\) and \(\lambda _{F}\) the following relations between estimators \(\hat{{\varvec{\beta }}}^{F}\), \(\hat{{\varvec{\beta }}}^{NI}\) and \(\hat{{\varvec{\beta }}}^{FNI}\) hold

$$\begin{aligned} \hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{F}\left( {\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI}\right) , \end{aligned}$$
(19)
$$\begin{aligned} \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{NI}\left( {\varvec{y}} + \lambda _{F}D^{T}{\varvec{1}}, \lambda _{NI} + 2\lambda _{F}\right) = \hat{{\varvec{\beta }}}^{F}\left( {\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F}\right) \end{aligned}$$
(20)

and

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{FL}\left( {\varvec{y}}- \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \ \frac{1}{2}\lambda _{NI} + \lambda _{F}, \ \lambda _{L}\right) , \end{aligned}$$
(21)

where D is the oriented incidence matrix for the graph \(G = (V, E)\) corresponding to the partial order relation \(\preceq \) on \({\mathcal {I}}\).
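The second identity in (20) can be checked numerically by reusing the illustrative chain_incidence and flni_graph sketches above (all names are hypothetical, and agreement holds only up to solver tolerance):

```python
import numpy as np

y = np.random.randn(30)
D = chain_incidence(30)
lam_f, lam_ni = 0.7, 1.3
ones = np.ones(D.shape[0])

fni = flni_graph(y, D, lam_f, 0.0, lam_ni)                 # FNI fit
fused = flni_graph(y - 0.5 * lam_ni * D.T @ ones,          # shifted data
                   D, lam_f + 0.5 * lam_ni, 0.0, 0.0)      # pure fusion fit
print(np.max(np.abs(fni - fused)))                         # ~ solver tolerance
```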

Further, let us introduce two “naive” versions of \(\hat{{\varvec{\beta }}}^{FNI}\). Instead of simultaneously penalising by fusion and isotonisation, we consider the following two-step procedures:

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})&= \hat{{\varvec{\beta }}}^{NI}(\hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F}), \lambda _{NI}) \\&\equiv \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} || \hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F}) - {\varvec{\beta }}||_{2}^{2} + \lambda _{NI}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}|_{+}, \end{aligned}$$
(22)

and

$$\begin{aligned} \hat{{\varvec{\beta }}}^{NI\rightarrow F}({\varvec{y}}, \lambda _{NI}, \lambda _{F})&= \hat{{\varvec{\beta }}}^{F}(\hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}), \lambda _{F}) \\&\equiv \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} || \hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}) - {\varvec{\beta }}||_{2}^{2} + \lambda _{F}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}|. \end{aligned}$$
(23)

Below we prove that, in the one-dimensional case with the simple monotonic restriction defined above, the two “naive” methods are not only equivalent to each other, but both provide the solution to the fused nearly-isotonic regression.

First, we have to prove that, analogously to the fused lasso and nearly-isotonic regression, as one of the penalisation parameters increases, the constant regions in the solution \(\hat{{\varvec{\beta }}}^{FLNI}\) can only be joined together and not split apart. In the paper Minami (2020) this property of the estimator was called the agglomerative property. We prove this result only for the one-dimensional monotonic order, and the general case remains an open question.

Proposition 4

(Agglomerative property of the FLNI estimator) Let \({\mathcal {I}} = \{1, \dots , n\}\) with the natural order for integers defined on it. Next, let \({\varvec{\lambda }} = (\lambda _{F}, \lambda _{L}, \lambda _{NI})\) and \({\varvec{\lambda }}^{*}= (\lambda _{F}^{*}, \lambda _{L}^{*}, \lambda _{NI}^{*})\) be triples of penalisation parameters such that one of the elements of \({\varvec{\lambda }}^{*}\) is greater than the corresponding element of \({\varvec{\lambda }}\), while the other two are the same. Next, assume that for some i the solution \(\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}},{\varvec{\lambda }})\) satisfies

$$\begin{aligned} {\hat{\beta }}_{i}^{FLNI}({\varvec{y}},{\varvec{\lambda }}) = {\hat{\beta }}_{i+1}^{FLNI}({\varvec{y}},{\varvec{\lambda }}). \end{aligned}$$

Then for \({\varvec{\lambda }}^{*}\) we have

$$\begin{aligned} {\hat{\beta }}_{i}^{FLNI}({\varvec{y}},{\varvec{\lambda }}^{*}) = {\hat{\beta }}_{i+1}^{FLNI}({\varvec{y}},{\varvec{\lambda }}^{*}). \end{aligned}$$

Now we can prove the commutability property of the “naive” estimators and the equivalence of the approach to the fused nearly-isotonic regression.

Theorem 5

(Commutability property of FNI estimator) Let \(\hat{{\varvec{\beta }}}^{F\rightarrow NI}\!({\varvec{y}}, \lambda _{F}, \lambda _{NI})\) and \(\hat{{\varvec{\beta }}}^{NI\rightarrow F}\!({\varvec{y}}, \lambda _{NI}, \lambda _{F})\) be the “naive” versions of the fused nearly-isotonic approximator, defined in (22) and (23), in the case of one-dimensional monotonic constraint. Then, we have

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{NI\rightarrow F}({\varvec{y}}, \lambda _{NI}, \lambda _{F}) = \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}). \end{aligned}$$

One of the first conclusions of Theorem 5 is the commutability of strict isotonisation (which corresponds to large values of \(\lambda _{NI}\)) and fusion. For large values of \(\lambda _{NI}\), the fused lasso nearly-isotonic signal approximator is, in principle, analogous to the approach studied in Gao et al. (2020), where the authors studied the estimation of isotonic piecewise constant signals by solving the following optimization problem

$$\begin{aligned} {\varvec{\beta }}^{*} = \underset{{\varvec{\beta }} \in {\varvec{{\mathcal {B}}}}^{is}_{n,k} }{\arg \min } \sum _{j = 1}^{n}(\beta _{j} - y_{j})^{2} + pen(n, k), \end{aligned}$$
(24)

where

$$\begin{aligned} \begin{aligned} {\varvec{{\mathcal {B}}}}^{is}_{n,k} = \{&{\varvec{\beta }}\in {\mathbb {R}}^{n}: \text { there exists } \{ a_{j} \}_{j=0}^{k} \text { and } \{ \mu _{j} \}_{j=1}^{k}\\&\text { such that } 0 \le a_{0} \le a_{1} \le \dots \le a_{k} = n, \\&\mu _{1} \le \mu _{2} \le \dots \le \mu _{k}, \text { and } \beta _{i} = \mu _{j}\\&\text { for all } i \in (a_{j-1}: a_{j}] \}, \end{aligned} \end{aligned}$$

and pen(n, k) is a penalisation term which depends on n and k but not on \({\varvec{y}}\). Therefore, the result of Theorem 5 provides an alternative approach to obtaining an exact solution in the estimation of isotonic piecewise constant signals.

4 Degrees of freedom

In this section we discuss the estimation of the degrees of freedom for the fused nearly-isotonic regression and the fused lasso nearly-isotonic signal approximator. Let us consider the following nonparametric model

$$\begin{aligned} {\varvec{Y}} = {\varvec{\mathring{\beta }}} + {\varvec{\varepsilon }}, \end{aligned}$$

where \({\varvec{\mathring{\beta }}}\in {\mathbb {R}}^{n}\) is an unknown signal, and the error term \({\varvec{\varepsilon }} \sim {\mathcal {N}}({\varvec{0}}, \sigma ^{2}{\varvec{I}})\).

The degrees of freedom is a measure of the complexity of an estimator, and following Efron (1986), for fixed values of \(\lambda _{F}\), \(\lambda _{L}\) and \(\lambda _{NI}\) the degrees of freedom of \(\hat{{\varvec{\beta }}}^{FNI}\) and \(\hat{{\varvec{\beta }}}^{FLNI}\) are given by

$$\begin{aligned} df(\hat{{\varvec{\beta }}}^{FLNI}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})) = \frac{1}{\sigma ^{2}} \sum _{i=1}^{n}\textrm{Cov}[{\hat{\beta }}^{FLNI}_{i}({\varvec{Y}}, \lambda _{F},\lambda _{L}, \lambda _{NI}), Y_{i}]. \end{aligned}$$
(25)

The next theorem provides the unbiased estimators of the degrees of freedom \(df(\hat{{\varvec{\beta }}}^{FNI})\) and \(df(\hat{{\varvec{\beta }}}^{FLNI})\).

Theorem 6

For fixed values of \(\lambda _{F}\), \(\lambda _{L}\) and \(\lambda _{NI}\) let

$$\begin{aligned} K^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \#\{\text {fused groups in } \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})\}, \end{aligned}$$

and

$$\begin{aligned} K^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}) = \#\{\text {non-zero fused groups in } \hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})\}. \end{aligned}$$

Then we have

$$\begin{aligned} {\mathbb {E}}[K^{FNI}({\varvec{Y}}, \lambda _{F}, \lambda _{NI})] = df(\hat{{\varvec{\beta }}}^{FNI}({\varvec{Y}}, \lambda _{F}, \lambda _{NI})), \end{aligned}$$

and

$$\begin{aligned} {\mathbb {E}}[K^{FLNI}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})] = df(\hat{{\varvec{\beta }}}^{FLNI}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})). \end{aligned}$$

We can potentially use the estimate of degrees of freedom for an unbiased estimation of the true risk \({\mathbb {E}}[\sum _{i=1}^{n}(\mathring{\beta }_{i}- {\hat{\beta }}^{FLNI}_{i}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}))^2]\), which is given by the \({\hat{C}}_{p}\) statistic

$$\begin{aligned} \begin{aligned} {\hat{C}}_{p}(\lambda _{F}, \lambda _{L}, \lambda _{NI})&=\sum _{i=1}^{n}(y_{i} - {\hat{\beta }}^{FLNI}_{i} ({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}))^2 \\&\quad - n\sigma ^{2} + 2\sigma ^{2}K^{FLNI}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}). \end{aligned} \end{aligned}$$
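For a one-dimensional fit, the quantity \(K^{FLNI}\) of Theorem 6 and the \({\hat{C}}_{p}\) statistic above can be sketched as follows (tol is a hypothetical numerical tolerance; for a general graph one would instead count the connected components of the fused fit):

```python
import numpy as np

def count_nonzero_fused_groups(beta_hat, tol=1e-8):
    """K^FLNI for a 1-D chain: number of maximal constant non-zero blocks."""
    groups, prev = 0, None
    for b in beta_hat:
        if abs(b) > tol and (prev is None or abs(b - prev) > tol):
            groups += 1
        prev = b
    return groups

def c_p(y, beta_hat, sigma2, tol=1e-8):
    """Mallows-type risk estimate from the displayed C_p formula."""
    k = count_nonzero_fused_groups(beta_hat, tol)
    return np.sum((y - beta_hat) ** 2) - len(y) * sigma2 + 2 * sigma2 * k
```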

We note, though, that in real applications the variance \(\sigma ^{2}\) is unknown. A variance estimator for the case of one-dimensional isotonic regression was introduced in Meyer and Woodroofe (2000). To the authors’ knowledge, a variance estimator even for the one-dimensional nearly-isotonic regression is an open problem.

In order to illustrate the performance of the degrees of freedom estimator, we generate \(M = 3000\) independent samples from the following signal on a grid

$$\begin{aligned} Y_{i,j} = \max {(i,j)} + \varepsilon _{i,j}, \end{aligned}$$

where \(i,j= 1,\dots , 5\) and \(\varepsilon _{i,j} \sim {\mathcal {N}}(0, 0.25)\). Using Monte-Carlo simulations, we estimate \(\textrm{Cov}[{\hat{\beta }}^{FLNI}_{i}, Y_{i}]\) and compare the resulting estimate of the true value of df with the estimator defined in Theorem 6. The result for different values of the penalisation parameters is given in Fig. 3.
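A schematic version of this Monte-Carlo check on a small one-dimensional example (not the 5 × 5 grid of the figure) can be written as follows, reusing the flni_1d sketch above; the signal, sample size and penalisation values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, sigma2 = 20, 200, 0.25
mu = np.sqrt(np.arange(1, n + 1, dtype=float))        # hypothetical monotone signal
fits, ys = np.empty((M, n)), np.empty((M, n))
for m in range(M):
    ys[m] = mu + np.sqrt(sigma2) * rng.standard_normal(n)
    fits[m] = flni_1d(ys[m], lam_f=0.5, lam_l=0.1, lam_ni=1.0)
cov = ((fits - fits.mean(axis=0)) * (ys - ys.mean(axis=0))).mean(axis=0)
print(cov.sum() / sigma2)                             # Monte-Carlo estimate of (25)
```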

Fig. 3 Degrees of freedom estimator for different values of penalisation parameters and its 25% and 75% quantiles based on 3000 samples

5 Computational aspects, simulation study and application to a real data set

First of all, recall that the solution to the fused lasso nearly-isotonic approximator is given by

$$\begin{aligned} {\hat{\beta }}^{FLNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI}) = {\left\{ \begin{array}{ll} {\hat{\beta }}^{FNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{NI}) - \lambda _{L}, &{} \text { if } {\hat{\beta }}^{FNI}_{{\varvec{i}}} \ge \lambda _{L}, \\ 0, &{} \text { if } |{\hat{\beta }}^{FNI}_{{\varvec{i}}}| \le \lambda _{L},\\ {\hat{\beta }}^{FNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{NI}) + \lambda _{L}, &{} \text { if } {\hat{\beta }}^{FNI}_{{\varvec{i}}} \le -\lambda _{L}, \end{array}\right. } \end{aligned}$$

for \({{\varvec{i}}}\in {\mathcal {I}}\), with

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = {\varvec{y}} - D^{T} \hat{{\varvec{\nu }}}(\lambda _{F}, \lambda _{NI}) \end{aligned}$$

where

$$\begin{aligned} \hat{{\varvec{\nu }}}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \underset{{\varvec{\nu }} \in {\mathbb {R}}^{m}}{\arg \min } \, \frac{1}{2}||{\varvec{y}} - D^{T}{\varvec{\nu }}||_{2}^{2} \quad \text {s.t.} \quad - \lambda _{F} {\varvec{1}} \le {\varvec{\nu }} \le (\lambda _{F} + \lambda _{NI}){\varvec{1}}, \end{aligned}$$

where D is the incidence matrix displayed in Fig. 1 (a) for the one-dimensional case. The matrix D has full row rank, and therefore the dual problem is strictly convex. Next, we have box-type constraints similar to those in the \(L_{1}\)-trend filtering problem, and we can solve the problem with \({\mathcal {O}}(n)\) time complexity.

Second, note that in the one-dimensional case the time complexities of the path solution algorithms for the nearly-isotonic regression and the fusion approximator are equal to \({\mathcal {O}}(n\log (n))\), cf. Tibshirani et al. (2011); Hoefling (2010); Bento et al. (2018) and the references therein. Therefore, if \(\lambda _{F}\) is fixed, then using the result of Theorem 5 we can obtain the solution path with respect to \(\lambda _{NI}\) with time complexity \({\mathcal {O}}(n\log (n))\). Further, if we fix \(\lambda _{NI}\) then, again, using Theorem 5 we can obtain the solution path with respect to \(\lambda _{F}\) with complexity \({\mathcal {O}}(n\log (n))\). In the paper Yu et al. (2022) the one-dimensional fused nearly-isotonic regression was solved for fixed values of the penalisation parameters. Thus, the one-dimensional fused lasso and nearly-isotonic regression have been studied in detail, and in this paper we therefore focus on the two-dimensional case.

The case of several dimensions is more complicated. Note that, for example, even in the case of two dimensions the matrix D, displayed in Fig. 2, does not have full row rank. Therefore, the dual problem is not strictly convex. At the same time, one can see that the matrix D is sparse. Therefore, we apply the recently developed OSQP algorithm, cf. Stellato et al. (2020). The time complexity of the solution is linear with respect to the number of edges in the graph, i.e. it is \({\mathcal {O}}(|E|)\).
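A minimal sketch of this OSQP-based dual solve is given below (assuming the osqp and scipy packages; the function name and default tolerance are ours, and D may be a sparse incidence matrix such as the grid_incidence output above). The primal fit is then recovered via (16):

```python
import numpy as np
import scipy.sparse as sp
import osqp

def fni_dual_osqp(y, D, lam_f, lam_ni, eps=1e-5):
    """Box-constrained dual (17) solved with OSQP; D is a sparse incidence matrix."""
    m = D.shape[0]
    P = sp.csc_matrix(D @ D.T)                # quadratic term (possibly rank deficient)
    q = -np.asarray(D @ y).ravel()
    A = sp.identity(m, format="csc")          # identity map: box constraints on nu
    l = -lam_f * np.ones(m)
    u = (lam_f + lam_ni) * np.ones(m)
    solver = osqp.OSQP()
    solver.setup(P, q, A, l, u, eps_abs=eps, eps_rel=eps, verbose=False)
    nu = solver.solve().x
    return y - D.T @ nu                       # primal fit via (16)
```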

Fig. 4 Computational times vs side size of a square grid for OSQP solution of fused nearly-isotonic approximator in two dimensions

The exact solution for fixed values of the penalisation parameters can be obtained using the results of the paper Minami (2020), where the author proposed an algorithm for a general graph with computational complexity \({\mathcal {O}}(n|E|\log (\frac{n^{2}}{|E|}))\). Therefore, in principle, using the relation between the fused nearly-isotonic regression and the nearly-isotonic regression proved in Proposition 3, it is possible to obtain the exact solution to the fused nearly-isotonic approximation for a general graph.

First, recall that from Theorem 2 it follows that the solution with \(\lambda _{L} \ne 0\) is given by soft-thresholding of the solution with \(\lambda _{L} = 0\). Therefore, lasso penalization does not add much to the complexity, and we concentrate on the case with \(\lambda _{L} = 0\). Following Minami (2020), we use the following bi-monotone functions (bisigmoid and bicubic) to test the performance of the fused nearly-isotonic approximator:

$$\begin{aligned} f_{bs}(x^{(1)}, x^{(2)})&= \frac{1}{2}\Big (\frac{e^{16x^{(1)} - 8}}{1 + e^{16x^{(1)} - 8}} + \frac{e^{16x^{(2)} - 8}}{1 + e^{16x^{(2)} - 8}}\Big ),\\ f_{bc}(x^{(1)}, x^{(2)})&= \frac{1}{2}\Big ( (2x^{(1)} -1)^{3} + (2x^{(2)} -1)^{3} \Big ) + 2, \end{aligned}$$

where \(x^{(1)} \in [0,1)\) and \(x^{(2)} \in [0,1)\).
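For reference, these two test surfaces can be coded directly (a straightforward NumPy transcription of the formulas above):

```python
import numpy as np

def f_bs(x1, x2):
    """Bisigmoid test surface."""
    s = lambda t: np.exp(16 * t - 8) / (1 + np.exp(16 * t - 8))
    return 0.5 * (s(x1) + s(x2))

def f_bc(x1, x2):
    """Bicubic test surface."""
    return 0.5 * ((2 * x1 - 1) ** 3 + (2 * x2 - 1) ** 3) + 2
```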

The simulation experiment is performed in the following way. First, we generate a homogeneous \(d\times d\) grid:

$$\begin{aligned} x^{(1)}_{k} = \frac{k-1}{d} \quad \text {and} \quad x^{(2)}_{k} = \frac{k-1}{d}, \end{aligned}$$

for \(k = 1, \dots , d\). The size of the side d varies in \(\{ 2\times 10^{2}, 4\times 10^{2}, 6\times 10^{2}, 8\times 10^{2}, 10^{3} \}\). Next, we uniformly generate the penalisation parameters \(\lambda _{F}\) and \(\lambda _{NI}\) from U(0, 5). We perform 10 runs and record the computational times for each d. Analogously to Stellato et al. (2020), we consider two settings of the OSQP algorithm: a low-precision case with \(\varepsilon _{abs} = \varepsilon _{rel} = 10^{-3}\), and a high-precision case with \(\varepsilon _{abs} = \varepsilon _{rel} = 10^{-5}\) (for the details of the settings in OSQP we refer to Stellato et al. (2020)). Figure 4 below provides these computational times. All the computations were performed on a MacBook Air (Apple M1 chip), 16 GB RAM. From these results we can conclude that the estimator is computationally feasible for moderately sized data sets (i.e. for grids with millions of nodes).

Next, Fig. 5 visualizes the fused nearly-isotonic approximator. We use the Adult data set, available from the UCI Machine Learning repository (Becker and Kohavi 1996). The target variable in this data set indicates whether a person’s salary is greater than 50,000 dollars per year or not. We use two features (education number and working hours per week), and each bar in the figure is the proportion of people making more than the amount of money mentioned above. This data set was used, for example, in Wang et al. (2022).

Fig. 5 Data visualisation for different levels of fusion and isotonisation

From Fig. 5 we can see that fused nearly-isotonic regression provides a trade-off between monotonicity, block sparsity and goodness-of-fit.

6 Conclusion and discussion

In this paper we introduced and studied the fused lasso nearly-isotonic signal approximator in general dimensions. The main result is that the estimator is computationally feasible and provides an interplay between fusion and monotonisation. Also, we proved that the properties of the new estimator are very similar to those of the fusion estimator and the nearly-isotonic regression.

In our opinion, one of the most important results is Theorem 5, where we proved the commutability property of fusion and nearly-isotonisation, because for a fixed value of one of the penalisation parameters we can immediately obtain the path solution with respect to the other one. A path algorithm for the fused lasso exists (Hoefling 2010; Tibshirani and Taylor 2011). At the same time, to the authors’ knowledge, a path algorithm for nearly-isotonic regression in general dimensions has not been developed yet. Therefore, a further direction could be a path solution for the nearly-isotonic regression and, next, to establish whether commutability holds in the general-dimensional case.

One of the other possible directions is to study the asymptotic properties. In particular, it is interesting to understand the rates of convergence under different model selection and cross-validation procedures for choosing the penalisation parameters.

Another direction is to study properties of the solution when \(\lambda _{F}\) and \(\lambda _{NI}\) are not the same for each vertex. An example where one must use different penalisation parameters is the case when the data points are measured along a non-homogeneously spaced grid. It is important to note that, as discussed in Minami (2020), this case is different, and even in the one-dimensional case the estimator will behave differently. In particular, the agglomerative property of the nearly-isotonic regression holds if the penalisation parameters satisfy a certain relation, cf. Proposition A.1 in Minami (2020), which is crucial for the solution path.

Finally, in our opinion, it is interesting to study other combinations of penalisation estimators, even though, practically, one would then need more data, because there will be more penalisation parameters to estimate.