ADMM-Based Residual Whiteness Principle for Automatic Parameter Selection in Single Image Super-Resolution Problems

We propose an automatic parameter selection strategy for the single image super-resolution problem for images corrupted by blur and additive white Gaussian noise with unknown standard deviation. The proposed approach exploits the structure of both the down-sampling and the blur operators in the frequency domain and computes the optimal regularisation parameter as the one optimising a suitably defined residual whiteness measure. Computationally, the proposed strategy relies on the fast solution of generalised Tikhonov ℓ2-ℓ2 problems as proposed in Zhao et al. (IEEE Trans Image Process 25:3683-3697, 2016). These problems naturally appear as substeps of the Alternating Direction Method of Multipliers (ADMM) used to solve single image super-resolution problems with non-quadratic, non-smooth, sparsity-promoting regularisers, both in convex and in non-convex regimes. After detailing the theoretical properties that allow expressing the whiteness functional in a compact way, we report an exhaustive list of numerical experiments proving the effectiveness of the proposed approach for different types of problems, in comparison with well-known parameter selection strategies such as, e.g., the discrepancy principle.

imaging, where, typically, the SR problem aims to reconstruct high-quality, finely detailed images, overcoming the physical limitations imposed by the acquisition setting, such as, e.g., light diffraction phenomena [14,40].
The SR problem can be formulated in mathematical terms as follows. Let X ∈ R^{N_r×N_c} denote the original HR image, with x = vec(X) ∈ R^N, N = N_r N_c, being its row-wise vectorisation. The degradation process describing the mapping from HR to LR data can be described by the linear model

b = SKx + e,  with e a realisation of E ∼ N(0_n, σ² I_n),  (1)

where b, e ∈ R^n are the vectorised LR observed image and the additive white Gaussian noise (AWGN) realisation, respectively, both of size n_r × n_c, with n = n_r n_c, and S ∈ R^{n×N} is a binary selection matrix inducing a pixel decimation with factors d_r and d_c along the N_r rows and the N_c columns of the HR image X, respectively, i.e. N_r = n_r d_r, N_c = n_c d_c. Then, K ∈ R^{N×N} is the matrix representing a space-invariant blurring operator and E is an n-variate Gaussian-distributed random vector with zero mean and scalar covariance matrix, with σ > 0 indicating the (often unknown) noise standard deviation. In the following, we will use the notations I_n ∈ R^{n×n} and 0_n, 1_n ∈ R^n to denote the n × n identity matrix and the n-dimensional vectors of all zeros and all ones, respectively, and set d := d_r d_c, so that N = nd. Determining x ∈ R^N solving (1), given b ∈ R^n and the operators S, K, is an ill-posed inverse problem. Many approaches for solving it have been proposed, ranging from interpolation techniques [35] and sparse representation approaches [41] to reconstruction-based methods [5,10,25,38,43] and, more recently, learning-based techniques; see the recent review [42].
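The degradation model (1) can be sketched numerically. A minimal simulation, in which the sizes, blur kernel and decimation factors are illustrative assumptions and the space-invariant blur is applied as a circular (periodic) convolution via the FFT:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): HR image Nr x Nc, decimation factors dr, dc.
Nr, Nc, dr, dc = 8, 12, 2, 3
nr, nc = Nr // dr, Nc // dc
N, n, d = Nr * Nc, nr * nc, dr * dc

def blur(X, kr=3, kc=3):
    """Space-invariant blur K: circular convolution with a normalised
    box kernel, computed in the frequency domain (periodic b.c.)."""
    P = np.zeros_like(X)
    P[:kr, :kc] = 1.0 / (kr * kc)
    return np.real(np.fft.ifft2(np.fft.fft2(X) * np.fft.fft2(P)))

def decimate(X):
    """Binary selection S: keep every dr-th row and dc-th column."""
    return X[::dr, ::dc]

X = rng.random((Nr, Nc))                      # HR image, x = vec(X)
sigma = 0.05                                  # AWGN standard deviation
E = sigma * rng.standard_normal((nr, nc))     # realisation of the noise e
B = decimate(blur(X)) + E                     # LR observation b = SKx + e
```

Note that, consistently with the text, n = nr*nc and N = n*d for these factors.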
In this paper, we consider a reconstruction-based approach relying on suitable regularisation. In particular, we seek an estimate x* of x as the minimiser of a suitable cost function J : R^N → R₊, with R₊ denoting the set of non-negative real numbers, which codifies both a priori information on the solution and the Gaussian noise statistics. A standard approach for doing so consists in considering the problem

x*(μ) ∈ arg min_{x ∈ R^N} { J(x; μ) := (μ/2) ‖SKx − b‖₂² + R(x) },  (2)

where R : R^N → R₊ is a possibly non-convex and non-smooth regularisation term, the quadratic data fidelity term ‖SKx − b‖₂² encodes the AWGN assumption on e, and the regularisation parameter μ > 0 balances the action of the fidelity against the regularisation.
The optimal choice of μ in (2) is in fact crucial for obtaining high-quality reconstructions. Many heuristic approaches have been proposed for its automatic selection, such as, e.g., the L-curve [6] and generalised cross-validation (GCV) [13]. Alternatively, several methods exploiting available information on the noise corruption have been designed. Among them, a standard strategy is the Morozov discrepancy principle (DP), see [7,16,31] for general problems and [37] for applications to SR, which can be formulated as follows:

Select μ = μ* such that ‖r*(μ*)‖₂ = τ √n σ,  (3)

where x*(μ) ∈ R^N solves (2), r*(μ) := SKx*(μ) − b ∈ R^n is defined as the LR residual, and τ > 0 denotes the discrepancy coefficient. When σ is known, τ is set equal to 1; otherwise, a value slightly greater than 1 is typically chosen so as to avoid noise under-estimation. In many real-world applications, an accurate estimate of σ is not available, which often limits the applicability of DP-based strategies. Alternative strategies overcoming this issue and leveraging the noise whiteness property have been used in the context of image denoising and deconvolution problems (i.e. where S = I_N). Using the whiteness property of the residual image, i.e. of the noise, is an effective idea: on the one hand, it does not require knowing the noise standard deviation; on the other, it exploits much more information than the one encoded in the first moments of the noise distribution. Variational models for image denoising and deblurring containing data fidelity terms or regularisers aimed at explicitly enforcing the residual whiteness property have been proposed; see, e.g., [2,18,20,21,27]. The good results achieved by such models demonstrate the potential of using the noise whiteness property. However, these models are strongly non-convex, with all the associated numerical and computational difficulties.
More classically, whiteness and other statistical and deterministic properties of the residual have been used as a posteriori criteria for evaluating the quality of the restored image or for selecting the regularisation parameter of convex variational models in image denoising and deblurring; see, e.g., [1,4,17,29]. In particular, in [1] the authors proposed an effective parameter selection strategy for variational image deconvolution based on minimising the normalised auto-correlation of the residual. This approach, applied in [1] as an a posteriori criterion, has then been revisited in [22], where the authors design a measure of whiteness of the residual image r*(μ) = Kx*(μ) − b regarded as a function of μ. This strategy, called the residual whiteness principle (RWP), allows for an automatic estimation of μ, can be naturally embedded within an iterative ADMM optimisation scheme, and is shown to be effective for different choices of non-quadratic, non-smooth regularisers R. In fact, whenever S = I_N and under the assumption of periodic boundary conditions, models of the form (2) can be easily handled by an ADMM-type scheme where matrix-vector products and matrix inversions can be efficiently computed in the frequency domain by means of fast discrete Fourier transform solvers, due to the circulant structure of the operator K (see [22] for details).
In SR problems, however, due to the presence of the decimation operator S, the operator A := SK is typically unstructured and, as a consequence, the solution of (2) becomes more challenging. To illustrate this, let us consider the special choice R(x) := (1/2)‖Lx − v‖₂², with L ∈ R^{M×N} a known regularisation matrix and v ∈ R^M a given vector. Then, problem (2) takes the form of the generalised Tikhonov-regularised ℓ2-ℓ2 problem

x*(μ) = arg min_{x ∈ R^N} { (μ/2) ‖SKx − b‖₂² + (1/2) ‖Lx − v‖₂² },  (4)

which has been previously employed, e.g., in [34,41] in the context of SR problems. The solution of (4) can be computed from the corresponding optimality condition upon the inversion of unstructured operators, thus requiring in principle the use of iterative solvers, such as, e.g., the Conjugate Gradient algorithm.
However, for problems like (4) and upon a specific choice of S, it was shown in [43] that an efficient solution strategy based on the Woodbury formula can be derived. The resulting algorithm, therein called the Fast Super-Resolution (FSR) algorithm, significantly reduces the computational effort required by iterative solvers for (4), as it boils the problem down to the inversion of diagonal matrices in the Fourier domain. As far as the parameter selection strategy is concerned, in [43] a generalised cross-validation strategy [11] is used to select the optimal μ, which is known to be impractical for large-scale problems; see, e.g., [8]. Note, furthermore, that ℓ2-ℓ2 problems of the form (4) naturally arise when attempting to solve the more general class of variational models (2) by means of classical iterative optimisation solvers such as the ADMM. In this context, such problems appear when enforcing suitable variable splittings and are defined in terms of appropriate penalty parameters. Here, the FSR algorithm can thus still be used as an efficient solver, similarly as done, e.g., in [5,24,33].

Contribution. We propose an automatic selection strategy for the regularisation parameter μ in (2) which does not require any prior knowledge of the AWGN level and can be applied to general non-smooth and possibly non-convex regularisers R. Our approach is based on the optimisation of a suitably defined measure of whiteness of the residual image in the frequency domain. The proposed approach generalises the results presented in [22] for image deconvolution problems to the more challenging scenario of single-image super-resolution and, at the same time, it extends the results contained in an earlier conference version of this work [26], where only problems of the form (4) were considered.
By designing an ADMM-based optimisation strategy for solving the general problem (2) with an appropriate variable splitting, the residual whiteness principle can be applied iteratively along the ADMM iterations, jointly with the solution of the ℓ2-ℓ2 substeps of the form (4). Several numerical results confirming the effectiveness of the proposed method, in comparison with the standard Discrepancy Principle, are reported for popular image regularisers such as Tikhonov and Total Variation. Moreover, to provide some insight into the extension of the strategy to non-convex settings, we propose to embed the automatic estimation yielded by the adoption of the RWP within an iteratively reweighted ℓ1 scheme for tackling a non-convex continuous relaxation of the ℓ0 pseudo-norm [32].

Notations, Preliminaries and Assumptions
We start by setting the notations and recalling useful results that will be used in the following discussion. Then, we detail the adopted assumptions for our derivations.

Notations and Preliminaries
In the following, for z ∈ C we use z̄ and |z| to indicate the conjugate and the modulus of z, respectively. We denote by F, F^H ∈ C^{N×N} the unitary coefficient matrices of the 2D discrete Fourier transform operator and its inverse, respectively, when applied to vectorised N_r × N_c images, with FF^H = F^H F = I_N. For any v ∈ R^N and any A ∈ R^{N×N}, we use the notations ṽ = Fv and Ã = FAF^H to denote the action of the 2D Fourier transform operator F on vectors and matrices, respectively. Given a permutation matrix P ∈ R^{N×N}, we denote by v̂ = Pṽ and Â = PÃP^T the action of P on the Fourier-transformed vector ṽ and matrix Ã, respectively. We indicate by Ǎ the product Ǎ = PÃ^H P^T, i.e. the action of P on Ã^H. Finally, we denote by J_s the s × s matrix of all ones and by c_i^s the i-th canonical basis vector of R^s. In the following Lemmas 1-4, we recall two well-known properties of the Kronecker product '⊗' (see, e.g., [12]), as well as two results that will be useful in the paper.

Lemma 1 Let A, B, C, D be matrices such that the products AC and BD can be formed. Then, the mixed-product property holds:

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
Lemma 2 Let A, B be square matrices of orders r and s, respectively. Then:

B ⊗ A = P_s^r (A ⊗ B) (P_s^r)^T,  (5)

where P_s^r ∈ R^{rs×rs} is a special permutation matrix, called the perfect shuffle matrix for a set of s × r elements, whose definition is given in (6).

Lemma 3 (Woodbury formula) Let A₁, A₂, A₃, A₄ be conformable matrices with A₁ and A₃ invertible. Then, the following inversion formula holds:

(A₁ + A₂A₃A₄)^{-1} = A₁^{-1} − A₁^{-1} A₂ (A₃^{-1} + A₄ A₁^{-1} A₂)^{-1} A₄ A₁^{-1}.  (7)

Lemma 4 [36] Let S ∈ R^{n×N} be the binary selection matrix introduced above. Then, (8) holds. The non-zero entries of the Fourier-transformed matrix \widetilde{S^H S} = F S^H S F^H in (8) are all equal to 1/d and are arranged along replicated patterns, as shown in Fig. 1 (left) for the particular case N_r = 6, N_c = 12, d_r = 2, d_c = 3 ⇒ n_r = 3, n_c = 4. In the following Proposition 1, we prove that there always exists a permutation matrix P, as the one represented in Fig. 1 (centre), capable of shuffling the rows and columns of \widetilde{S^H S} so as to obtain a well-structured, block-diagonal matrix, as shown in Fig. 1 (right).

Proposition 1 Let P ∈ R^{N×N} be the permutation matrix defined by (9), with P_n^d, P_{d_c}^{n_r} perfect shuffle matrices as defined in (6). Then, (10) holds.

Proof First, by setting P̄ := I_{d_r} ⊗ P_{d_c}^{n_r} ⊗ I_{n_c} and based on the definition (8), we have

where in (11) we used the associative property of the Kronecker product and in (12) we applied Lemma 1. Then, starting from (12), we have

where in (13) we used the property of the transpose of a Kronecker product, in (14) we applied Lemma 1 and in (15) Lemma 2. Finally, by recalling the definition of the matrix P in (9), namely P = P_n^d P̄, we have

where in (16) we again applied Lemma 2.
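The Fourier-domain structure behind Lemma 4 can be checked numerically: for a binary decimation matrix S with factors d_r, d_c, all non-zero entries of F S^T S F^H (with F the unitary 2-D DFT) equal 1/d, and there are d of them per row. The small sizes below are illustrative:

```python
import numpy as np

# Numerical check: non-zero entries of F S^T S F^H all equal 1/d.
Nr, Nc, dr, dc = 4, 6, 2, 3
N, d = Nr * Nc, dr * dc

def dft(m):
    """Unitary 1-D DFT matrix of order m."""
    return np.fft.fft(np.eye(m), norm="ortho")

F = np.kron(dft(Nr), dft(Nc))          # unitary 2-D DFT (row-major ordering)

mask = np.zeros((Nr, Nc))
mask[::dr, ::dc] = 1.0                 # kept pixels
StS = np.diag(mask.ravel())            # S^T S: diagonal 0/1 matrix

A = F @ StS @ F.conj().T
vals = np.abs(A[np.abs(A) > 1e-10])    # moduli of the non-zero entries
```

The replicated pattern arises because S^T S is diagonal with a d-periodic 0/1 mask, whose DFT is supported on a coarse lattice of frequencies.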

Assumptions
In this section, we detail the class of variational models of interest by listing the assumptions adopted for the regularisation term R(x) as well as for the decimation matrix S and the blurring matrix K.
In this paper, we focus on the automatic selection of the regularisation parameter μ in single image super-resolution variational models of the form:

x*(μ) ∈ arg min_{x ∈ R^N} { (μ/2) ‖SKx − b‖₂² + G(Lx) }.  (17)

We refer to L ∈ R^{M×N} as the regularisation matrix, whereas the regularisation function G : R^M → R̄ := R ∪ {+∞} is nonlinear and possibly non-smooth. The general variational model (17) is subject to the following assumptions:

(A1) The blur matrix K ∈ R^{N×N}, the decimation matrix S ∈ R^{n×N} and the regularisation matrix L ∈ R^{M×N} are such that null(SK) ∩ null(L) = {0_N}.
(A2) The regularisation function G : R^M → R̄ is proper, lower semi-continuous, convex and coercive.
(A3) The blur matrix K represents a 2D discrete convolution operator (which follows from the blur being space-invariant) and the regularisation matrix L is made of blocks L_j ∈ R^{N×N}, j = 1, …, s, these matrices also representing 2D discrete convolution operators.
(A4) The decimation matrix S ∈ R^{n×N} is a binary selection matrix such that SS^H = I_n, and the operator S^H ∈ R^{N×n} interpolates the decimated image with zeros.
(A5) The regularisation function G is easily proximable, that is, the proximity operator of G at any t ∈ R^M can be efficiently computed.
Assumptions (A1)-(A2) guarantee the existence, and, upon suitable additional assumptions, the uniqueness, of solutions of the considered class of variational super-resolution models (17), as formally stated in Proposition 2 below, whose proof can be easily derived from that of Proposition 2.1 in [22].

Proposition 2 If assumptions (A1)-(A2) above are fulfilled, then, for any fixed μ > 0, the cost function of the variational model (17) is proper, lower semi-continuous, convex and coercive; hence, it admits global minimisers. Furthermore, if the matrix SK has full rank, then the global minimiser is unique.
Assumptions (A3)-(A4) are crucial for our proposal. In fact, as will be detailed in the paper, they allow for the efficient automatic selection of the regularisation parameter in the frequency domain based on the RWP. More specifically, (A3) guarantees that, under the assumption of periodic boundary conditions, the blur matrix K and the regularisation matrices L_j, j = 1, …, s, are all block-circulant matrices with circulant blocks, which can be diagonalised by the 2D discrete Fourier transform, i.e.

K = F^H Λ F,  L_j = F^H Λ_j F,  j = 1, …, s,  (18)-(19)

where Λ, Λ_j ∈ C^{N×N} are diagonal matrices holding the eigenvalues of K and L_j, respectively. Assumption (A4) allows us to apply Lemma 4 and Proposition 1 which, together with the Fourier-diagonalisation formulas (18)-(19), allow solving in the frequency domain the linear systems arising in the proposed iterative solution procedure. The Fourier representation yields an explicit form of the whiteness measure function, so that the regularisation parameter μ can be efficiently updated along the solver iterations through a simple univariate minimisation problem, according to the RWP. We remark that assumptions (A3)-(A4) cannot be relaxed without making the computational burden of our proposal impractical: even when the arising linear systems can be solved very efficiently (e.g. if L is a tall matrix such that L^H L has small size and can be inverted cheaply), if an explicit frequency-domain solution is not available, then the repeated evaluation of the whiteness measure function becomes impractical. Assumption (A5) is required to compute efficiently the solution of all the proximity steps arising in the solution algorithm.
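The diagonalisation claimed in assumption (A3) can be illustrated directly: under periodic boundary conditions, a 2-D convolution matrix is diagonalised by the 2-D DFT, its eigenvalues being the DFT of the zero-embedded kernel. A minimal numerical check, with an arbitrary small kernel (all sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# A 2-D circular convolution applied two ways: via the FFT eigenvalues,
# and explicitly by the convolution sum with periodic indexing.
nr, nc = 6, 8
kernel = np.zeros((nr, nc))
kernel[:3, :3] = rng.random((3, 3))
lam = np.fft.fft2(kernel)                    # diagonal of F K F^H

x = rng.random((nr, nc))
Kx = np.real(np.fft.ifft2(lam * np.fft.fft2(x)))   # K x via the FFT

Kx_direct = np.zeros((nr, nc))               # explicit circular convolution
for i in range(nr):
    for j in range(nc):
        for p in range(nr):
            for q in range(nc):
                Kx_direct[i, j] += kernel[p, q] * x[(i - p) % nr, (j - q) % nc]
```

Both computations agree to machine precision, which is exactly the property exploited to solve the arising linear systems in the frequency domain.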
We now briefly discuss how stringent the above assumptions are. First, assumption (A4) on S is not stringent at all. In fact, if the binary selection matrix S is preceded by the integration of the HR image over the support of each LR pixel, a pixel blur operator, itself representing a 2D convolution operator, can be introduced and incorporated into the original blur matrix K.
Next, as far as popular regularisation terms satisfying (A2), (A3), (A5) are concerned, we mention the Tikhonov (TIK) regulariser, which, in its general form, reads G(Lx) = ‖Lx − v‖₂² for suitably chosen L and v ∈ R^M reflecting any prior knowledge on the solution. When L = D, the discrete image gradient, such a regulariser is commonly referred to as discrete Sobolev regularisation, and it is typically adopted when the image of interest is characterised by smooth regions. Note that in the following we will consider only this gradient-based choice of the Tikhonov regulariser; hence, we will denote by TIK the discrete Sobolev regularisation functional G(Lx) = ‖Dx‖₂². Alternatively, under the same choice L = D, another common choice for G(Lx) is the Total Variation (TV) regulariser [30], which is employed for the reconstruction of piece-wise constant images with sharp edges, and which admits an isotropic (TVI) and an anisotropic (TVA) formulation. They can be expressed in terms of the regularisation matrix L = D = [D_h; D_v] ∈ R^{2N×N} and of the nonlinear regularisation function G(t), t = Lx, as

TVI:  G(t) = Σ_{i=1}^{N} ‖(t_i, t_{N+i})‖₂,    TVA:  G(t) = ‖t‖₁,

with D_h, D_v ∈ R^{N×N} finite difference operators discretising the first-order horizontal and vertical partial derivatives.
In the presence of images characterised by different local features, the global nature of TIK and TV compromises their performance. As a way to promote regularisation of different strength across the image, a class of weighted TV-based (WTV) regularisers has been proposed in [9]; in formula:

WTV:  G(t) = Σ_{i=1}^{N} α_i ‖(t_i, t_{N+i})‖₂,  t = Dx,

with α_i ≥ 0 spatially varying weights.
Regardless of its local or global nature, a regularisation term designed by setting L = D is expected to promote gradient sparsity. When dealing with sparse-recovery problems, i.e. problems where the signal itself is assumed to be sparse, regularisations based, e.g., on the use of the ℓ1 norm are often considered, possibly combined with a space-variant modelling so as to get a Weighted ℓ1 (WL1) regulariser, which reads:

WL1:  G(x) = Σ_{i=1}^{N} α_i |x_i|,  α_i ≥ 0.

Remark 1 (Non-convex regularisations) Despite our convexity assumption (A2), we will detail in Sect. 6 a few insights on the use of the proposed parameter selection strategy with non-convex regularisers corresponding to continuous approximations of the ℓ0 pseudo-norm, such as, e.g., the CEL0 penalty [32]. The iterative strategy we are going to discuss next does not directly apply to this scenario; nonetheless, upon the definition of appropriate surrogate convex functions approximating the original non-convex problems (by means, for instance, of reweighted ℓ1 algorithms [23]), its use is still possible.

The RWP for Single Image Super-Resolution
In this section, we recall some key results originally reported in [26] concerning the application of the RWP to Tikhonov-regularised super-resolution least squares problems of the form (4). In fact, what follows represents the building block of the iterative procedures introduced in Sect. 5. Let us consider the noise realisation e in (1) in its original n_r × n_c matrix form and define its sample auto-correlation

a_{l,m}(e) = (1/n) (e ⋆ e)_{l,m} = (1/n) (e ∗ e′)_{l,m},  (20)

where the index pairs (l, m) are commonly called lags, ⋆ and ∗ denote the 2-D discrete correlation and convolution operators, respectively, and e′(i, j) = e(−i, −j). Clearly, for (20) to be defined at all lags, the noise realisation e must be padded with at least n_r − 1 samples in the vertical direction and n_c − 1 samples in the horizontal direction; by assuming periodic boundary conditions, ⋆ and ∗ in (20) denote 2-D circular correlation and convolution, respectively. This allows us to consider only lags (l, m) ∈ {0, …, n_r − 1} × {0, …, n_c − 1}. If the corruption e in (1) is the realisation of a white Gaussian noise process, as in our case, it is well known that, as n → +∞, the sample auto-correlation a_{l,m}(e) satisfies the following asymptotic property [20]:

a_{0,0}(e) → σ²,  a_{l,m}(e) → 0 for (l, m) ≠ (0, 0).  (21)

We remark that the DP exploits only the information at lag (0, 0): in fact, according to (3), the standard deviation of the residual image is required to be equal to the noise standard deviation σ. Imposing whiteness of the residual image by constraining its auto-correlation at non-zero lags to be small is a stronger requirement. In order to make such a whiteness principle independent of the noise level, we consider the normalised sample auto-correlation of the noise realisation e, namely

ρ_{l,m}(e) = (e ⋆ e)_{l,m} / ‖e‖₂²,

where ‖·‖₂ denotes here the Frobenius norm.
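The asymptotic behaviour of the normalised sample auto-correlation can be observed numerically: for a white noise image, ρ equals 1 at lag (0, 0), while the off-zero-lag values are of order 1/√n. A minimal sketch, computing the circular auto-correlation via the FFT (image size is an assumption):

```python
import numpy as np

rng = np.random.default_rng(3)

# Circular sample auto-correlation of a white noise image via the FFT,
# normalised by the squared Frobenius norm, so that rho[0, 0] = 1.
nr, nc = 64, 64
e = rng.standard_normal((nr, nc))

# The DFT of the circular auto-correlation of e is |fft2(e)|^2.
a = np.real(np.fft.ifft2(np.abs(np.fft.fft2(e)) ** 2))
rho = a / np.sum(e ** 2)

offlag_max = np.max(np.abs(rho.ravel()[1:]))   # largest non-zero-lag value
```

For n = 64 × 64 the off-lag values have standard deviation ≈ 1/√n ≈ 0.016, so their maximum stays small, in line with the asymptotic property (21).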
It follows easily from (21) that ρ(e) satisfies the asymptotic properties ρ_{0,0}(e) → 1 and ρ_{l,m}(e) → 0 for (l, m) ≠ (0, 0). In [22], the authors introduce the following non-negative scalar measure of whiteness W : R^{n_r×n_c} → R₊ of the noise realisation e:

W(e) := ‖ρ(e)‖₂² = W̃(ẽ),  (22)

where the last equality comes from Proposition 3 below (the proof being reported in [22]), with ẽ ∈ C^{n_r×n_c} the 2D discrete Fourier transform of e and W̃ : C^{n_r×n_c} → R₊ the function defined in (23).

Proposition 3
Let e ∈ R^{n_r×n_c} and ẽ = Fe ∈ C^{n_r×n_c}. Then, assuming periodic boundary conditions for e, the function W defined in (22) satisfies W(e) = W̃(ẽ), with W̃ the function defined in (23).

By now looking at the considered class of super-resolution variational models (17), we observe that, as a general rule, the closer the attained super-resolved image x*(μ) to the target HR image x, the closer the associated residual image r*(μ) = SKx*(μ) − b to the white noise realisation e in (1) and, hence, the whiter the residual image according to the scalar measure in (22).
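The Parseval-type identity behind Proposition 3 can be verified numerically. Under circular correlation, the spatial whiteness measure ‖ρ(e)‖² coincides with a closed-form function of the Fourier coefficients of e; the expression n Σ|ẽ|⁴ / (Σ|ẽ|²)² used below is a hedged reconstruction (any constant factor is in any case immaterial for the minimisation over μ):

```python
import numpy as np

rng = np.random.default_rng(4)

# Check: W(e) = ||rho(e)||_2^2 (spatial) equals n*sum|e~|^4/(sum|e~|^2)^2
# (Fourier), with e~ the unnormalised 2-D DFT of e.
nr, nc = 32, 48
n = nr * nc
e = rng.standard_normal((nr, nc))

ehat = np.fft.fft2(e)
a = np.real(np.fft.ifft2(np.abs(ehat) ** 2))   # circular auto-correlation
rho = a / np.sum(e ** 2)                       # normalised auto-correlation
W_spatial = np.sum(rho ** 2)

W_fourier = n * np.sum(np.abs(ehat) ** 4) / np.sum(np.abs(ehat) ** 2) ** 2
```

Note the ratio is homogeneous of degree zero in ẽ, so the identity is insensitive to the DFT normalisation convention.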
This motivates the application of the RWP for automatically selecting the regularisation parameter μ in variational models of the form (17), which reads:

Select μ = μ* ∈ arg min_{μ ∈ R₊₊} W(μ),  (24)

where, according to the definition of the functions in (22)-(23), the scalar non-negative cost function W : R₊₊ → R₊ in (24), from now on referred to as the residual whiteness function, takes the following form:

W(μ) := W(r*(μ)) = W̃(r̃*(μ)),  (25)

with r̃*(μ) the 2D discrete Fourier transform of the residual image and W̃ the function defined in (23). In [26], the authors derived an explicit expression for the super-resolution residual whiteness function W(μ) in (25) in the frequency domain, which generalises the one for the restoration-only case (i.e. the case S = I_N) reported in [22]. The results of the derivations in [26] are summarised in the following proposition, whose proof is reported in the appendix.
Proposition 4 Let r*_H(μ) be the high-resolution residual image and let P ∈ R^{N×N} be the permutation matrix defined in (9). Then, we have:

where r̂_H(μ) = P r̃_H(μ) and ι := 1 + ⌊(i − 1)/d⌋ d.

RWP for Tikhonov-Regularised SR Problems
Here, we derive the analytical expression of the whiteness function W(μ) defined in (26) when addressing Tikhonov-regularised least squares problems of the form (4). More specifically, following the derivations reported in [43], in Proposition 5 we give an efficiently computable expression for the solution x*(μ) of the general ℓ2-ℓ2 variational model.

Proposition 5 Let the matrix Λ̄ be defined as in (27), with the invertibility of Σ_{j=1}^{s} Λ_j^H Λ_j guaranteed. The solution of the Tikhonov-regularised least squares problem (4) can then be expressed as in (28). From (28), we can easily derive the Fourier transform of the high-resolution residual r*_H(μ) = Kx*(μ) − b, which takes the form (29). Recalling Lemma 1 and property (10), we can now prove the following proposition and corollary.

Proposition 6
Let Λ ∈ R^{n×n} be a diagonal matrix and consider the matrix Λ̄ defined in (27). Then, the following equality holds:

Corollary 1 Let Ψ := (dI_n + μ Λ̄ Λ̄^H)^{-1}. Then, the expression in (29) turns into:

Recalling now the action of the permutation matrix P on vectors, the product r̂*_H(μ) = P r̃*_H(μ) can be written out explicitly. Combining everything, we finally deduce:

Proposition 7
The residual whiteness function for the generalised Tikhonov least squares problem takes the form (30), where the parameters η_i ∈ R₊, ρ_i ∈ C, ν_i ∈ C are defined as in (31). When d = 1, i.e. when no decimation is considered and S = I_N, the expression in (30) reduces to the one derived in [22] for image restoration problems. According to the RWP, the optimal μ* is selected as the minimiser of the whiteness measure function in (30). The value μ* can be efficiently detected via grid search or by applying the Newton-Raphson algorithm to the nonlinear equation W′(μ) = 0. Finally, the optimal μ* is used for the computation of the reconstruction x*(μ*) based on (28).
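For the restoration-only case d = 1 (where (30) reduces to the expression of [22]), the grid-search selection of μ* can be sketched end to end. The test image, box blur and grid below are assumptions, and W(μ) is evaluated through the Fourier form of the whiteness measure applied to the closed-form TIK-L2 residual:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic piece-wise constant image, 5x5 box blur (periodic b.c.), AWGN.
nr, nc = 64, 64
n = nr * nc
x = np.zeros((nr, nc)); x[16:48, 16:48] = 1.0

kern = np.zeros((nr, nc)); kern[:5, :5] = 1.0 / 25.0
lam = np.fft.fft2(kern)                        # eigenvalues of K

sigma = 0.05
b = np.real(np.fft.ifft2(lam * np.fft.fft2(x)))
b += sigma * rng.standard_normal((nr, nc))
bhat = np.fft.fft2(b)

# Squared spectrum of L = D (periodic first-order differences).
dh = np.zeros((nr, nc)); dh[0, 0], dh[0, 1] = -1.0, 1.0
dv = np.zeros((nr, nc)); dv[0, 0], dv[1, 0] = -1.0, 1.0
l2 = np.abs(np.fft.fft2(dh)) ** 2 + np.abs(np.fft.fft2(dv)) ** 2

def whiteness(mu):
    # Fourier residual of the TIK-L2 solution:
    # r~(mu) = -|l|^2 b~ / (mu |lam|^2 + |l|^2).
    rhat = -l2 * bhat / (mu * np.abs(lam) ** 2 + l2)
    p = np.abs(rhat) ** 2
    return n * np.sum(p ** 2) / np.sum(p) ** 2

mus = np.logspace(-1, 5, 120)
mu_star = mus[np.argmin([whiteness(m) for m in mus])]
```

The whiteness function is large at both ends of the grid (the residual is dominated by image structure for small μ and by a few amplified frequencies for large μ), so its minimiser lies in the interior.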
The main steps of the proposed procedure are summarised in Algorithm 1.

Iterated RWP for Convex-Regularised SR Problems
In this section, we present an ADMM-based iterative algorithm for the solution of SR variational models of the general form (17) under assumptions (A1)-(A5), where the regularisation parameter μ is automatically updated along the iterations based on the RWP. The approach relies on the iterative application of the results reported in the previous section. First, we employ variable splitting to rewrite our family of variational models (17) in the following equivalent linearly constrained form:

{x*(μ), t*(μ)} ∈ arg min_{x,t} { (μ/2) ‖SKx − b‖₂² + G(t) }  subject to  t = Lx.  (32)

To solve problem (32), we define the augmented Lagrangian function

L(x, t, λ; μ) = (μ/2) ‖SKx − b‖₂² + G(t) − ⟨λ, t − Lx⟩ + (β/2) ‖t − Lx‖₂²,  (33)

where β > 0 is a scalar penalty parameter and λ ∈ R^M is the dual variable, i.e. the vector of Lagrange multipliers associated with the set of M linear constraints in (32). Solving (32) corresponds to seeking the saddle point(s) of the augmented Lagrangian function in (33), i.e.:

(x*(μ), t*(μ), λ*(μ)) ∈ arg min_{x,t} max_λ L(x, t, λ; μ).  (34)

Upon suitable initialisation, and for any k ≥ 0, the k-th iteration of the standard ADMM applied to the saddle-point problem (34) with L defined in (33) reads as follows:

x^{(k+1)} = arg min_x L(x, t^{(k)}, λ^{(k)}; μ),  (35)
t^{(k+1)} = arg min_t L(x^{(k+1)}, t, λ^{(k)}; μ) = arg min_t { G(t) + (β/2) ‖t − q^{(k+1)}‖₂² },  (37)
with q^{(k+1)} = Lx^{(k+1)} + λ^{(k)}/β,
λ^{(k+1)} = λ^{(k)} + β (Lx^{(k+1)} − t^{(k+1)}).  (39)

One can notice that, after introducing the additional parameter γ := μ/β, the minimisation sub-problem (35) for the primal variable x takes the form of the Tikhonov-regularised least-squares problem (4). Hence, we adjust the regularisation parameter μ, i.e. γ, along the ADMM iterations by applying the RWP to problem (35) as illustrated in Sect. 4.
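A simplified, self-contained sketch of the ADMM scheme (35)-(39) for the TVI-L2 model, here with S = I (pure deblurring) and a fixed, hand-picked μ in place of the RWP update: the x-step is a single Fourier-domain ℓ2-ℓ2 solve and the t-step is the bivariate shrinkage. All sizes and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)

# Ground truth: piece-wise constant image; periodic 3x3 box blur + AWGN.
nr, nc = 32, 32
x_true = np.zeros((nr, nc)); x_true[8:24, 8:24] = 1.0

kern = np.zeros((nr, nc)); kern[:3, :3] = 1.0 / 9.0
kern = np.roll(np.roll(kern, -1, axis=0), -1, axis=1)   # centre the kernel
Khat = np.fft.fft2(kern)
b = np.real(np.fft.ifft2(Khat * np.fft.fft2(x_true)))
b += 0.02 * rng.standard_normal((nr, nc))

# Periodic first-order difference operators D_h, D_v (Fourier eigenvalues).
dh = np.zeros((nr, nc)); dh[0, 0], dh[0, 1] = -1.0, 1.0
dv = np.zeros((nr, nc)); dv[0, 0], dv[1, 0] = -1.0, 1.0
Dh, Dv = np.fft.fft2(dh), np.fft.fft2(dv)

mu, beta = 50.0, 1.0                  # illustrative fixed parameters
bhat = np.fft.fft2(b)
th, tv = np.zeros((nr, nc)), np.zeros((nr, nc))   # splitting variable t
lh, lv = np.zeros((nr, nc)), np.zeros((nr, nc))   # dual variable lambda
den = mu * np.abs(Khat) ** 2 + beta * (np.abs(Dh) ** 2 + np.abs(Dv) ** 2)

for _ in range(400):
    # x-update (35): Tikhonov-type l2-l2 subproblem, solved via the FFT.
    rhs = (mu * np.conj(Khat) * bhat
           + beta * np.conj(Dh) * np.fft.fft2(th - lh / beta)
           + beta * np.conj(Dv) * np.fft.fft2(tv - lv / beta))
    xhat = rhs / den
    gh = np.real(np.fft.ifft2(Dh * xhat))     # horizontal gradient of x
    gv = np.real(np.fft.ifft2(Dv * xhat))     # vertical gradient of x
    # t-update (37): prox of isotropic TV, i.e. bivariate shrinkage.
    qh, qv = gh + lh / beta, gv + lv / beta
    mag = np.maximum(np.sqrt(qh ** 2 + qv ** 2), 1e-12)
    shrink = np.maximum(1.0 - 1.0 / (beta * mag), 0.0)
    th, tv = shrink * qh, shrink * qv
    # dual update (39).
    lh += beta * (gh - th)
    lv += beta * (gv - tv)

x = np.real(np.fft.ifft2(xhat))
```

With roughly tuned μ, the TV reconstruction is markedly closer to the ground truth than the blurred, noisy observation.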
The complete x-update procedure follows from (28) and (31). The minimisation sub-problem (37) for the primal variable t can be written in the form of a proximity operator, namely

t^{(k+1)} = prox_{G/β}(q^{(k+1)}).  (43)

According to assumption (A5), the regularisation function G is easily proximable, which means that problem (43) can be solved efficiently. We now report the closed-form expressions of the proximity operators for the regularisation terms listed in Sect. 2.2.
For what concerns the TV and WTV regularisers, the associated 2N-variate proximity operators induced by the performed splitting are both separable into N independent bivariate proximity operators. In particular, after introducing the N bivariate vectors t̄_i^{(k+1)} = (q_i^{(k+1)}, q_{N+i}^{(k+1)}), i = 1, …, N, the proximity operators admit closed-form expressions given by bivariate shrinkage, where α_i^{(k)} denote the spatial weights of the WTV regulariser (45).
Finally, the proximity operator for the WL1 regularisation term reduces to the component-wise weighted soft-thresholding operator.
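The WL1 proximity step is component-wise weighted soft-thresholding. A minimal sketch (with hypothetical inputs), checked against a brute-force minimisation of the prox objective:

```python
import numpy as np

def prox_wl1(q, alpha, beta):
    """argmin_t  sum_i alpha_i |t_i| + (beta/2) ||t - q||_2^2,
    i.e. component-wise soft-thresholding with thresholds alpha_i/beta."""
    return np.sign(q) * np.maximum(np.abs(q) - alpha / beta, 0.0)

q = np.array([1.5, -0.2, 0.7, -3.0])      # hypothetical input vector
alpha = np.array([1.0, 1.0, 2.0, 0.5])    # hypothetical WL1 weights
beta = 2.0
t_star = prox_wl1(q, alpha, beta)

# Brute-force check of the prox definition, coordinate by coordinate.
ts = np.linspace(-4.0, 4.0, 8001)
t_grid = np.array([ts[np.argmin(a * np.abs(ts) + 0.5 * beta * (ts - qi) ** 2)]
                   for qi, a in zip(q, alpha)])
```

Components whose magnitude falls below the per-component threshold α_i/β are set exactly to zero, which is the sparsity-promoting mechanism of the regulariser.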

In Algorithm 2 we outline the main computational steps of the proposed approach, which we refer to as Iterative RWP-ADMM, in short IRWP-ADMM. As the initial guess of the IRWP-ADMM, we select x^{(0)} to be the solution of the TIK-L2 model in (4) with L = D and v the null vector (i.e. by using Sobolev regularisation), with the regularisation parameter μ^{(0)} computed by applying the exact RWP outlined in Sect. 4, i.e. by minimising (30). In the context of image restoration [22], this choice has already been proven to facilitate and speed up the convergence of the employed iterative scheme.
Convergence of the two-block ADMM applied to the solution of convex optimisation problems of the form (17) with fixed parameters is well established (see, e.g., [3]). One can thus exploit this standard result for the convergence of Algorithm 2 by simply fixing the values of the parameters (the regularisation parameter μ and, possibly, the spatial weights of the WTV and WL1 regularisers) from a given iteration onwards. In the numerical tests shown in Sect. 7, we observed that the parameters typically stabilise after 400 to 500 iterations of the IRWP-ADMM.

An Extension of IRWP for SR Problems with Nonconvex Regularisers
In this section, we show an example of how the IRWP can be applied to non-convex regularisation problems that do not satisfy assumption (A2) on the convexity of the nonlinear function G.
In several sparse-recovery problems arising, e.g., in variable selection or in biological image super-resolution, one typically would like to consider relaxations of the ℓ0 pseudo-norm tighter than the ℓ1 norm, so as to improve the sparsity-promoting behaviour. Among the many approaches proposed, we consider here the CEL0 penalty introduced in [32] which, combined with a non-negativity constraint, yields the optimisation problem (47), where a_i ∈ R^N denotes the i-th column of the matrix A and φ_CEL0 denotes the (parametric) penalty function defined in (48). Notice that, in agreement with our convention by which the desired parameter μ is the one multiplying the fidelity term, in (47)-(48) we divided the cost function of [32] by its regularisation parameter λ and set μ := 1/λ. In [15], problem (47) is solved by means of the iteratively reweighted ℓ1 (IRL1) algorithm [23], which belongs to the class of so-called Majorise-Minimise (MM) optimisation approaches (see, e.g., [19]). More precisely, at any iteration h > 0 of the IRL1 scheme, one minimises a convex majorant of the non-convex cost function J_CEL0 which is tangent to J_CEL0 at the current iterate x^{(h)}. The majorising variational model can be defined as a convex weighted L1-L2 (WL1-L2) model, so that, upon suitable initialisation x^{(0)}, the h-th iteration of the IRL1 algorithm applied to solving (47) reads as in (49)-(50) (see [15]), where the weights w_i(μ) are the generalised derivatives of φ_CEL0 computed at x^{(h)}. The minimisation problem in (50) can be addressed, e.g., by ADMM (see (35)-(39)) with L = I_N. The sub-problem with respect to t reduces to the computation of the proximity operator of the function in (51), whose closed-form expression comes easily from (46). In this setting, the RWP can again be heuristically applied so as to automatically update μ along the outer iterations of the iterative scheme (49)-(50). At the ADMM iteration k = 0 of the generic outer iteration h, μ^{(h)} is updated by applying the residual whiteness principle to problem (35).
Then, the weights w_i^{(h)} are computed by (49) with μ = μ^{(h)} and x^{(h)} the current iterate. The regularisation parameter and the weights are kept fixed along the ADMM iterations and updated at the beginning of the following outer iteration.
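A hedged sketch of the IRL1 outer loop (49)-(50): here the classical weights w_i = 1/(|x_i| + ε) of [23] stand in for the CEL0-specific generalised derivatives, the inner weighted ℓ1-ℓ2 problem is solved by plain proximal-gradient iterations rather than ADMM, and μ is kept fixed instead of being updated by the RWP; problem sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# Sparse-recovery toy instance (assumed sizes): b = A x_true + noise.
m, N, k = 40, 100, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)
x_true = np.zeros(N)
supp = rng.choice(N, size=k, replace=False)
x_true[supp] = rng.uniform(1.0, 3.0, size=k) * rng.choice(np.array([-1.0, 1.0]), size=k)
b = A @ x_true + 0.01 * rng.standard_normal(m)

mu, eps = 100.0, 0.1                            # fidelity weight, reweighting eps
step = 1.0 / (mu * np.linalg.norm(A, 2) ** 2)   # 1/Lipschitz of the smooth part

x = np.zeros(N)
for h in range(10):                      # outer IRL1 (majorise-minimise) loop
    w = 1.0 / (np.abs(x) + eps)          # assumed weights (log-penalty style)
    for _ in range(300):                 # inner weighted l1-l2 solve (ISTA)
        z = x - step * mu * (A.T @ (A @ x - b))
        x = np.sign(z) * np.maximum(np.abs(z) - step * w, 0.0)
```

As the outer iterations proceed, the weights shrink on the detected support and grow elsewhere, progressively debiasing the estimate while enforcing sparsity.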
The proposed procedure, to which we refer as IRWP-IRL1, is outlined in Algorithm 3.

Computed Examples
In this section, we evaluate the performance of the proposed RWP-based procedure for the automatic selection of the regularisation parameter μ in denoising, deblurring and super-resolution variational models of the form (17). More specifically, we first consider the TV-L2 model, which is particularly suitable for the reconstruction of piece-wise constant images, such as, e.g., QR codes; then, we focus our attention on the reconstruction of natural images, for which the more flexible WTV-L2 model is employed. For the first two sets of numerical tests, we also run the TIK-L2 variational model with L = D. Finally, we compare the L1-L2 and the CEL0-L2 variational models on the problem of super-resolution microscopy of phantom images representing biological samples.
In what follows, the output restorations obtained by means of the proposed RWP-based strategy will be denoted by x * REG , with REG ∈ {TIK, TV, WTV, L 1 , CEL0} denoting the regularisation term employed in the solved variational model. Note that when REG = TIK the exact RWP-based approach described in Sect. 4 is used, while for REG ∈ {TV, WTV, L 1 } and for REG = CEL0 the IRWP-ADMM and IRWP-IRL1 iterative methods detailed in Algorithms 2 and 3 are used, respectively.
The proposed RWP-based approach is compared with the DP parameter selection strategy, defined by criterion (3) with τ = 1 and known value σ of the true noise standard deviation.
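Since the residual norm is continuous and monotonically decreasing in μ, enforcing criterion (3) amounts to a scalar root-finding problem. The following sketch applies bisection in log scale on a toy Tikhonov denoising problem (not the paper's super-resolution model), for which the residual norm is available in closed form; the function and parameter names are illustrative assumptions.

```python
import numpy as np

def discrepancy_mu(residual_norm, target, mu_lo=1e-6, mu_hi=1e6, iters=100):
    """Bisection for the discrepancy principle: find mu such that
    ||r(mu)||_2 = target, assuming residual_norm(mu) is continuous
    and monotonically decreasing in mu on [mu_lo, mu_hi]."""
    for _ in range(iters):
        mu = np.sqrt(mu_lo * mu_hi)   # bisect in log scale
        if residual_norm(mu) > target:
            mu_lo = mu                # residual too large: increase mu
        else:
            mu_hi = mu
    return np.sqrt(mu_lo * mu_hi)

# Toy problem: min_x mu/2 ||x - b||^2 + 1/2 ||x||^2, whose minimiser
# x(mu) = mu/(mu+1) * b gives the residual norm ||r(mu)|| = ||b||/(1+mu).
rng = np.random.default_rng(1)
sigma, N = 0.1, 10000
b = 1.0 + sigma * rng.standard_normal(N)
target = sigma * np.sqrt(N)           # DP with tau = 1 and known sigma
mu_dp = discrepancy_mu(lambda mu: np.linalg.norm(b) / (1.0 + mu), target)
```

Note that, unlike the RWP, this procedure needs the true σ as input, which is precisely the dependence the whiteness principle removes.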
With the proposed numerical tests we aim to:

- prove that the RWP is capable of selecting optimal μ * values returning high-quality results in variational image super-resolution;
- prove that the proposed IRWP-ADMM approach is capable of automatically selecting such optimal μ * values in a robust (and efficient) manner for non-quadratic super-resolution variational models.
The latter point will be confirmed by showing that the iterative IRWP-ADMM/IRWP-IRL1 strategies and the RWP applied a posteriori behave similarly.

For all the variational models considered in the experiments, there is a one-to-one relationship between the value of the regularisation parameter μ and the standard deviation of the associated residual image r * (μ) = SKx * (μ) − b. Hence, in all the reported results where μ represents the independent variable, we will substitute the μ-values with the corresponding τ * (μ)-values defined, according to (3), as the ratio between the residual image standard deviation and the true noise standard deviation σ, in formula

τ * (μ) = std(r * (μ)) / σ .    (52)

In the first two sets of examples, the quality of the restorations x * computed for different values of τ * , with respect to the original uncorrupted image x, will be assessed by means of three scalar measures, namely the Structural Similarity Index (SSIM) [39], the Peak-Signal-to-Noise-Ratio (PSNR) and the Improved-Signal-to-Noise-Ratio (ISNR), the latter two defined by

PSNR = 10 log10 ( N max(x, x * )^2 / ‖x − x * ‖_2^2 ),    ISNR = 10 log10 ( ‖x − x BIC ‖_2^2 / ‖x − x * ‖_2^2 ),

respectively, with max(x, x * ) representing the largest component-wise value of x and x * , N the number of pixels, and x BIC denoting the bicubic interpolation of b.

In the third example, we select as measure of quality the Jaccard index J δ ∈ [0, 1], which is the ratio between the number of correct detections up to some tolerance δ and the sum of correct detections, false negatives and false positives. In particular, we consider three different values of δ ∈ {0, 2, 4}.
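The two signal-to-noise measures can be computed as in the short sketch below. This follows a common convention (peak taken as the component-wise maximum over both images, as stated above), which may differ in normalisation details from the paper's displayed formulas.

```python
import numpy as np

def psnr(x, x_star):
    """Peak-Signal-to-Noise-Ratio (dB), with the peak taken as the
    largest component-wise value over both images."""
    peak = max(x.max(), x_star.max())
    mse = np.mean((x - x_star) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def isnr(x, x_star, x_bic):
    """Improved-SNR (dB): gain of the restoration x_star over the
    bicubic interpolation x_bic, both compared against the truth x."""
    return 10.0 * np.log10(np.sum((x - x_bic) ** 2) / np.sum((x - x_star) ** 2))

# Small synthetic check: a restoration twice as accurate (in RMSE)
# as bicubic interpolation gains 10*log10(4) ~ 6.02 dB of ISNR.
x = np.full((8, 8), 0.5)
x[2:6, 2:6] = 1.0
x_star = x + 0.05
x_bic = x + 0.10
```

A positive ISNR thus certifies an improvement over the bicubic baseline, which is the role it plays in Tables 1 and 2.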

IRWP-ADMM for TV Regularisation
We start by testing the IRWP-ADMM on the TV-L 2 model for the reconstruction of the test image qrcode (256×256 pixels), with pixel values between 0 and 1. First, the original image has been corrupted by Gaussian blur, generated by the MATLAB routine fspecial with parameters band=13 and sigma=3. The action of the decimation matrix S has been synthesised by applying a uniform blur with a d r × d c kernel, d r = d c = 4, and then selecting the rows and the columns of the blurred image according to the decimation factors d r , d c . Finally, the blurred and decimated image has been further corrupted by AWGN with standard deviation σ = 0.1. The original image and the acquired data are shown in Fig. 3a, b, respectively.
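The degradation pipeline b = SKx + n described above can be synthesised as follows. This is a sketch under two stated assumptions: the Gaussian PSF mimics MATLAB's fspecial with circular (FFT-based) convolution, and the decimation S (uniform d × d blur followed by row/column selection) is implemented equivalently as block averaging.

```python
import numpy as np

def gaussian_kernel(band, sigma, shape):
    """band x band Gaussian PSF (akin to fspecial('gaussian')),
    embedded in an image of the given shape and centred at the
    origin for circular convolution."""
    r = band // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()
    K = np.zeros(shape)
    K[:band, :band] = k
    return np.roll(K, (-r, -r), axis=(0, 1))  # centre PSF at the origin

def degrade(x, band, sigma_blur, d, sigma_noise, rng):
    """Synthesise b = S K x + n: circular Gaussian blur via FFT,
    d x d block averaging as the decimation S, plus AWGN."""
    K = gaussian_kernel(band, sigma_blur, x.shape)
    blurred = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(K)))
    m, n = x.shape
    low = blurred.reshape(m // d, d, n // d, d).mean(axis=(1, 3))
    return low + sigma_noise * rng.standard_normal(low.shape)

rng = np.random.default_rng(0)
x = np.ones((256, 256))  # placeholder for the qrcode test image
b = degrade(x, band=13, sigma_blur=3.0, d=4, sigma_noise=0.1, rng=rng)
```

The circular-convolution choice matches the frequency-domain structure of K and S that the RWP computations exploit.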
Due to the strongly anisotropic nature of the geometric image contents, we employ both the isotropic and the anisotropic version of TV as image regularisers.
The black solid curves in Fig. 2a, d represent the residual whiteness functions W (μ) as defined in (30), with μ replaced by τ * (μ) defined in (52), for the TVI-L 2 and TVA-L 2 models. They have been computed by solving the models for a fine grid of different μ-values, and then calculating, for each μ-value, the two associated quantities τ * (μ) and W (μ). Note that, independently of the selected regulariser, W has a global minimiser over the considered domain. The optimal value τ * = τ * (μ * ) selected according to the proposed RWP is depicted by the vertical solid magenta lines, while the vertical dashed black lines correspond to τ = 1, i.e. the standard value of the classical DP.
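A schematic whiteness measure in the spirit of W in (30) can be obtained by summing the squared normalised sample autocorrelation of the residual over all non-zero lags; the paper's exact functional, which exploits the structure of S and K in the frequency domain, is more elaborate. A minimal sketch, assuming a plain FFT-based circular autocorrelation:

```python
import numpy as np

def whiteness(r):
    """Sample whiteness measure of a residual image: squared normalised
    circular autocorrelation summed over all non-zero lags (via FFT).
    A white residual yields a small value; a correlated one, a larger value."""
    R = np.fft.fft2(r - r.mean())
    a = np.real(np.fft.ifft2(np.abs(R) ** 2))  # circular sample autocorrelation
    a /= a[0, 0]                               # normalise by the zero lag
    return np.sum(a ** 2) - 1.0                # exclude the (unit) zero lag

rng = np.random.default_rng(0)
white = rng.standard_normal((128, 128))
# Introduce spatial correlation by summing shifted copies of the noise:
correlated = white + np.roll(white, 1, axis=0) + np.roll(white, 1, axis=1)
```

As expected, the measure penalises residuals that retain image structure, which is exactly the behaviour the RWP exploits when minimising W over μ.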
The ISNR and SSIM curves for different values of τ are plotted in Fig. 2b, c, where the vertical lines have the same meaning as in the first column figures. Note that the RWP tends to select a value for τ that maximises the ISNR rather than the SSIM.
We are also interested in verifying that the proposed IRWP-ADMM scheme outlined in Algorithm 2 succeeds in automatically selecting such optimal τ * in a robust and efficient way. To this purpose, the output τ̂ of the iterative scheme is indicated with a dashed green line in Fig. 2a-d. The blue and red markers in Fig. 2b, c represent the final ISNR and SSIM values, respectively, of the image reconstructed via IRWP-ADMM. We observe that the algorithm returns a τ̂ which is close to the τ * detected a posteriori.
The image reconstructed by bicubic interpolation, the initial guess computed by the TIK-L 2 model and the output reconstructions obtained with the TVI-L 2 and the TVA-L 2 variational models solved by the IRWP-ADMM approach in Algorithm 2 are shown in Fig. 3. We note that the bicubic interpolation in Fig. 3c performs very poorly. The TVI-L 2 reconstruction preserves image edges; however, as observable in Fig. 3e, the rounding of corners, which is a typical drawback of isotropic TV regularisation, affects the quality of the final reconstruction. The anisotropic TV returns a high-quality reconstruction, shown in Fig. 3f, as it naturally drives the regularisation along the true horizontal and vertical edge directions.
Quantitative assessments in terms of PSNR, ISNR and SSIM values are reported in Table 1; the values corresponding to the reconstructions in Fig. 3 are reported in the right part of the table.
As a second example, we consider the test image geometric (320×320) corrupted by the same blur, decimation factors and AWGN as the ones considered for the test image qrcode. The original image and the observed data are shown in Fig. 5a, b. In this case, we only consider the isotropic version of TV, to which we will simply refer as TV. The residual whiteness function and the ISNR and SSIM curves for the TV-L 2 model are shown in Fig. 4a, b, respectively. Also in this case, the W function exhibits a global minimiser over the considered domain for τ, and the τ values selected by RWP and IRWP are very close to each other and return high-quality reconstructions in terms of the ISNR/SSIM metrics. We also notice that the τ values selected by RWP and IRWP are, in this case, very close to the one selected by the DP, the only difference being that the RWP-based approaches do not require prior knowledge of the noise standard deviation.
The quality indices of the restorations computed via the IRWP-ADMM approach are reported in Table 1, while the corresponding reconstructed images are shown in Fig. 5. For this second example, we also show the behaviour of the regularisation parameter μ, of the ISNR and of the SSIM along the iterations of the IRWP-ADMM for the TV-L 2 variational model, see Fig. 6a-c, respectively.
Finally, for the two test images qrcode and geometric, we report in Table 1 the PSNR/ISNR/SSIM achieved by the bicubic interpolation and the considered variational models when applying a less severe noise degradation to the original images (left side of the table).

IRWP-ADMM for WTV Regularisation
We now test the IRWP-ADMM for the WTV-L 2 variational model employed for the reconstruction of natural images. First, we consider the test images church (480 × 320) and monarch (512 × 512), both corrupted by Gaussian blur with band=13 and sigma=3, decimated with factors d c = d r = 2 and further degraded by AWGN with σ = 0.1. The original uncorrupted images are shown in Figs. 8a and 9a, while the acquired data are shown in Figs. 8b and 9b, respectively. We show the behaviour of the residual whiteness measure for the two test images in the first column of Fig. 7. Notice that, as already highlighted in [22], the adoption of a more flexible regularisation term yields ISNR and SSIM curves achieving their maximum value at approximately the same τ. In both cases, the IRWP-ADMM for the solution of the WTV-L 2 model returns a τ * (μ) very close to the one selected by the RWP; moreover, the ISNR/SSIM values corresponding to the output τ * values are close to the optimal ones and, in any case, larger than the ones achieved by means of the DP.
The images reconstructed via bicubic interpolation, the initial guesses computed by the TIK-L 2 model coupled with the RWP and the final reconstructions obtained by employing the IRWP-ADMM for the WTV-L 2 model are shown in Figs. 8 and 9 for the test images church and monarch, respectively.
Moreover, for the test image church, we also report in Fig. 10 the convergence plots of the regularisation parameter μ, the ISNR and the SSIM along the iterations of the IRWP-ADMM.
Finally, the PSNR/ISNR/SSIM values achieved for the reconstructions shown in Figs. 8 and 9 and for the ones obtained considering a less severe corruption level are reported in Table 2.

IRWP-IRL1 for CEL0 Regularisation
In this last example, we consider the problem of molecule localisation starting from a down-sampled, blurred and noisy microscopy image. In the test image molecules shown in Fig. 12a, the samples are represented by sparse point sources. The original image has been corrupted by Gaussian blur with parameters band=13 and sigma=3, then decimated with factors d c = d r = 2, and finally degraded by adding an AWGN realisation with σ equal to 2% of the noiseless signal. The acquired image is shown in Fig. 12b.
In Fig. 11, we show the behaviour of the residual whiteness measure (left) and of the Jaccard index J 4 (right). Also in this example, the W function exhibits a global minimiser, and the τ * values selected by RWP and IRWP are very close to each other and slightly larger than τ = 1, the value corresponding to the DP. Unlike the previously considered quality measures, the Jaccard index does not present a smooth behaviour, as it measures the precision of the molecule localisation rather than some global visual property, such as those captured by the ISNR and the SSIM. However, the J 4 value achieved by the IRWP-IRL1 is closer to the attained maximum than the one achieved by the DP. The output reconstruction is shown in Fig. 12, together with the initialisation for the IRWP-IRL1 algorithm, computed by employing the IRWP-ADMM for solving the L 1 -L 2 variational model.
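The Jaccard index J δ can be computed by matching detected molecules to ground-truth positions within tolerance δ. The greedy nearest-neighbour matching below is an illustrative assumption (an optimal assignment could be used instead); the definition J δ = TP/(TP + FN + FP) follows the one given above.

```python
import numpy as np

def jaccard_index(detected, true, delta):
    """Jaccard index J_delta = TP / (TP + FN + FP): greedily match each
    detected point to the nearest unused true point within distance delta."""
    true = [np.asarray(q, dtype=float) for q in true]
    used = [False] * len(true)
    tp = 0
    for p in detected:
        p = np.asarray(p, dtype=float)
        best, best_d = None, np.inf
        for j, q in enumerate(true):
            d = np.linalg.norm(p - q)
            if not used[j] and d <= delta and d < best_d:
                best, best_d = j, d
        if best is not None:    # correct detection within tolerance
            used[best] = True
            tp += 1
    fp = len(detected) - tp     # detections with no matching true point
    fn = len(true) - tp         # true points never matched
    return tp / (tp + fn + fp)

truth = [(10, 10), (20, 5), (40, 40)]
found = [(10, 11), (20, 5), (60, 60)]  # two near-hits, one false positive
```

With δ = 2 the off-by-one detection still counts as correct, while with δ = 0 only exact matches do, which explains the non-smooth behaviour of J δ noted above.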
A few more insights about the performance of the proposed approach are given in Table 3, where we report the Jaccard indices J δ , δ ∈ {0, 2, 4}, for the reconstruction obtained by the IRWP-ADMM for the L 1 -L 2 model and by the IRWP-IRL1 for the CEL0 model (right). In the Table, we also show the Jaccard indices achieved when applying a lower degradation level (left).
We conclude by showing, for the most severe corruption, the behaviour of the regularisation parameter μ and of the Jaccard indices along the outer iterations of Algorithm 3. One can observe that, although the monitored quantities exhibit neither a smooth nor a monotonic behaviour (a further consequence of the definition of J δ ), they stabilise, thus reflecting the empirical convergence of the scheme (Fig. 13).

Conclusions
We proposed an automatic selection strategy for the regularisation parameter of single image super-resolution variational models, based on the Residual Whiteness Principle applied along the iterations of an ADMM-based optimisation scheme. The approach proved to be successfully applicable to several scenarios of highly degraded images by means of a large family of convex variational models, among which we tested the TIK-L 2 , the TV-L 2 and the WTV-L 2 models. Moreover, we generalised the proposed approach so as to effectively deal with a specific class of non-convex variational models, i.e. the ones admitting a convex majorant. In particular, we focused on the case of the CEL0 functional, whose minimisation is carried out by the IRL1 strategy coupled with the proposed automatic estimation approach. When compared to standard parameter estimation techniques, such as the Discrepancy Principle, our method has been shown to outperform them and to be more widely applicable, as it does not require any prior knowledge of the noise variance. The empirical results on the convergence of the nested iterative schemes employed reflect the robustness of our method. A more detailed theoretical study on this matter is left for future research.

[Table 3: Jaccard indices achieved by the L 1 -L 2 and the CEL0-L 2 variational models, for which the proposed RWP-based procedure has been adopted; left: band = 9, sigma = 2, %σ = 1; right: band = 13, sigma = 3, %σ = 2]

[Appendix proof fragment] In light of its replicating structure, we observe that the action of the permutation P on b̄ H will cluster the identical entries, so that b̄ H,ι+j can be written as the mean of the set of d values {b̄ H,ι , . . . , b̄ H, This proves the proposition.