On a Variational Problem with a Nonstandard Growth Functional and Its Applications to Image Processing

We propose a new variational model in Sobolev–Orlicz spaces with non-standard growth conditions of the objective functional and discuss its applications to image processing. The characteristic feature of the proposed model is that the variable exponent, which is associated with non-standard growth, is unknown a priori and it depends on a particular function that belongs to the domain of objective functional. So, we deal with a constrained minimization problem that lives in variable Sobolev–Orlicz spaces. In view of this, we discuss the consistency of the proposed model, give the scheme for its regularization, derive the corresponding optimality system, and propose an iterative algorithm for practical implementations.


Introduction
Over the past few decades, remote sensing has been increasingly contributing to many agricultural monitoring services by providing worldwide systematic observations of the so-called optical vegetation indexes. These indexes (like Normalized Difference Vegetation Index (NDVI), Infrared Percentage Vegetation Index (NPVI), and others) are wellestablished proxies for crop conditions and give early insights into how well the crops are doing and if they are in need of water or nutrients? Peter I. Kogut and Olha P. Kupenko  From physical point of view, the remote sensing sensors record earth's reflectance in different wavelength and these received reflectance values are processed to create separate image for each wavelength. The reflectance values stored for different wavelength in different layers, which are also called spectral channels, are present in such satellite images. In particular, the satellites Sentinel-2 delivers 13 spectral channels ranging from 10 to 60-m in pixel size. So, a hyperspectral image simply define when more than one spectrum and at least three wavelength data like RGB are recorded by sensor to create composite of satellite images.
Typically, to provide useful information of crop status, it is important to monitor the selected agricultural fields in different spectrum on a regular basis in order to retrieve the complete time series of a vegetation index especially during those periods when conditions on the fields change drastically (It can be essential growth stages such as plant emergence, maturation period, and others.). However, in spite of the fact that optical images have a high resolution and are easily captured by low-cost cameras, the real-life satellite images frequently suffer from different types of noise, blur, and other atmosphere artifacts that can affect the radiation recovered by the sensors in different spectral channels and in a different manner. As a result, such satellite images lose their efficiency for the crop field monitoring problems and their utilization can lead to erroneous results and inferences.
In such situation, the problem we are going to deal with, can be shortly described as follows. We have a hyperspectral image u 0 = u 1,0 , u 2,0 , . . . , u M,0 T ∈ L 1 ( ; R M ) which is corrupted by noise or blur, or such that the features of interest in this image are hidden. Here, ⊂ R 2 is a bounded open domain with sufficiently smooth boundary ∂ . So, the challenge is to recover the original image u = [u 1 , u 2 , . . . , u M ] T , in fact, the hidden image features, from the observed datum u 0 . In mathematical terms, this means that for each spectral channel i = 1, . . . , M, one has to solve an inverse problem where the linear blur operator T i models the process through which the i-th spectral channel of u went before observation, and w is the unknown noise affecting the measurements. Due to the presence of the noise and the fact that the blurring operator T i is usually ill-conditioned or even non-invertible, and w is the unknown noise affecting the measurements, the recovery of u from the measurements u 0 is an ill-posed problem [4,28]. In general, the ill-posed nature of this problem implies that there are too many ways one can obtain an approximate solution. A reasonable solution is to reformulate the image-deblurring problem by taking into account the image formation and acquisition process as well as any available prior information about the properties of the image to be restored. The most popular way for that is to represent the denoising problem (in each separate spectral channel) in the form of an appropriate variational problem where B 1 , B 2 are two Banach spaces over , u 0 ∈ B 1 is a given image, λ > 0 is the tuning parameter, T ∈ L(B 1 , B 1 ) is a bounded linear operator, and R : B 2 → R is the regularizing term which smooths the image u and represents some kind of a priori information about the minimizer u. Here, the term T u − u 0 2 B 1 is the so-called fidelity term of the approach which forces the minimizer u to stay close to the given image u 0 (how close depends on the size of λ). As for the choice of operator T , they usually set T = I d (the identity in B 1 ) for the image denoising problems, and T is a symmetric kernel with a smaller support than the image u for the deblurring problems. The choice of a proper regularization R(v), Banach spaces B 1 , B 2 , and a convenient fidelity term T v − u 0 2 B 1 are always challenging problems since they affect the quality of the desired image and also the consistency with the given data.
There is a wide choice of the regularization term proposed in the literature to ensure the well-posedness of the denoising problem (1). In the most cases, this term can be represented as follows: where Dv stands for the generalized gradient.
In fact, it means that we can exploit the benefits of isotropic diffusion (when p = 2) arising from the minimization problem (1), total variation-based diffusion ( p = 1), and more general anisotropic diffusion (1 < p < 2).

The Case p = 1 (Total Variation Minimization)
One of the classical regularization terms was the total variation (TV), introduced by Rudin et al. [48]. Basically, they introduced in [48] the following variational model (shortly, the ROF model) where BV ( ) stands for the space of functions with bounded variation [2,3], ⊂ R 2 is an open bounded Lipschitz domain, f ∈ L 2 ( ) is a given image, f = u + ξ , ξ is the 'noise', and |Du| denotes the total variation seminorm of u on .
This model produces rather good results for removing noise and preserving edges in structured images, meaning images without texture-like components, i.e., without edges. Moreover, the choice B 2 = BV ( ) is reasonable from mathematical point of view, since the space of functions of bounded variation is allowed for discontinuities which are necessary for edge reconstruction. At the same time, it fails in the presence of the latter. Namely, it cannot separate pure noise, i.e., well oscillatory components, from texture, but removes both equally. Also the ROF model suffers from the so-called blocky effects and it can also develop 'false edges' which can mislead a human or computer into identifying erroneous features not present in the true image. These phenomena were a subject of long intensive discussions in the literature (see, for instance, [4,5,14,15,27,28,44,47]).

The Case 1 < p ≤ 2
As follows from (1)-(2), different types of diffusion can be explored. In particular, setting p = 2 in (1), we arrive at the isotropic diffusion which, in principle, solves the staircasing problem. Here, the "staircase effect" means the creation in the image of flat regions separated by artifact boundaries. However, this model is not good for image reconstruction since it has no mechanism for preserving edges. If we fix some value p ∈ (1, 2), then we deal with an anisotropic diffusion which is somewhere between TV-based and isotropic smoothing. In spite of the fact that this type of diffusion can be effective in reconstructing piecewise smooth regions, a fixed value of 1 < p < 2 may not allow for discontinuities, and thus obliterating edges.

High-Order Total Variation and Combination of TV-Based with Isotropic Diffusion
To overcome the drawbacks mentioned above, various modifications of the model (1) have been introduced. We can briefly indicate here the Adaptive Total Variation model [49] and a variety of high-order total variation regularizations (see [17,50] and the references therein). Recently, in order to reduce the staircasing effect, in a series of papers [6,11,12,50,53], authors proposed the models based on the so-called total generalized variation (TGV) involving a linear combination of higher-order derivatives and the total variation, on the total fractional-order derivatives, and on a combination of the total variation seminorm and L p norms. One of the successful high-order regularization was the combination of TV and TV 2 terms proposed by Chambolle and Lions [15,16] (see also [46]). Namely, they proposed the following specification of (1): As follows from (4), in this model, the diffusion is strictly perpendicular to the gradient, where |Dv| > β; that is, where edges are most likely present, and it is isotropic where |Dv| ≤ β. So, this model is successful in restoring images where homogeneous regions are separated by distinct edges. However, if the image intensities representing objects are non-uniform or if an image is highly degraded, this model may become sensitive to the threshold β. Not so long ago, Brinkmann et al. [8] proposed a new model that generalize most of the known regularizations, including TV and TGV, and also the combination between them. This model is based on the introduction of a vector field whose nonzero entries are concentrated at the edges of the image.

Regularization Terms with Variable Exponent (1 ≤ p(x) ≤ 2)
This case is rather new in statement of the image denoising problem, albeit the conjecture to utilize this idea was firstly pushed forward by Blomgren et al. [5] at the end of 90th. In particular, they proposed the following variant of regularization term where lim s→0 p(s) = 2, lim s→∞ p(s) = 1, and p is a monotonically decreasing function. In spite of the natural expectations that this model will reap the benefits of both isotropic and TV-based diffusion, as well as a combination of the two, its real exploitation was rather questionable because of some principle difficulties coming from the absence of its rigorous mathematically substantiation. Further specifications of this approach can be found in [18,19], where the authors discuss the following model: with , |Du| > β. dx.
Here, β > 0 is a given threshold, and k > 0 and σ > 0 are fixed parameters. A similar model to (6)- (9) has been later proposed in [42], where, instead of (7), the authors considered the variant R(u) = 1 p(x) |Du| p(x) with the same choice of variable exponent p(x).
The main benefit of the model (6)-(9) is the manner in which it accommodates the local image information. Where the gradient of the noisy image f is sufficiently large (i.e., likely edges), only TV-based diffusion will be used. Where the gradient is close to zero (i.e., homogeneous regions), the model is isotropic. At all other locations, the filtering is somewhere between Gaussian and TV-based. However, as follows from (8), the type of anisotropy in (6) is completely predefined by the structure of the observed, noisy image f , whereas denoised image u may have other structure with other location of edges and other shape of homogeneous regions.
In spite of the fact that there are many other different variants for the choice of regularization term using the so-called directional total variation [9,10] and flexible space-variants anisotropic regularization [12,39], to the best of our knowledge, the development of effective choice of Banach spaces B 1 , B 2 and a proper regularization term R(v) for general image denoising problem with different noise distributions remains an open problem for nowadays.

The Choice of Fidelity Term
As for the choice of the fidelity term 1 (1), it mainly depends on the noise distribution. In most cases, when the noise is assumed to be additive with Gaussian distribution, the L 2 -norm is considered (see, for example, [14]). At the same time, if the image is corrupted by noise with different statistical properties, other fidelity terms are investigated [7,45,51]. For instance, when the image is corrupted by impulse noise, a typical choice for the space B 1 is L 1 ( ). In fact, the use of L 1 -norm is more robust to recover the exact solution compared to the L 2 -norm, which suffers from the contrast loss and other artifacts.
There exist many works in the literature regarding the choice of the fidelity term, where the authors propose different variants including L 1 + L 2 data fidelity [29] and the so-called infimal convolution combination of data fidelities [13,26]. Recently, the non-convex functions have been performed as a fidelity term and have achieved a considerable success in removing complex non-smooth noise [25,53]. In particular, the following concave variant of fidelity term has been proposed in [1]: where γ is a strictly positive constant used to control the concavity rate. However, when the Gaussian noise level is very high compared to the impulse one, the exploitation of the concave variant of fidelity term can lead to the appearance of some artifacts in denoised images. So, the correct choice of the γ parameter, which controls the concavity rate, is a nontrivial and challenging problem.

Motivations
A common feature of the models used in denoising and deblurring problems is the fact that a true image u together with the rest feasible solutions belongs to the same functional space B 2 (with B 2 = BV ( ) or B 2 = W 1,s ( ) in a particular cases), whereas it is natural to suppose that the regularity of the restored image u should be low in places in where edges or discontinuities are present in u, and that it should be high in places where u is smooth or contains homogeneous features.
Because of the specifics of the satellite images with crop fields, such images typically contain a significant portion of edges (boundaries of the crop locations). That is why the important question is to separate pure noise from edges. Moreover, since a typical satellite image u 0 can be viewed as a number of separate scalar images (spectral channels) T which have been captured by different sensors in different zones of electromagnetic spectrum with different space resolution, the level of noise and its distribution can essentially variate from channel to channel. Figure 1 provides a visual comparison of red and nearinfrared channels for a satellite image from Sentinel-2. So, it is plausible to suppose that different channels However, we cannot achieve such regularity for the recovered images in the framework of the standard setting of denoising problem. Even if we make use of the model (6)-(9), we can easily deduce that any its solution u ∈ BV ( ) ∩ L 2 ( ) has the following Sobolev regularity: u ∈ W 1, p(x) ( ), where the exponent p(x) is completely determined by the original noisy image f , whereas regularity of images f and u and their texture can be drastically different. It says that the flexibility of the model (6)-(9) is not enough, especially for the hyperspectral images. In view of this the idea to involve into consideration the variational model (1) with an appropriate choice of variable character of exponent p(s) in the regularization term (5), sounds more attractive.

Contribution
The main purpose of this paper is to present a robust approach for the denoising and deblurring of non-smooth hyperspectral satellite images u 0 ∈ L 1 ( ; R M ) using the energy functional with nonstandard growth. In fact, we use the L 1 -norm of the noise in the minimization function and a special form of anisotropic diffusion tensor for the regularization term. Namely, we deal with the following optimization problem: where (G σ * v) (x) determines the convolution of function v with the N -dimensional Gaussian filter kernel G σ (see (9)), v is zero-extension of v outside , |ξ | stands for the Euclidean norm of ξ ∈ R N given by the rule |ξ | = (ξ, ξ ) R N , and In particular, the definition of variable exponent p( where the edgestopping function g(s) is taken in the form of the Cauchy law implies that p(x) ≈ 1 in places in where edges or discontinuities are present in the spectral channels of an image u(x), and p(x) ≈ 2 in places where u(x) is smooth or contains homogeneous features (see Lemma 1).
As for the data fidelity term, in contrast to the quadratic data-fitting term in the ROF model (3) (see also (1)), where it is specialized for zero-mean Gaussian noise, we take this term in L 1 -norm just in order to apply the model to other types of noise, especially when the noise is impulsive or contains gross outliers. Following this way, we are going to increase the noise robustness of the model (10) albeit it makes such variational problem completely non-smooth and, hence, significantly more difficult from a minimization point of view.
So, the main characteristic feature of this model is that we involve into consideration the energy functional with the nonstandard growth where the edge information for restora- . Basically, by analogy with the model (6)-(9), here we utilize a similar manner of accommodating the local information. However, there is a principle difference between the models (6)-(9) and (10). In contrast to [18,19,42], we do not predefine the variable exponent p(x) a priori using for that the original noisy image, but instead we associate this characteristic with each feasible solution. As a result, we admit that each feasible solution to this problem lives in the corresponding 'personal' functional space W 1,F(u i (·)) ( ). Formally it means that we look for the true image u rec such that As follows from the definition of Sobolev-Orlicz space , its regularity is completely determined by the exponent F(u rec i (·)) which depends on i-th spectral channel of the true image u rec and, hence, is unknown a priori. Moreover, as it was shown in Fig. 1, the exponents F(u rec 1 (·)), F(u rec 2 (·)), . . . , F(u rec M (·)) may significantly differ from channel to channel. In particular, some pixels, which are the local minimum points in red channel, become local maximum points in near-infrared channel and vice versa. Moreover, the different feasible solutions v 1 = v 2 to the above problem live in different functional spaces: We have v 1 ∈ W 1,F(v 1 (·)) ( ), whereas v 2 ∈ W 1,F(v 2 (·)) ( ). As a consequence, any minimizing sequence to this problem is, in fact, a sequence living in the scale of variable spaces. As a result, such notions as a convergence concept, compactness, density and others should be specified for the case of variable Sobolev-Orlicz spaces.
Thus, in spite of the fact that in the literature there are many approaches to the study of variational problems in abstract functional spaces, the above-mentioned circumstances make the problem (10) challenging (see [9,18,19,21,22,30] for the recent studies in this field).
In summary, the main contributions of our paper can be enumerated as follows: • The variational statement for the hyperspectral image denoising and deblurring in the form of minimization problem in Sobolev-Orlicz spaces with non-standard growth conditions of the objective functional; • Rigorous substantiation of the well posedness of the variational problem with non-standard growth functional; • The proof of existence result to the proposed variational problem; • The iterative algorithm for the numerical implementations; • Rigorous derivation of the first-order necessary conditions for the approximating problem; • The approximation scheme with fictitious control and rigorous substantiation of the attainability properties of its solutions.
The remainder of the paper is organized as follows: Sect. 2 focuses on the well-posedness of the proposed model for the image denoising and deblurring problem, and on the solvability issues. In Sect. 3, we propose an iterative algorithm for the approximate solution of the denoising problem and discuss its convergence properties. The deriving of optimality conditions for approximating problem and their rigorous substantiation, we provide in Sect. 4. Since the proposed iterative algorithm leads to the so-called weak solutions, the possible ways for the relaxation of the constrained minimization problem (10) are discussed in Sect. 5. With that in mind we introduce a family of minimization problems with fictitious control and show that each of these problem is solvable and their solutions converge to the solution of the original problem in an appropriate topology. For illustration of the proposed approach, we give in Sect. 6 some results of numerical experiments with real satellite images. Finally, we give in "Appendix" section the main auxiliary results concerning the Orlicz spaces and Sobolev spaces with variable exponent.

Existence Result
Our main intention in this section is to show that constrained minimization problem (10) is consistent and admits at least one solution. Because of the specific form of the energy functional J (v), we cannot assert that minimization problem (10) is well defined on the entire set W 1,α ( ) with some appropriate exponent α ≥ 1. Moreover, the structure and main topological properties of the set of feasible solution to minimization problem (10) are challenging issues. So, the study of these issues for minimization problem (10) is the main subject of this section. (We can refer to [5,9,18,23] for some specific details that can appear in this case.) We begin with the following key assumption: The true intensities u i of all spectral channels for the retrieved image u = [u 1 , . . . , u M ] are subjected to the constraints γ 0,i ≤ u i (x) ≤ γ 1,i a.e. in , where We say that a function u rec = u rec 1 , u rec i.e., for each i = 1, . . . , M, Here, i is the set of feasible solutions to the problem (16) which we define as follows: where F(v) is given by (2), and W 1, p(·) ( ) is the Sobolev-Orlicz space (for the details we refer to Appendix A).
Hereinafter, we associate with each spectral channel u rec i of the restored optical image u rec = u rec 1 , u rec 2 , . . . , u rec M T : → R M the so-called texture index p i : → R following the rule: where g:[0, ∞) → (0, ∞) is the edge-stopping function which we take in the form of the Cauchy law g(t) = 1 1+(t/a) 2 .
As follows from representation (12) and smoothness of the Gaussian filter kernel G σ , we have the following estimates The following result plays a crucial role in the sequel.
uniformly in as k → ∞, Proof In view of the initial assumptions, the sequence {v k } k∈N is uniformly bounded in L 1 ( ). Hence, by smoothness of the Gaussian filter kernel G σ , it follows that Then, L 1 -boundedness of {v k } k∈N guarantees the existence of a positive value δ ∈ (0, 1) (see (18)) such that estimate (18) holds true for all k ∈ N. Moreover, as follows from the relations and smoothness of the function ∇G σ (·), there exists a positive constant C G > 0 independent of k such that we see that Since max x∈ | p k (x)| ≤ β and each element of the sequence { p k } k∈N has the same modulus of continuity, it follows that this sequence is uniformly bounded and equi-continuous. Hence, by Arzelà-Ascoli theorem, the sequence { p k } k∈N is relatively compact with respect to the strong topology of C( ). Taking into account the estimate (20) and the fact that the set S is closed with respect to the uniform convergence and by definition of the weak- * convergence in L ∞ ( ), we deduce: p k (·) → p(·) uniformly in as k → ∞, where p(x) = 1 + g (|(∇G σ * v) (x)|) in . The proof is complete.
For our further analysis, we make use of the following result (we refer to [55,Lemma 13.3] for comparison) concerning the lower semicontinuity property of the variable L p k (·) -norm with respect to the weak convergence in L p k (·) ( ).

Proposition 1
If a bounded sequence f k ∈ L p k (·) ( ) k∈N converges weakly in L α ( ) to f for some α > 1, then f ∈ L p(·) ( ), f k f in variable L p k (·) ( ), and Proof In view of the fact that Using the Young's inequality ab ≤ |a| p / p + |b| p / p , we have for p k (x) = p k (x)/( p k (x) − 1) and any ϕ ∈ C ∞ 0 (R N ). Then, passing to the limit in (23) as k → ∞ and utilizing property (A8) and the fact that we obtain Since the last inequality is valid for all ϕ ∈ C ∞ 0 (R N ) and C ∞ 0 (R N ) is a dense subset of L p (·) ( ), it follows that this relation also holds true for ϕ ∈ L p (·) ( ). So, taking ϕ = | f (x)| p(x)−2 f (x), we arrive at the announced inequality (21). As a consequence of (21) and estimate (A4), we get: f ∈ L p(·) ( ).
To end the proof, it remains to observe that relation (24) holds true for ϕ ∈ C ∞ 0 (R N ) as well. From this the weak convergence f k f in the variable Orlicz space L p k (·) ( ) follows.

Remark 1
Arguing in a similar manner and using, instead of (23), the estimate it can be shown that the lower semicontinuity property (21) can be generalized as follows: (26) Following in some technical aspects the recent studies [21,22,30], we give the following existence result. So, without loss of generality, we can suppose that J i (v k ) ≤ ζ + 1 for all k ∈ N. From this and the initial assumptions, we deduce where p k (x) = 1+g (|(∇G σ * v k ) (x)|) in , and, therefore, sup k∈N sup x∈ p k (x) ≤ 2 (see Lemma 1).
Utilizing the fact that v k (x) ≤ γ 1,i for almost all x ∈ , we infer the following estimate Then, arguing as in Lemma 1, it can be shown that p k ∈ C 0,1 ( ) and Taking this fact into account, we deduce from (27), (28), uniformly with respect to k ∈ N. Therefore, there exist a subsequence of {v k } k∈N , still denoted by the same index, and a function u rec where, by Sobolev embedding theorem, Moreover, passing to a subsequence if necessary, we have (see Proposition 1 and Lemma 1): where u rec i ∈ W 1, p rec (·) ( ).
for all k ∈ N, it follows from (31) that the limit function u rec i is also subjected to the same restriction. Thus, u rec i is a feasible solution to minimization problem (16).
Let us show that u rec i is a minimizer of this problem. With that in mind, we note that in view of the obvious inequality we have: The sequence T i (v k (x)) − u 0,i (x) k∈N is bounded in L 1 ( ), equi-integrable in , and in view of (31), it strongly converges in L 1 ( ) to T i u rec It remains to notice that due to the properties (27), (30), the sequence |∇v k | ∈ L p k (·) ( ) k∈N is bounded and weakly convergent to |∇u rec i | in L α ( ). Hence, by Proposition 1, the following lower semicontinuous property lim inf holds true. As a result, utilizing relations (32) and (33), we finally obtain Thus, u rec i is a minimizer to the problem (16), whereas its uniqueness remains an open question.

Iterative Algorithm Based on the Variational Convergence of Extremal Problems
Our prime interest in this section is to propose an approximation approach to the solution of the minimization problem (16) that would be effective from numerical simulations point of view. After discussion of the main steps of the proposed iterative algorithm, we supply it by a rigorous substantiation.
As an alternative approach to the study of minimization problems (16), there can be recommended the modular-proximal gradient algorithms in variable Lebesgue spaces that were recently developed in [41]. However, the efficiency of such algorithms in the context of a particular form of the objective functional (65) seems to remain an open question nowadays. Let i ∈ {1, . . . , M} be a given spectral channel. Mainly basing on the concept of relaxation of extremal problems and their variational convergence [31,33,38], we propose the following algorithm. At the first step, we set up where Then, for each k ≥ 1, we set It is worth no notice that such definition of the elements u k is correct. Indeed, arguing as in the proof of Theorem 2 and using convexity arguments, it can be shown that for each p(·) ∈ S, there exists a unique element u ). This fact reflects the principle difference between optimization problems (37) and (16), where the problem (16) can be viewed as a minimization of the growth energy functional (16) with 'the frozen exponent' p k (x).
Before proceeding further, we introduce the following set: where C > 0 and δ > 0 are defined by (20) and (29), respectively.
Utilizing the arguments of the proof of Theorem 2, it can be shown that for given i = 1, . . . , M, μ > 0, and u 0 ∈ L 1 ( , R M ), the sequence u k ∈ W 1, p k (·) ( ) k∈N is compact with respect to the weak topology of W 1,α ( ), whereas the exponents { p k } k∈N are compact with respect to the strong topology of C( ).
We say that a function u i is a weak solution to the original problem (16) if

Remark 2
The relation between a weak solution and a solution to the problem (16) is very intricate. Since the uniqueness of solutions to (16) is a rather questionable option, it follows that, in principle, these definitions can describe different functions in i . As immediately follows from (38), a weak solution is a merely feasible one to the original problem. However, if the problem (16) admits a unique solution (u 0 i , p 0 i ) ∈ i , then (38) implies that this function is considered as a weak solution. On the other hand, if u * i ∈ i is some solution to (16), then setting in (34) for all x ∈ , we immediately arrive at the conclusion: )) for all k ∈ N and, therefore, u * i is a weak solution to the above problem. So, in general, it would be provocative to assert that all weak solutions to the problem (16) are local minimizers of (16). In view of arguments given above, these solutions are merely stationary points of the original problem.
Our main goal in this section is to establish the existence of a weak solution to the original problem (16) and show that it can be attained by some iterative algorithm.
We begin with some technical results.
Proof Let us show that the sequence of minimizers u k k∈N of (38) is bounded in the sense of condition (A9). Let u ∈ C 1 ( ) be an arbitrary function such that γ 0,i ≤ u(x) ≤ γ 1,i in . Since From this and definition of the set B i, p k (·) , we deduce Then, estimate (A4) implies that the sequence u k k∈N is bounded in W 1,α ( ). So, its weak compactness is a direct consequence of the reflexivity of W 1,α ( ).
We notice that boundedness of the sequence u k k∈N in W 1,α ( ) and compactness of the embedding W 1,α ( ) → L q ( ) for q ∈ 1, N α N −α imply the existence of an element u * ∈ W 1,α ( ) such that, up to a subsequence, Then, using (42) and passing to the limit in two-side inequal- Utilizing this fact together with the pointwise convergence (42), by the Lebesgue dominated convergence theorem, we have Since, by Arzelà-Ascoli theorem, the set { p k = 1 + g ∇G σ * u k−1 (x) k∈N is compact with respect to the strong topology of C( ), it follows from (45) (see also the proof of Lemma 1) that p k → p * = F(u * (x)) strongly in C( ) as k → ∞, and p * ∈ S.
Then, properties (42)- (46) and Proposition 1 imply: Thus, the iterative procedure (36) has a cluster point (u * , p * ) ∈ B i, p * (·) ×S with respect to the convergence (42)- (43), (46). The main result of this section can be stated as follows: Theorem 3 Let μ > 0, and u 0 ∈ L 1 ( , R M ) be given data. Then, for each i ∈ {1, . . . , M}, the sequence of approximated solutions (u k , p k ) k∈N possesses the asymptotic properties: where u i is a weak solution to the problem (16), that is, and, in addition, the following variational property holds true Proof As immediately follows from Lemma 2 and reasoning given above, the sequence (u k , p k ) k∈N is compact with respect to the convergence (48)- (50). Let ( u i , p i ) be its cluster point. In order to show that the function u i is a weak solution to the problem (16), we assume the conversenamely, there is another function z ∈ B i, p i (·) such that Using the procedure of the standard direct smoothing, we set where ε > 0 is a small parameter, K is a positive compactly supported smooth function with properties and z is zero extension of z outside of . Since z ∈ W 1, p(·) ( ) and p(x) ≥ α = 1 + δ in , it follows that z ∈ W 1,α ( ). Then, by the classical properties of smoothing operators. From this we deduce that Moreover, taking into account the estimates we see that each element u ε is subjected to the pointwise constraints Since, for each ε > 0, u ε ∈ W 1, p k (·) ( ) for all k ∈ N, it follows that u ε ∈ B i, p k (·) , i.e., each element of the sequence {u ε } ε>0 is a feasible solution to all approximating problems . Hence, Further, we notice that by Proposition 1 and Fatou's lemma, and uniformly in as k → ∞, it follows from the Lebesgue dominated convergence theorem and (57) that lim k→∞ J i (u ε , p k (·)) = J i (u ε , p i (·)), ∀ε > 0.
As a result, passing to the limit in (55) and utilizing properties (56)-(58), we obtain for all ε > 0. Taking into account the pointwise convergence (see (54) and property (53)) as ε → 0, and the fact that, for ε small enough, we can pass to the limit in (59) as ε → 0 by the Lebesgue's dominated convergence theorem. This yields

Optimality Conditions
To characterize the solution u 0, p(·) ∈ B i, p(·) of the approximating optimization problem inf v∈B i, p(·) J i (v, p(·)) , we check that the functional J i (v, p(·)) is Gâteaux differentiable with respect to v, that is, To this end, we note that almost everywhere in . Since, by convexity, Taking into account that and |∇v(x)| p(x) dx by( A4) ≤ ∇v 2 L p(·) ( ,R N ) + 1, we see that the right-hand side of inequality (62) is an L 1 ( )function. Therefore, by the Lebesgue's dominated convergence theorem. Utilizing similar arguments to the rest of terms in (10), we deduce that the representation (61) for the Gâteaux differential of J i (·, p(·)) at the point u 0, p(·) ∈ B i, p(·) is valid.
Thus, in order to derive some optimality conditions for the minimizing element u 0, p(·) ∈ B i, p(·) to the problem inf v∈B i, p(·) J i (v, p(·)), we note that B i, p(·) is a nonempty convex subset of W 1, p(·) ( ) and the objective functional J i (·, p(·)) : B i, p(·) → R is strictly convex. Hence, the well-known classical result (see [43,Theorem 1.1.3]) and representation (61) lead us to the following conclusion.
Theorem 4 Let p k (·) ∈ S be an exponent given by the iterative rule (36). Then, the unique minimizer u k ∈ B i, p k (·) to the approximating problem inf

On Approximation of the Reconstruction Problem
It is clear that because of the nonstandard growth energy functional and its non-convexity, constrained minimization problem (16) is not trivial in its practical implementation. The main difficulty in its study comes from the state constraints that we impose on the variable exponent and the set of feasible solutions i . The iterative algorithm that was proposed in the previous sections allows to attain in the limit the socalled weak solution to the minimization problem (16). So, the question whether some optimal solutions can be reachable in such way remains open. This motivates us to pass to another relaxation scheme of the original variational problem.
As the main step of this procedure, we propose to consider the function p(·) := F(v(·)) as a fictitious control subjected to some special constraints and interpret the fulfillment of equality F(v(x)) = 1+ g (|(∇G σ * v) (x)|) with some accuracy in . To do so, we notice that if v ∈ i is a feasible solution to (16), then F(v(·)) is subjected to the two-side inequality (28) with δ ∈ (0, 1) given by (29). Keeping this in mind and following in some aspects the standard penalty method (see, [52,Chapter 2], see also [34][35][36][37]40]), we consider the following family of approximating problems: with C > 0 given by (20), α = 1 + δ, and δ > 0 is given by (29). We assume that the parameter ε varies within a strictly decreasing sequence of positive real numbers which converge to 0. So, when we write ε > 0, we consider only the elements of this sequence.

Remark 3
It is clear that the condition p ∈ S ad together with the fact that S ad is a compact subset in C( ) imply: Every cluster point of a sequence { p k } k∈N ⊂ S ad with respect to the uniform topology is a regular exponent, i.e., it is an exponent satisfying the log-Hölder continuity condition [55]. In this case, the set C ∞ 0 (R N ) is dense in W 1, p(·) ( ) [20] and this fact plays a crucial role in approximation of minimization problem (16).
The principle point in the statement of approximated problem (65) is the fact that we pass from the state constrained optimization problem (16) with the variable exponent p(x) = F(v(x)) strongly depending on the function of interest v to its approximation where we eliminate the equality constraint p(x) = F(v(x)) between the state v(x) and exponent p(x) and allow such pairs run freely in their respective sets of feasibility.
We begin with the following existence result.

Theorem 5
For each i = 1, . . . , M, every positive value ε > 0, and given μ > 0, u i,0 ∈ L 1 ( ), and T i ∈ L(L 1 ( )), the minimization problem (65) has at least one solution. Fig. 2 Example. Left panel: noisy satellite image. Middle panel: reconstruction using total variation (TV) approach. Right panel: reconstruction using the approach introduced in this paper A step size restriction is imposed for stability by condition: /h 2 ≤ C. For numerical simulations in this section, we set: ε = 0.01, σ = 0.5, a = 5, β = 2, γ 0,i = 1, γ 1,i = 255, T i = I d, i = 1, 2, 3. For the three-channel satellite image depicted on the left panel in Fig. 2, we have conducted k = 5 iterations with the time step t = 0.01 at each iteration to obtain result seen on the right panel in Fig. 2. For comparison, Fig. 2b shows the result of a denoising using the ROF model (3).

Author Contributions
The authors declare that all of them have contributed to the realization of the results.
Funding Open access funding provided by Università degli Studi di Salerno within the CRUI-CARE Agreement. No funds, grants, or other support was received.
Availability of data and materials Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.