1 Introduction

Image scaling aims to produce an image at a different size while preserving the original content as much as possible, with minimal loss of quality, in two opposite directions: downscaling and upscaling. Downscaling is a compression process by which the size of the high-resolution (HR) input image is reduced to obtain the low-resolution (LR) target image. Conversely, upscaling is an enhancement process in which the size of the LR input image is enlarged to recover the HR target image.

Image scaling (also termed image resampling or image resizing) is a widely used tool in several fields such as medical imaging, remote sensing, gaming, electronic publishing, autonomous driving, and aerial photography [1,2,3,4,5,6]. For example, upscaling allows one to highlight important details of the image in remote sensing and medical applications [1, 2], while downscaling is a fundamental operation for fast browsing or sharing purposes [3, 4]. Other application examples concern scenarios like deforestation monitoring, traffic surveillance, and many other engineering tasks. Sometimes image scaling is used for illicit purposes, for example, to automatically generate camouflage images whose visual semantics change dramatically after scaling [7]. In these cases, it is very important to detect the scaling effects in order to defend against such attacks and adopt suitable countermeasures [8, 9].

From a computational point of view, image scaling can be addressed by different numerical methods (see Sect. 2), whose main critical points typically are: (a) undesired effects, such as ringing artifacts and aliasing, due to the increase/decrease in the number of pixels, which adds/removes information to/from the image; (b) computational efficiency in performing the resampling task in real-world applications. Moreover, most existing methods treat resampling in only one direction, since downscaling and upscaling are often considered separate problems in the literature [10].

We aim to propose a scaling method that works in both the downscaling and upscaling directions. To this aim, viewing the scaling problem as an approximation problem, we employ an interpolation polynomial based on an adjustable filter of de la Vallée-Poussin (briefly, VP) type, which can be suitably modulated to improve the approximation (see, e.g., [11,12,13]).

Indeed, VP-type interpolation has been introduced in the literature as a valid alternative to Lagrange interpolation to provide a better pointwise approximation, especially when the Gibbs phenomenon occurs [12, 14, 15]. In fact, an interesting feature of VP filtered approximation is the presence of a free additional degree parameter, which is responsible for the degree of localization of the polynomial interpolation basis (the so-called fundamental VP polynomials) around the nodes. By changing this parameter, we may modulate the typical oscillatory behavior of the fundamental Lagrange polynomials according to the data, improving the approximation without destroying the interpolation property and while keeping the number of interpolation nodes fixed. Moreover, it is also worth noting that VP interpolation can be embedded in a wavelet scheme with very fast decomposition and reconstruction algorithms, since they are based on fast cosine transforms [16].

From a theoretical point of view, the literature concerning VP filtered approximation provides many convergence theorems, also in the uniform norm. They estimate an error comparable with the error of best polynomial approximation [13, 17] and make it possible to predict the convergence order from the regularity of the function to be approximated [18]. Owing to such nice behavior, VP approximation has also been usefully applied as a tool in the proofs of several theorems [19,20,21,22,23].

From a more applied point of view, it has been used to solve singular integral and integro-differential equations [24, 25] and to derive good quadrature rules for the finite Hilbert transform [26, 27]. However, to our knowledge, it has never been applied to image processing. Hence, the present paper represents a first step in investigating how the VP interpolation scheme can be usefully employed in image scaling.

To explain the proposed scaling method (shortly denoted by the VPI method, or simply VPI), as a starting point we consider that the input RGB image is represented at a continuous scale by a vector function (with a separate channel for each color) whose sampling yields the pixel values. We globally approximate this function using suitable VP interpolation polynomials, modulated by a free parameter \(\theta \in ]0,1]\) [11, 13,14,15, 18]. Hence, we get the resized image by evaluating such VP polynomials on a denser (upscaling) or coarser (downscaling) grid of sampling points.

Being designed for both downscaling and upscaling, the VPI method is flexible and implementable for any scale factor. The rescaling can be obtained by specifying the scale factor or, alternatively, the desired size of the image. We point out that, in the following, to distinguish between upscaling and downscaling mode, we use the notation u-VPI and d-VPI, respectively.

Both in upscaling and downscaling, for the limiting parameter choice \(\theta =0\), VPI coincides with the LCI method proposed in [28], which is based on classical Lagrange interpolation at the same nodes. Moreover, for any choice of the parameter \(\theta \), d-VPI with odd scale factors produces the same output resized image as LCI, which is obtained by direct assignment without any computation or prefiltering operation. In these cases, if the LR image satisfies the Nyquist–Shannon sampling theorem [29], d-VPI produces an MSE not greater than the input MSE times the scale factor squared. Thus, we can get a null MSE and the best visual quality measures in the case of exact input data (cf. Proposition 1). However, we point out that in cases where the downscaling size violates the sampling theorem, aliasing effects occur. Experiments in the paper also examine this aspect, and a partial solution is proposed, while the problem remains open to further investigation.

A further contribution of this paper is a detailed quantitative and qualitative analysis of the obtained results on several publicly available datasets commonly used in image processing. The experimental results confirm the effectiveness and utility of employing the VP interpolation scheme, achieving on average a good compromise between visual quality and processing time: the resized images present few blurred edges and artifacts, and the implementation is computationally simple and rather fast.

On average, VPI has a competitive and satisfactory performance, with quality measures generally higher and more stable than those of other existing scaling methods considered as a benchmark. In general, we observe satisfactory performance, also for high scale factors, compared to the benchmark methods. Specifically, in downscaling, when the free parameter is nonzero, VPI improves on the LCI performance and proves more stable than the latter, due to the uniform boundedness of the Lebesgue constants corresponding to de la Vallée-Poussin type interpolation. Moreover, VPI is much faster than the methods specialized in only downscaling or only upscaling.

At a visual level, VPI captures the objects' visual structure by preserving the salient details, the local contrast, and the luminance of the input image, with well-balanced colors and a limited presence of artifacts. In downscaling, VPI exhibits aliasing effects to about the same extent as the other benchmark methods.

Overall, due to its features, we consider VPI suitable for real-world applications and, at the same time, we regard it as a complete method, since it can perform both upscaling and downscaling with adequate performance.

The remainder of this paper is organized as follows. In Sect. 2, we outline the related work, briefly describing the benchmark scaling methods we employ in the experimental phase. In Sect. 3, we provide the mathematical background. In Sect. 4, we describe the VPI method and state its main properties. In Sect. 5, we provide the most relevant implementation details and the qualitative/quantitative evaluation of the experimental results, obtained over a significant number of different image datasets. Finally, conclusions are drawn in Sect. 6.

2 Related Work

Image scaling has received great attention in the literature of the past decades, during which many methods based on different approaches have been developed. An overview containing pros and cons of some of them can be found in [30, 31].

Traditionally, image scaling methods are grouped into two categories [32]: non-adaptive [33,34,35,36,37] and adaptive [38,39,40,41,42]. In the first category, all pixels are treated equally. In the second, suitable adjustments are made depending on image features such as intensity values, edge information, texture, etc. The non-adaptive category includes many of the most commonly used algorithms, such as nearest neighbor, bilinear, bicubic, and B-spline interpolation, and the Lanczos method [32,33,34,35,36, 43]. Adaptive methods are designed to maximize the quality of the results. They are employed in common approaches such as context-aware computing [38], segmentation techniques [39], and adaptive bilinear schemes [40]. Machine learning (ML) methods can be ascribed to the latter category, even if they are often considered a separate one [41, 42]. The learning paradigm of ML methods aims to recover the information missing in the downscaled (upscaled) image using a relationship between HR (LR) and LR (HR) images. Mostly, this paradigm is implemented by a training step, in which the relationship is learned, followed by a step in which the learned knowledge is applied to unseen HR (LR) images.

Usually, non-adaptive scaling methods suffer from blurring or artifacts around edges and only retain the low-frequency components of the original image. On the other hand, adaptive scaling methods generally provide better visual image quality and preserve high-frequency components. However, adaptive methods take more computational time than non-adaptive ones. In turn, ML methods ensure high-quality results but, at the same time, require extensive training based on a huge number of parameters and labeled training images.

In this section, we limit ourselves to briefly describing the methods considered in the validation phase of the VPI method (see Sect. 5), namely DPID [44], L\(_0\) [45], SCN [46], LCI [28], and BIC [47]. The source code of these methods is made available by the authors themselves in a common language (MATLAB). Except for BIC and LCI, these methods are designed and tested for resizing in one direction only, i.e., in downscaling (DPID and L\(_0\)) or upscaling (SCN) mode.

DPID is based on the assumption that the Laplacian edge detector and adaptive low-pass filtering can be useful tools to approximate the behavior of the human visual system. Important details are preserved in the downscaled image by employing convolutional filters and by letting input pixels contribute more to the output image the more their color deviates from their local neighborhood.

L\(_0\) is an optimization framework for image downscaling focusing on two critical issues: salient feature preservation and downscaled image construction. Accordingly, two L\(_0\)-regularized priors are introduced and applied iteratively until the objective function is optimized. The first, based on a gradient ratio, allows preserving the most salient edges and the visual perceptual properties of the original image. The second optimizes the downscaled image under the guidance of the original one, avoiding undesirable artifacts.

SCN (Sparse Coding based Network) adopts a neural network based on sparse coding, trained end to end in a cascaded structure. It introduces improvements in terms of both recovery accuracy and human perception by employing a CNN (Convolutional Neural Network) model.

In LCI, the input RGB image is globally approximated by the bivariate Lagrange interpolating polynomial at a suitable grid of first-kind Chebyshev zeros. The output RGB image is obtained by sampling this polynomial at the Chebyshev grid of the desired size. Since the LCI method works both in upscaling and downscaling, following [28], we use the notation u-LCI and d-LCI in upscaling and downscaling, respectively.

BIC, one of the most commonly used rescaling methods, employs bicubic interpolation. It computes each unknown pixel value as a weighted average of the \(4\times 4\) pixels closest to it. Note that BIC produces noticeably sharper images than other classical non-adaptive methods such as bilinear and nearest neighbor interpolation, offering a favorable trade-off between image quality and processing time.

We remark that, in the following, BIC is implemented by the MATLAB built-in function imresize with the bicubic option. For the other methods, we used the publicly available MATLAB codes provided by the authors with the default parameter settings.
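For instance, a minimal MATLAB sketch of BIC resizing via imresize follows (the test image peppers.png ships with MATLAB):

```matlab
% Minimal sketch of BIC resizing via the built-in imresize.
I  = imread('peppers.png');         % RGB input image
Rd = imresize(I, 1/2, 'bicubic');   % downscaling by scale factor 2
Ru = imresize(I, 2,   'bicubic');   % upscaling by scale factor 2
```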

3 Mathematical Preliminaries

Let I denote any color image of \(n_1\times n_2\) pixels, with \(n_1,n_2\in {\mathbb {N}}\). As is well known, in the RGB space I is represented by means of a triad of \(n_1\times n_2\) matrices that we indicate with the same letter as the image they compose, namely \(I_\lambda \), with \(\lambda =\) 1:3 (i.e., \(\lambda =1,2,3\)). The entries of these matrices are integers from 0 to \(\max _f\), which denotes the maximum possible value of an image pixel (e.g., \(\max _f= 255\) if the pixels are represented using 8 bits per sample). On the other hand, such discrete values can be embedded in a vector function of the spatial coordinates, say \({\textbf{f}}(x,y)=[f_1(x,y), f_2(x,y), f_3(x,y)]\), which represents the image at a continuous scale and whose sampling yields its digital versions of any finite size.

Hence, once the sampling model is fixed, that is, the system of nodes

$$\begin{aligned} X_{\mu \times \nu }=\{(x_i^\mu , y_j^\nu )\}_{i=1:\mu , j=1:\nu }, \quad \mu ,\nu \in {\mathbb {N}}, \end{aligned}$$
(1)

we suppose that behind the digital image \(I=[I_1,I_2,I_3]\) there is a function \({\textbf{f}}=[f_1,f_2,f_3]\) such that

$$\begin{aligned} I(i,j)={\textbf{f}}(x_i^{n_1},y_j^{n_2}),\quad i=1:n_1,\quad j=1:n_2. \end{aligned}$$
(2)

In both downscaling and upscaling, the goal is to get an accurate reconstruction of I at a different (reduced or enlarged, respectively) size. Denoting by \(N_1\times N_2\) the new size that we aim to get and by \(R=[R_1,R_2,R_3]\) the target resized image of \(N_1\times N_2\) pixels, according to the previous settings, we have

$$\begin{aligned} R(i,j)={\textbf{f}}(x_i^{N_1},y_j^{N_2}),\quad i=1:N_1,\quad j=1:N_2. \end{aligned}$$
(3)

From this viewpoint, the scaling problem becomes a typical approximation problem: how to approximate the values of \({\textbf{f}}\) at the grid \(X_{N_1\times N_2}\), given the values of \({\textbf{f}}\) at the finer (in downscaling) or coarser (in upscaling) grid \(X_{n_1\times n_2}\).

Within this setting, the choice of the node system (1) as well as that of the approximation tool are both decisive for the success of a scaling method. In the next subsections, we introduce these two basic ingredients and the evaluation metrics we use for our scaling method.

3.1 Sampling System

Since it is well known that any finite interval \([a,b]\) can be mapped onto \([-1,1]\), in the following we suppose that each spatial coordinate belongs to the reference interval \([-1,1]\), so that the sampling system in (1) is included in the square \([-1,1]^2\).

In the literature, the equidistant node model is usually adopted for sampling. According to this traditional model, in (1) the coordinates \(\{x_i^{\mu }\}_i\) and \(\{y_j^{\nu }\}_j\) are the nodes that divide the segment \([-1,1]\) into \((\mu +1)\) and \((\nu +1)\) equal parts, respectively. On the other hand, we recall that other coherent choices of the sampling system (1) have recently been investigated, for instance, in [48, 49] for magnetic particle imaging.

Here we follow the sampling model recently introduced in [28]. According to this model, we assume that (1) is the Chebyshev grid whose coordinates \(\{x_i^{\mu }\}_i\) and \(\{y_j^{\nu }\}_j\) are the zeros of the Chebyshev polynomials of the first kind of degree \(\mu \) and \(\nu \), respectively. This means that in (1) we assume

$$\begin{aligned} x_i^\mu = \cos (t_i^\mu ) \qquad \text {and}\qquad y_j^\nu = \cos (t_j^\nu ) \end{aligned}$$
(4)

where, for all \(n\in {\mathbb {N}}\), we set

$$\begin{aligned} t_k^n=\frac{(2k-1)\pi }{2n}, \qquad k=1:n. \end{aligned}$$
(5)

Hence, assuming that \({\textbf{f}}\) is the vector function representing the image at a continuous scale, at a discrete scale we interpret the digital version of the image of size \(\mu \times \nu \) as resulting from the sampling of \({\textbf{f}}\) at the Chebyshev grid \(X_{\mu \times \nu }\) defined by (1), (4), and (5).

We point out that the coordinates in (4) are not equidistant in \([-1,1]\): they are arcsine distributed and become denser approaching the extremes \(\pm 1\). Such a node distribution is optimal from the approximation point of view but rather unusual in image sampling. Nevertheless, from a certain perspective, our sampling model is related to the traditional sampling at equidistant nodes, since the nodes in (5) are equally spaced in \([0,\pi ]\). Indeed, the idea behind our sampling model is to transfer the sampling question from the segment to the unit semicircle, which is divided into equal arcs by the node system (5).

The main advantage of adopting this unusual point of view is the possibility of globally approximating the image, in a stable and near-best way, by the interpolation polynomials introduced in the next subsection.
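As an illustration, a minimal MATLAB sketch of the coordinates (4), (5) follows; the function name cheb_nodes is ours, and the grid (1) is simply the Cartesian product of two such axes.

```matlab
% Sketch: the n first-kind Chebyshev zeros (4)-(5) on one axis.
function x = cheb_nodes(n)
    k = 1:n;
    t = (2*k - 1)*pi/(2*n);  % angles equally spaced in (0,pi), cf. (5)
    x = cos(t);              % arcsine-distributed nodes in (-1,1), cf. (4)
end
```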

Fig. 1: Fundamental VP polynomials \(\{\varPhi _{m,k}^n\}_{k=1}^n\) for \(n=5\) and \(m=4\)

3.2 Filtered VP Interpolation

Regarding the approximation tool underlying our method, we consider some filtered interpolation polynomials recently studied in [14]. This kind of interpolation is based on a generalization of the trigonometric VP means (see [11, 50]) and, besides the number of nodes, it depends on two additional parameters which can be suitably modulated in order to reduce the Gibbs phenomenon (see [13, 14]).

More precisely, for any \(n_i,m_i\in {\mathbb {N}}\) such that \(m_i\le n_i\), \(i=1,2\), let

$$\begin{aligned} {\textbf{n}}=(n_1,n_2),\quad \text {and}\quad {\textbf{m}}=(m_1,m_2), \end{aligned}$$

and let n, m denote indifferently the first components (i.e., \(n_1,m_1\), resp.) or the second components (i.e., \(n_2,m_2\), resp.) of such vectors. Corresponding to these parameters, for any \(r=0:(n-1)\), we define the following orthogonal VP polynomials

$$\begin{aligned} q_{m,r}^n(\xi )=\left\{ \begin{array}{ll} \cos (r t) &{} \text {if } 0\le r\le (n-m),\\ \frac{n+m-r}{2m}\cos (r t)+\frac{n-m-r}{2m}\cos ((2n-r)t) &{} \text {if } n-m<r<n, \end{array}\right. \end{aligned}$$
(6)

where here and in the following \(\xi \in [-1,1]\) and \(t\in [0,\pi ]\) are related by \(\xi =\cos t\).

We recall that the polynomial system in (6) consists of n univariate algebraic polynomials of degree at most \((n+m-1)\), which are orthogonal with respect to the scalar product

$$\begin{aligned} < F, G >=\int _{-1}^1F(\xi )G(\xi )\frac{\textrm{d}\xi }{\sqrt{1-\xi ^2}}. \end{aligned}$$

They generate the space (of dimension n)

$$\begin{aligned} S_m^n{:}{=}\text {span}\{q_{m,r}^n:\ r=0:(n-1)\} \end{aligned}$$

which is an intermediate polynomial space nested between the sets of all polynomials of degree at most \(n-m\) and of degree at most \(n+m-1\).

The space \(S_m^n\) also has an interpolating basis consisting of the so-called fundamental VP polynomials which, in terms of the orthogonal basis (6), have the following expansion [13, 14]

$$\begin{aligned} \varPhi _{m,k}^{n}(\xi )=\frac{2}{n}\left[ \frac{1}{2} +\sum _{r=1}^{n-1} \cos (r t_k^n) q_{m,r}^n(\xi )\right] ,\quad k=1:n. \end{aligned}$$
(7)
Fig. 2: Fundamental Lagrange polynomials \(\{\ell _{n,k}\}_{k=1}^n\) for \(n=5\)

Fig. 3: Fundamental polynomials \(\ell _{n,k}\) and \(\varPhi _{m,k}^n\) for \(n=21,\ k=11\), and \(m=\lfloor n\theta \rfloor \), with \(\theta \in \{0.4,\ 0.6,\ 0.8\}\)

In Figs. 1 and 2 we show, respectively, the fundamental VP polynomials for \(n=5\) and \(m=4\) and, for the same \(n=5\), the well-known fundamental Lagrange polynomials, defined as

$$\begin{aligned} \ell _{n,k}(\xi )=\frac{2}{n}\left[ \frac{1}{2} +\sum _{r=1}^{n-1} \cos (r t_k^n) \cos (r t)\right] ,\quad k=1:n. \end{aligned}$$
(8)

We see that, similarly to \(\{\ell _{n,k}(\xi )\}_{k=1}^n\), the fundamental VP polynomials also satisfy the interpolation property

$$\begin{aligned} \varPhi _{m,k}^{n}(\cos t_h^n)=\ell _{n,k}(\cos t_h^n)=\left\{ \begin{array}{lr} 1 &{}\quad h=k\\ 0 &{}\quad h\ne k \end{array}\right. \end{aligned}$$
(9)

for all \(h,k=1:n\).

In addition to the number n of nodes, we also have the free parameter m, which can be arbitrarily chosen (\(m=1:n\) being possible) without losing the interpolation property (9), as stated in [13]. Moreover, we note that the limiting choice \(m=0\) is also possible, since in this case \(S_0^n\) equals the space of polynomials of degree at most \((n-1)\) and

$$\begin{aligned} \varPhi _{0,k}^{n}(\xi )=\ell _{n,k}(\xi ), \qquad \forall |\xi |\le 1, \qquad k=1:n . \end{aligned}$$
(10)

We recall that in [16] both n and m are chosen depending on a resolution level \(\ell \in {\mathbb {N}}\), and the fundamental VP polynomials constitute the scaling functions generating the multiresolution spaces \(V_\ell =S_m^n\).

Another choice of m, often suggested in the literature, is the following (see, e.g., [15, 18])

$$\begin{aligned} m=\lfloor \theta n\rfloor , \qquad \text {with } \theta \in ]0,1[, \end{aligned}$$
(11)

where, \(\forall a\in {\mathbb {R}}^+\), \(\lfloor a \rfloor \) denotes the largest integer not greater than a.

Figure 3 displays the plots of the fundamental VP polynomials corresponding to fixed n, k and m given by (11) with different values of \(\theta \). Indeed, this parameter (and more generally m) is responsible for the localization of the fundamental VP polynomial \(\varPhi _{m,k}^{n}(\xi )\) around the node \(\xi _k^n=\cos t_k^n\). In fact, in Fig. 3 we can see how the oscillations typical of the fundamental Lagrange polynomial \(\ell _{n,k}\) (plotted too) are strongly dampened by suitable choices of \(\theta \).
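To make the construction concrete, the following MATLAB sketch evaluates the fundamental VP polynomials (7) through the basis (6); the function name vp_fundamental is ours, and for \(m=0\) it returns the fundamental Lagrange polynomials (8), in agreement with (10).

```matlab
% Sketch: P(k,:) = Phi_{m,k}^n evaluated at the row vector xi.
function P = vp_fundamental(n, m, xi)
    t  = acos(xi(:)');                  % xi = cos(t), t in [0,pi]
    tk = ((2*(1:n)' - 1)*pi)/(2*n);     % interpolation angles, cf. (5)
    P  = ones(n, numel(xi))/2;          % r = 0 term of (7)
    for r = 1:n-1
        if r <= n - m                   % first branch of (6)
            q = cos(r*t);
        else                            % second branch of (6)
            q = ((n+m-r)*cos(r*t) + (n-m-r)*cos((2*n-r)*t))/(2*m);
        end
        P = P + cos(r*tk)*q;            % rank-one update, cf. (7)
    end
    P = (2/n)*P;
end
```

For example, vp_fundamental(5, 4, cheb_nodes(5)) returns, up to rounding, the \(5\times 5\) identity matrix, which numerically confirms the interpolation property (9).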

By using the fundamental VP polynomials (7), we can approximate any function g(x, y) on the square \([-1,1]^2\) by means of its samples at the Chebyshev grid (1) as follows

$$\begin{aligned} V_{{\textbf{n}}}^{{\textbf{m}}}g(x,y){:}{=}\sum _{i=1}^{n_1}\sum _{j=1}^{n_2} g(x_i^{n_1}, y_j^{n_2})\varPhi _{m_1,i}^{n_1}(x)\varPhi _{m_2,j}^{n_2}(y). \end{aligned}$$
(12)

This is the definition of the VP polynomial of g and the approximation tool we use in our method.

By virtue of (9), such a polynomial coincides with g at the grid \(X_{n_1\times n_2}\), i.e.,

$$\begin{aligned} V_{{\textbf{n}}}^{{\textbf{m}}}g(x_i^{n_1},y_j^{n_2})=g(x_i^{n_1},y_j^{n_2}), \quad i=1:n_1,\ j=1:n_2. \end{aligned}$$
(13)

Moreover, it has been proved that if (26) holds with an arbitrarily fixed \(\theta \in ]0,1[\), then for all continuous functions g the following limit holds uniformly with respect to \((x,y)\in [-1,1]^2\)

$$\begin{aligned} \lim _{{\textbf{n}}\rightarrow \infty }\left| V_{{\textbf{n}}}^{{\textbf{m}}}g(x,y)-g(x,y)\right| =0 \end{aligned}$$

with the same convergence rate as the error of best polynomial approximation of g [14, Th. 3.1].

3.3 Quality Metrics

As for most of the existing methods in the literature, the performance of our method is quantitatively evaluated and compared with that of other scaling methods in terms of the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). For our method, such metrics give a measure of the error between the target resized image \(R=[R_1,R_2,R_3]\) and the output resized image, which in the following we denote by \(\tilde{R}=[\tilde{R}_1, \tilde{R}_2,\tilde{R}_3]\).

The definition of PSNR is based on the standard definition of the mean squared error between two matrices

$$\begin{aligned} \textrm{MSE}(A,B)= \displaystyle \frac{1}{\nu \mu }\Vert A-B\Vert _F^2,\qquad \forall A,B\in {\mathbb {R}}^{\nu \times \mu } \end{aligned}$$
(14)

where \(\Vert \cdot \Vert _F\) denotes the Frobenius norm, defined as

$$\begin{aligned} \Vert A\Vert _F{:}{=}\left( \sum _{h=1}^{\nu }\sum _{k=1}^{\mu } a_{h,k}^2\right) ^\frac{1}{2}, \ \forall A=(a_{h,k})\in {\mathbb {R}}^{\nu \times \mu }. \end{aligned}$$

The extension of this definition to color digital images of \(\nu \times \mu \) pixels can be performed in different ways, giving rise to different measures of the related PSNR (see, e.g., [51, 52]). More precisely, for the color images R and \(\tilde{R}\), defining their MSE as follows

$$\begin{aligned} \textrm{MSE}(\tilde{R}, R)=\frac{1}{3} \sum _{\lambda =1}^3 \textrm{MSE}(\tilde{R}_\lambda ,R_\lambda ), \end{aligned}$$
(15)

the first, usually adopted definition of PSNR (used, for instance, in [28]) is the following

$$\begin{aligned} \textrm{PSNR}(\tilde{R}, R)=20 \log _{10}\left( \frac{\max _f}{\sqrt{\textrm{MSE}(\tilde{R},R)}}\right) . \end{aligned}$$
(16)

Another common way to measure the PSNR (also used in [46]) is obtained by converting both the RGB color images \(R=[R_1,R_2,R_3]\) and \(\tilde{R}=[\tilde{R}_1,\tilde{R}_2,\tilde{R}_3]\) to the YCbCr color space and separating the intensity Y (luma) channels, which we denote by \(R_Y\) and \(\tilde{R}_Y\), respectively. We recall that they are defined by the following weighted averages of the respective RGB components

$$\begin{aligned} R_Y=\sum _{\lambda =1}^3 \alpha _\lambda R_\lambda +\alpha _4,\quad \tilde{R}_Y=\sum _{\lambda =1}^3 \alpha _\lambda \tilde{R}_\lambda +\alpha _4, \end{aligned}$$
(17)

with \(\alpha _i,\ i=1:4\), the coefficients of the ITU-R BT.601 standard (see, e.g., [36]). Hence, taking the MSE of the matrices \(R_Y\) and \(\tilde{R}_Y\), the second commonly used definition of PSNR refers only to the luma channel as follows

$$\begin{aligned} \textrm{PSNR}(\tilde{R},R)=20 \displaystyle \log _{10}\left( \frac{\max _f}{\sqrt{\textrm{MSE}(\tilde{R}_Y, R_Y)}}\right) . \end{aligned}$$
(18)

We point out that in our experiments the PSNR has been computed using both of the previous definitions. However, for brevity, in this paper we report only the values achieved by definition (18), since the results obtained using definition (16) give no new insight.

Finally, the SSIM metric is also defined via the luma channel, as follows [53]

$$\begin{aligned} \textrm{SSIM}(\tilde{R}, R)= \frac{\left[ 2\tilde{\mu }\mu +c_1\right] \left[ 2\textrm{cov}+c_2\right] }{\left[ {\tilde{\mu }}^2+\mu ^2+c_1\right] \left[ {\tilde{\sigma }}^2+\sigma ^2+c_2\right] }, \end{aligned}$$
(19)

where \(\tilde{\mu }, \mu \) and \(\tilde{\sigma }, \sigma \) denote the means and standard deviations of the matrices \(\tilde{R}_Y, R_Y\), respectively, \(\textrm{cov}\) indicates their covariance, and the constants are usually fixed as \(c_1=(0.01\times L)^2, c_2=(0.03\times L)^2\), with the dynamic range of the pixel values \(L=255\) in the case of 8-bit images.
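For reference, a minimal MATLAB sketch of the luma-based measure (18) follows; the function name psnr_luma is ours, rgb2ycbcr (Image Processing Toolbox) applies the ITU-R BT.601 weights of (17), and 8-bit images (\(\max _f=255\)) are assumed. The SSIM (19) can then be obtained, e.g., via the built-in ssim applied to the same luma channels.

```matlab
% Sketch: PSNR on the luma channel, cf. (17)-(18), for 8-bit images.
function p = psnr_luma(Rtilde, R)
    Yt = double(rgb2ycbcr(Rtilde));  Yt = Yt(:,:,1);  % luma of output
    Y  = double(rgb2ycbcr(R));       Y  = Y(:,:,1);   % luma of target
    mse = mean((Yt(:) - Y(:)).^2);                    % cf. (14)
    p   = 20*log10(255/sqrt(mse));                    % cf. (18)
end
```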

4 VPI Scaling Method

According to the notation introduced in the previous section, both I and R are digital versions (with \(n_1\times n_2\) and \(N_1\times N_2\) pixels, respectively) of the same continuous image represented by the vector function \({\textbf{f}}=(f_1,f_2,f_3)\) (cf. (2), (3)). Nevertheless, to be more general, in view of the finite representation of the data and the accuracy used to store the image, we suppose that the effective input image of our method is a more or less corrupted version of I. We denote it by \(\tilde{I}=[\tilde{I}_1,\tilde{I}_2,\tilde{I}_3]\) and assume that there exists a corrupted function \(\widetilde{\textbf{f}}=(\widetilde{f}_1,\widetilde{f}_2,\widetilde{f}_3)\) such that

$$\begin{aligned} \tilde{I}(i,j)=\tilde{\textbf{f}}(x_i^{n_1},y_j^{n_2}),\quad i=1:n_1,\quad j=1:n_2. \end{aligned}$$
(20)

Starting from these initial data, the VPI method computes the output image \(\tilde{R}\), having the desired size \(N_1\times N_2\), defined as follows

$$\begin{aligned} \tilde{R}(i,j)=V_{{\textbf{n}}}^{{\textbf{m}}}\tilde{\textbf{f}}(x_i^{N_1},y_j^{N_2}), \ i=1:N_1,\ j=1:N_2. \end{aligned}$$
(21)

In terms of the RGB components \(\tilde{R}=[\tilde{R}_1,\tilde{R}_2, \tilde{R}_3]\), by (12), this means that for any \(i=1:N_1,\ j=1:N_2\) and \(\lambda =1:3\) we have

$$\begin{aligned} \tilde{R}_\lambda (i,j)&=V_{{\textbf{n}}}^{{\textbf{m}}}\tilde{f}_\lambda (x_i^{N_1},y_j^{N_2})\\ &=\sum _{u=1}^{n_1}\sum _{v=1}^{n_2} \tilde{I}_\lambda (u,v)\varPhi _{m_1,u}^{n_1}(x_i^{N_1})\varPhi _{m_2,v}^{n_2}(y_j^{N_2}), \end{aligned}$$
(22)

that is

$$\begin{aligned} \tilde{R}_\lambda =V_1^T \tilde{I}_\lambda V_2, \quad \lambda =1:3, \end{aligned}$$
(23)

where the matrices \(V_1\in {\mathbb {R}}^{n_1\times N_1}\) and \(V_2\in {\mathbb {R}}^{n_2\times N_2}\) have the following entries

$$\begin{aligned} V_1(i,j)=\varPhi _{m_1,i}^{n_1}(x_j^{N_1}),\qquad i=1:n_1,\ j=1:N_1, \end{aligned}$$
(24)
$$\begin{aligned} V_2(i,j)=\varPhi _{m_2,i}^{n_2}(y_j^{N_2}),\qquad i=1:n_2,\ j=1:N_2. \end{aligned}$$
(25)

To compute \(V_1,V_2\), efficient algorithms based on the fast Fourier transform can be implemented (see, e.g., [54]). Moreover, by pre-computing the matrices \(V_i\), the representation (23) reduces the computational effort when many images have to be resized to the same fixed sizes.
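A minimal MATLAB sketch of the whole resizing step (23)-(25) follows, reusing the cheb_nodes and vp_fundamental sketches from Sect. 3; the function name vpi_resize and the final clipping to the 8-bit pixel range are our assumptions, not part of (23).

```matlab
% Sketch of VPI resizing via (23)-(25): I is an RGB image of
% n1 x n2 pixels, (N1, N2) the desired size, theta in ]0,1].
function R = vpi_resize(I, N1, N2, theta)
    [n1, n2, ~] = size(I);
    m1 = floor(theta*n1);  m2 = floor(theta*n2);    % cf. (26)
    V1 = vp_fundamental(n1, m1, cheb_nodes(N1));    % n1 x N1, cf. (24)
    V2 = vp_fundamental(n2, m2, cheb_nodes(N2));    % n2 x N2, cf. (25)
    R = zeros(N1, N2, 3);
    for lambda = 1:3                                % cf. (23)
        R(:,:,lambda) = V1' * double(I(:,:,lambda)) * V2;
    end
    R = uint8(min(max(R, 0), 255));  % clip to the 8-bit range (our choice)
end
```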

Now, we note that in the previous formulas the integers \(n_\ell \) and \(N_\ell \), for \(\ell =1,2\), are determined by the initial and final size of the scaling problem at hand, while the parameter \(m_\ell \) is free. Theoretically, it can be arbitrarily chosen from the set of integers between 1 and \(n_\ell \). According to (11), for our method we fix

$$\begin{aligned} m_\ell =\lfloor \theta n_\ell \rfloor , \qquad \ell =1,2,\qquad \text {with}\qquad \theta \in ]0,1] \end{aligned}$$
(26)

including the limit case \(\theta =1\) too. Moreover, we also allow \(\theta =0\), but in this case we remark that, by virtue of (10), VPI reduces to the Lagrange–Chebyshev interpolation (LCI) scaling method recently proposed in [28]. In this sense, VPI can be considered a generalization of the LCI method.

Regarding the choice of the parameter \(\theta \), in the experimental validation of VPI we consider two modes that in the sequel we call "supervised VPI" and "unsupervised VPI." In the latter case, \(\theta \) must be supplied by the user as an input parameter, arbitrarily chosen in [0, 1], where the choice \(\theta =0\) means selecting the LCI method.

Conversely, if a target resized image is available, we have also structured the VPI method in a supervised mode that requires the target image as an input argument instead of the parameter \(\theta \). In this case, we take several choices of \(\theta \in [0,1]\) and, consequently, we get several matrices \(V_1, V_2\) that determine, via (23), several resized images. Among these images, the one that, once compared with the target image, gives the smallest MSE is chosen as the output image of the supervised VPI method.
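In MATLAB terms, the supervised mode can be sketched as follows, using the grid of \(\theta \) values adopted later in Sect. 5.2 and the vpi_resize sketch above (the function name vpi_supervised is ours):

```matlab
% Sketch of supervised VPI: sweep theta, keep the resized image
% with minimal MSE w.r.t. the available target image.
function [best, bestTheta] = vpi_supervised(I, target)
    [N1, N2, ~] = size(target);
    bestMSE = inf;
    for theta = 0.05:0.05:0.95
        Rt  = vpi_resize(I, N1, N2, theta);
        mse = mean((double(Rt(:)) - double(target(:))).^2);  % cf. (15)
        if mse < bestMSE
            bestMSE = mse;  best = Rt;  bestTheta = theta;
        end
    end
end
```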

In the remainder of this section, we focus on the d-VPI method with odd scale factors \(s=n_1/N_1=n_2/N_2\). In the following proposition, we suppose that the lower-resolution sampling satisfies the Nyquist limit, so that the continuous image \({\textbf{f}}\) can be uniquely reconstructed from both digital images I and R without any error. In this ideal case, we prove that the d-VPI method produces an MSE not greater than \(s^2\) times the MSE of the input data (cf. (28)) and, in particular, we get a null MSE if the input image is exact or, at least, if only some crucial pixels of it are exact (cf. Remark 1).

Proposition 1

Let I and R be the initial and resized true images given by (2) and (3), respectively, and let \(\tilde{I}\) and \(\tilde{R}\) be the input and output images of the d-VPI method, respectively, given by (20) and (21), with arbitrarily fixed integer parameters \(m_1<n_1\) and \(m_2<n_2\). If there exists \(\ell \in {\mathbb {N}}\) that relates the initial size \(n_1\times n_2\) to the final size \(N_1\times N_2\) as follows

$$\begin{aligned} \frac{n_1}{N_1}=\frac{n_2}{N_2}=(2\ell -1), \end{aligned}$$
(27)

then we have

$$\begin{aligned} \textrm{MSE}(R,\tilde{R})\le s^2 \textrm{MSE}(I, \tilde{I}),\qquad s=(2\ell -1). \end{aligned}$$
(28)

The same estimate holds also for the luma channel and, if in addition \(I=\tilde{I}\) holds too, then we get

$$\begin{aligned} \textrm{PSNR}(R,\tilde{R})=\infty ,\qquad \text {and}\qquad \textrm{SSIM}(R,\tilde{R})=1. \end{aligned}$$
(29)

Proof

Recalling the definition (15), to prove (28) it is sufficient to show that the same inequality holds for the respective RGB components. Hence, according to our notation, let us show that for all \(\lambda =1:3\) we have

$$\begin{aligned} \textrm{MSE}(R_\lambda ,\tilde{R}_\lambda )\le s^2 \textrm{MSE}(I_\lambda , \tilde{I}_\lambda ). \end{aligned}$$
(30)

To this aim, using the short notation n and N to denote \(n_i\) and \(N_i\), respectively, for \(i=1,2\), we note that whenever \(n= s N\) with \(s=(2\ell -1)\) and \(\ell \in {\mathbb {N}}\), all the zeros of the first-kind Chebyshev polynomial of degree N (i.e., \(\cos (t_h^{N})\) with \(h=1:N\)) are also zeros of the first-kind Chebyshev polynomial of degree n. More precisely, we have

$$\begin{aligned} \cos (t_h^{N})=\cos (t_{i(h)}^{n}) \quad \text {with } i(h)=\frac{s(2h-1)+1}{2}, \end{aligned}$$
(31)

where we point out that for all \(h=1:N\) the index \(i(h)=\frac{s(2h-1)+1}{2}\) is an integer between 1 and n, thanks to the hypothesis that s is odd.

By virtue of (31), for all \(h_1=1:N_1\) and \(h_2=1:N_2\), recalling (3) we get

$$\begin{aligned} \begin{array}{rl} R_\lambda (h_1,h_2)=&{}f_\lambda \left( x_{h_1}^{N_1},y_{h_2}^{N_2}\right) \\ =&{} f_\lambda \left( x_{i(h_1)}^{n_1},y_{i(h_2)}^{n_2}\right) \\ =&{} I_\lambda \left( i(h_1),\ i(h_2)\right) , \qquad \lambda =1:3. \end{array} \end{aligned}$$
(32)

Similarly, from (21), (31), (13) and (20), we deduce

$$\begin{aligned} \begin{array}{rl} \tilde{R}_\lambda (h_1,h_2)=&{}V_{{\textbf{n}}}^{{\textbf{m}}}\tilde{f}_\lambda \left( x_{h_1}^{N_1},y_{h_2}^{N_2}\right) \\ =&{} V_{{\textbf{n}}}^{{\textbf{m}}}\tilde{f}_\lambda \left( x_{i(h_1)}^{n_1},y_{i(h_2)}^{n_2}\right) =\tilde{f}_\lambda \left( x_{i(h_1)}^{n_1},y_{i(h_2)}^{n_2}\right) \\ =&{} \tilde{I}_\lambda \left( i(h_1),\ i(h_2)\right) , \qquad \lambda =1:3. \end{array} \end{aligned}$$
(33)

Therefore, by (32) and (33), for any \(\lambda =1:3\) we deduce (30) as follows

$$\begin{aligned} \textrm{MSE}(R_\lambda ,\tilde{R}_\lambda )&=\frac{1}{N_1N_2}\sum _{h_1=1}^{N_1}\sum _{h_2=1}^{N_2} \left[ R_\lambda (h_1,h_2)-\tilde{R}_\lambda (h_1,h_2)\right] ^2\\ &=\frac{1}{N_1N_2}\sum _{h_1=1}^{N_1}\sum _{h_2=1}^{N_2} \left[ I_\lambda \left( i(h_1), i(h_2)\right) -\tilde{I}_\lambda \left( i(h_1), i(h_2)\right) \right] ^2\\ &\le \frac{1}{N_1N_2}\sum _{i=1}^{n_1}\sum _{j=1}^{n_2} \left[ I_\lambda (i,j)-\tilde{I}_\lambda (i,j)\right] ^2\\ &=\frac{s^2}{n_1n_2}\sum _{i=1}^{n_1}\sum _{j=1}^{n_2} \left[ I_\lambda (i,j)-\tilde{I}_\lambda (i,j)\right] ^2\\ &=s^2\, \textrm{MSE}(I_\lambda ,\tilde{I}_\lambda ). \end{aligned}$$

As regards the luma channel, we note that by (32), (33), and (17) we easily deduce that

$$\begin{aligned} \begin{array}{rl} R_Y(h_1,h_2)=&{}I_Y(i(h_1), \ i(h_2))\\ \tilde{R}_Y(h_1,h_2)=&{}\tilde{I}_Y(i(h_1), \ i(h_2)) \end{array} \end{aligned}$$
(34)

and such identities, similarly to the case of the RGB components, easily imply

$$\begin{aligned} \textrm{MSE}(R_Y, \tilde{R}_Y)\le s^2 \textrm{MSE}(I_Y,\tilde{I}_Y). \end{aligned}$$
(35)

Finally, in the case that \(I=\tilde{I}\), from (32) and (33) we deduce that

$$\begin{aligned} R_\lambda (h_1,h_2)=\tilde{R}_\lambda (h_1,h_2), \quad \lambda =1:3 \end{aligned}$$

holds for any \(h_1=1:N_1\) and \(h_2=1:N_2\). Consequently we get the best result (29). \(\diamondsuit \)

Remark 1

The previous proof shows that the hypothesis \(I=\tilde{I}\) can be relaxed requiring that these images coincide only on some suitable pixels.

More precisely, in order to get (29) it is sufficient that

$$\begin{aligned} \begin{array}{l} I_\lambda (i(h_1),\ i(h_2))=\tilde{I}_\lambda (i(h_1),\ i(h_2)), \\ \qquad \lambda =1:3, \quad h_1=1:N_1,\quad h_2=1:N_2 \end{array} \end{aligned}$$
(36)

holds, where we defined

$$\begin{aligned} i(h)=\frac{s(2h-1)+1}{2}. \end{aligned}$$
(37)

Remark 2

From (33), we deduce that in all cases of downscaling with odd scale factors, the choice of the parameter \(\theta \) and, more generally, the values we assign to \({\textbf{m}}\) do not matter, since d-VPI always returns the same output image, which coincides with the one produced by the d-LCI method. In this case, d-VPI reduces to a simple decimation of the original image, according to (33) and (37). Consequently, starting from a given image and using the same odd scale factor in opposite directions, for any \({\textbf{m}}\) one may sequentially run u-VPI first and then d-VPI, getting back the initial image without any error.

In order to reduce the computational cost, in all downscaling cases with odd scale factors, formula (33) is used instead of (23) to get the d-VPI output image.
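In MATLAB terms, this decimation can be sketched as follows (the function name dvpi_odd is ours; s is assumed odd and to divide both dimensions exactly):

```matlab
% Sketch: d-VPI with odd scale factor s reduces to the pixel
% decimation (33)/(37), with no polynomial evaluation at all.
function R = dvpi_odd(I, s)
    [n1, n2, ~] = size(I);
    i1 = (s*(2*(1:n1/s) - 1) + 1)/2;   % row indices, cf. (37)
    i2 = (s*(2*(1:n2/s) - 1) + 1)/2;   % column indices, cf. (37)
    R  = I(i1, i2, :);                 % direct assignment, cf. (33)
end
```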

We point out that Proposition 1 holds under the assumptions (2) and (3), i.e., according to the Nyquist–Shannon theorem, both the target HR and LR images are samplings of the same (unique) image at different scales satisfying (27). From an image processing point of view, such a result has theoretical rather than practical value, since real-world images generally do not satisfy the above hypothesis. Indeed, the theoretical results of Proposition 1 do not exclude the possible occurrence of aliasing effects when we downscale input images with high-frequency details. In this case, even starting from uncorrupted HR images, d-VPI produces LR images with aliasing effects that, below a certain size, become more and more visible. The experimental results in the next section show that aliasing also occurs when the downscaling factor does not satisfy the hypothesis of Proposition 1.

Following the sampling theorem [29], the standard approach to minimizing aliasing artifacts involves limiting the spectral bandwidth of the HR input image by filtering it via convolution with a kernel before subsampling. As a well-known side effect, the resulting LR output image may suffer from loss of fine details and blurring of sharp edges. Thus, many filters have been developed [55] to balance mathematical optimality with the perceptual quality of the downsampled result. However, these filter-based methods can introduce undesirable ringing or over-smoothing artifacts.

Both L\(_0\) and DPID focus on detail preservation. Specifically, to address the aliasing problem, L\(_0\) proposes an L\(_0\)-regularized optimization framework where the gradient ratio and reconstruction priors are optimized in an alternating fashion. In contrast, DPID uses an inverse bilateral filter to emphasize the differences between areas with small details and bigger ones with similar colors. However, the aliasing reduction process of these methods affects their performance both in terms of quality and of CPU time, as shown in Sect. 5.

In this paper, our attention is focused on studying the effect of VP interpolation applied to image resizing in both upscaling and downscaling. Consequently, we limit ourselves to suggesting the employment of suitable convolutional filtering for high downscaling factors, whenever the aliasing effects are too evident (see Sect. 5.3). In the meantime, we are working on finding better solutions to reduce the possible aliasing effects in d-VPI.

5 Experimental Results

In this section, we describe the experimental validation of VPI. We test it on some publicly available image datasets and compare it with the methods described in Sect. 2; namely, we compare d-VPI with BIC, d-LCI, L\(_0\), and DPID, and u-VPI with BIC, u-LCI, and SCN. Although DPID and L\(_0\) (SCN) can also be applied in upscaling (downscaling) mode, we do not force the comparison with them outside their intended direction, to avoid an incorrect experimental evaluation.

All methods have been run on the same computer, with an Intel Core i7 3770K CPU @ 3.50GHz, in MATLAB 2018a. In the following, Sect. 5.1 introduces the considered datasets, while Sects. 5.2 and 5.3 are devoted to quantitative and qualitative performance evaluation, respectively, both in downscaling and upscaling.

5.1 Datasets

To be more general, besides the datasets used by the benchmark methods [28, 44,45,46,47], we also consider some datasets with different characteristics that are extensively employed in image processing.

Specifically, the d-VPI performance evaluation is carried out on some publicly available datasets comprising 1026 color images in total. In particular, we consider the BSDS500 dataset [56], available at [57], which includes 500 color images having the same size (481 \(\times \) 321 or 321 \(\times \) 481). This set, also used in [28, 45], is sufficiently general and provides a large variety of images often employed in other image analysis tasks as well, such as image segmentation [58,59,60,61] and color quantization [62,63,64,65]. We also consider the following datasets to facilitate the comparison with the benchmark methods.

  • The 13 natural-color images of the user study in [66] available at [67] and here denoted by 13US. They are originally taken from the MSRA Salient Object Database [68], used in a previous study [69] and also employed in [44]. These images have sizes ranging from 241 \(\times \) 400 to 400 \(\times \) 310 pixels.

  • The two extensive sets selected in [44] from the Yahoo 100M image dataset [70] and the NASA Image Gallery [71], available at [72]. We denote the corresponding sets of color images extracted from them by NY17 and NY96, respectively. These sets comprise 17 and 96 color images, with sizes ranging from 500 \(\times \) 334 to 6394 \(\times \) 3456.

  • The Urban100 dataset [73] including 100 color images related to an urban context, with one dimension at most equal to 1024 and the other ranging from 564 to 1024 pixels. It has also been employed in [45].

  • The dataset PEXELS300 considered in [28] and available with the VPI code. It consists of 300 color images randomly selected from [74], originally having different large sizes, which we centrally cropped to 1800 \(\times \) 1800 pixels.

Regarding the u-VPI performance evaluation, in addition to the previous datasets, we have also used the following well-known datasets, commonly adopted by the super-resolution community [75, 76], for a total of 1943 color images.

  • The 5 images, known in the literature as Set5 and originally taken from [77], with sizes ranging from 256 \(\times \) 256 to 512 \(\times \) 512, available at [78].

  • The 12 color images belonging to Set14 [79], with sizes ranging from 276 \(\times \) 276 to 512 \(\times \) 768, available at [80].

  • The image dataset DIV2k (DIVerse 2k), consisting of high-quality, high-resolution images used for the NTIRE 2017 SR challenge (CVPR 2017 and CVPR 2018) [81], available at [82]. It comprises the train set (DIV2k-T) and the valid set (DIV2k-V), with 800 and 100 color images, respectively. Such images have one dimension equal to 2040, while the other ranges from 768 to 2040. DIV2k allows testing the performance of all the benchmark methods on input images characterized by different degradations. Such input images are included in DIV2k and collected as follows:

    • DIV2k-T-B (DIV2k-V-B), generated by BIC (-B);

    • DIV2k-T-u (DIV2k-V-u), classified as unknown (-u);

    • DIV2k-T-d (DIV2k-V-d), classified as difficult (-d);

    • DIV2k-T-m (DIV2k-V-m), classified as mild (-m).

Since DIV2k is the only dataset containing both the input and the target images, in order to implement supervised VPI with the other datasets, we fix the images of these datasets as target images with \(N_1\times N_2\) pixels. Hence, we generate the respective input images, of size \(n_1\times n_2\), by a scaling method, which we take to be BIC in most cases, since it can be used both in downscaling (for testing supervised u-VPI) and upscaling (for testing supervised d-VPI). However, in Sect. 5.2.3, we also analyze the use of the other scaling methods from Sect. 2 to generate the input images.

For simplicity, in the following, we distinguish how the input image is generated by prepending the name of the generating method. For instance, an input image generated by BIC is indicated as a BIC input image.

5.2 Quantitative Evaluation

For the quantitative evaluation, we compute, both in upscaling and downscaling, the visual quality measures PSNR and SSIM (cf. Sect. 3.3), and the CPU time for VPI and the benchmark methods, starting from the same input image. Here we focus on the quality measures, while the CPU time is analyzed in Sect. 5.2.2.

Since the target image is necessary to compute PSNR and SSIM, we employ the supervised VPI both in upscaling and downscaling. Specifically, we let the free parameter \(\theta \) vary from 0.05 to 0.95 with step 0.05. In this way, we get 19 resized images, and we take as output image the one with minimum MSE.

Table 1 Average performance of upscaling methods on DIV2k with input images generated by BIC (-B) and classified as unknown (-u)
Table 2 Average performance of upscaling methods on DIV2k with input images classified as difficult (-d) and mild (-m)

First, we test supervised VPI and the benchmark methods on the DIV2k image dataset. As specified above, this is the only dataset that, for certain fixed scaling factors, includes both the input \(n_1\times n_2\) images and the target \(N_1\times N_2\) images. Since in DIV2k only the cases

$$\begin{aligned} (N_1, N_2)=s (n_1, n_2), \qquad \text {with}\qquad s=2,3,4, \end{aligned}$$

are present, on DIV2k we test the upscaling methods (supervised u-VPI, BIC, u-LCI, and SCN) for these scale factors. Tables 1 and 2 show the average PSNR and SSIM results computed with target images from both the train and valid sets (DIV2k-T and DIV2k-V, resp.), taking as input images the respective ones classified in DIV2k as BIC (-B), unknown (-u), difficult (-d), and mild (-m). We remark that Table 2 concerns only the case \(s=4\), since for \(s=2,3\) the input LR images are not present in the DIV2k-T-d/m and DIV2k-V-d/m datasets.

To test supervised VPI and the benchmark methods on the other datasets detailed in Sect. 5.1, as previously stated, we take as target \(N_1\times N_2\) images the ones in the datasets and apply BIC to them in order to generate the input \(n_1\times n_2\) images. For brevity, in both upscaling and downscaling, we show only the performance results for the scale factors \(s=2,3,4\), which means the input size \(n_1\times n_2\) is computed from the target size according to the following formula

$$\begin{aligned} n_i=\left\{ \begin{array}{ll} sN_i &{} \text {to test downscaling methods}\\ \left\lfloor \frac{N_i}{s}\right\rfloor &{} \text {to test upscaling methods} \end{array}\right. \ i=1,2. \end{aligned}$$
(38)
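For instance, in MATLAB terms, (38) reads (a sketch with illustrative values):

```matlab
% Sketch: input sizes for the tests, cf. (38), e.g., for s = 4
% and a 512 x 512 target image.
s = 4;  N = [512 512];
n_down = s*N;          % input size when testing downscaling methods
n_up   = floor(N/s);   % input size when testing upscaling methods
```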

Tables 3 and 4 concern upscaling and downscaling, respectively, and show the average PSNR and SSIM values obtained for the datasets and methods specified in the first columns. Note that for all methods the input images are generated from those in the datasets, of size \(N_1\times N_2\), by applying BIC according to (38).

We remark that, in upscaling, the comparison with SCN is limited to DIV2k since, for the images in the datasets of Table 3, the SCN demo version does not always produce the exact size of the HR image, making it impossible to compute the quality measure values.

The bar graphs describing the trends emerging from Tables 1, 2, 3 and 4 are shown in Figs. 4 and 5 for the PSNR and SSIM values, respectively.

Table 3 Average performance of upscaling methods with BIC input images
Table 4 Average performance of downscaling methods with BIC input images (oom is short for "out of memory")
Fig. 4: PSNR values extracted from Tables 1, 2, 3 and 4

Fig. 5: SSIM values extracted from Tables 1, 2, 3 and 4

From the displayed average results, we observe the following.

\(\Box \) Concerning upscaling:

  u.1 On the datasets displayed in Table 3, employing the methods with BIC input images, u-VPI has a slightly higher performance than BIC and u-LCI in terms of the visual quality values;

  u.2 On the DIV2k dataset, taking the input images from the datasets included therein, we observe that the previous trend for BIC, u-LCI, and u-VPI is confirmed. However, SCN gives the best visual quality values when the input images are generated by BIC (i.e., belonging to the DIV2k-T-B and DIV2k-V-B datasets). Nevertheless, in the case of input images classified as unknown (i.e., belonging to the DIV2k-T-u and DIV2k-V-u datasets), slightly higher performance values are given by u-VPI, except in one case: SCN has the best performance on DIV2k-T-u when \(s=4\). On the contrary, SCN always provides the lowest PSNR and SSIM values when the input images are classified as difficult or mild (see Table 2). In this case, u-VPI continues to provide the best quality values for the Train images (i.e., for input images in DIV2k-T-d and DIV2k-T-m), while BIC outperforms it for the Valid images (i.e., with input images from DIV2k-V-d and DIV2k-V-m). Finally, nothing changes in the u-VPI/u-LCI comparison, where u-VPI always gives slightly higher values.

Table 5 Average of the optimal values of \(\theta \) resulting from supervised VPI with BIC input images. The downscaling :3 case is missing since it is independent of \(\theta \)
Fig. 6: Boxplot derived from Table 5

\(\Box \) Concerning downscaling (with BIC input images):

  d.1 The best PSNR and SSIM values are achieved by d-VPI, followed in order by d-LCI (ex aequo in some cases), DPID, BIC, and L\(_0\). For all datasets and scale factors displayed in Table 4, there is a consistent gap between the PSNR values of d-VPI and those provided by BIC, DPID, and \(L_0\). In accordance with the results in [28], d-LCI also provides good values, but d-VPI generally outperforms it, even if with a smaller gap.

  d.2 The demo version of \(L_0\) has memory problems with input images of large size. In fact, in the case of target images from the NY17 and NY96 datasets, \(L_0\) does not give any output for the scale factors \(s=2,3\) which, according to (38), generate a large input size.

  d.3 When \(s=3\), d-VPI confirms the optimal performance proved in Proposition 1 for odd scale factors. We note that this case is missing in Figs. 4 and 5 since (29) holds for both d-LCI and d-VPI. Nevertheless, we point out that (29) has been proved under the ideal assumptions that the required LR sampling satisfies the Nyquist–Shannon theorem and that the initial data are not corrupted. Hence it is not always true. Indeed, we have verified that (29) continues to hold starting from HR input images generated by the nearest neighbor and bilinear methods [33] (using MATLAB imresize with the nearest and bilinear options, respectively), but in Sect. 5.2.3 we show that it does not hold if we use SCN to generate the input HR image.

5.2.1 Parameter Modulation

In this subsection, we use the previous quantitative analysis to look for some hints about how to set the parameter \(\theta \) in practice when the target image is unavailable.

To this aim, in the previous tests employing supervised VPI with BIC input images, for each dataset we compute the average of the optimal values of \(\theta \), i.e., those corresponding to the output images with minimal MSE. For both upscaling and downscaling, we report these results in Table 5 and show the related boxplot in Fig. 6. We remark that the downscaling case with scale factor \(s=3\) is missing since, as previously stated, the d-VPI output is independent of \(\theta \).

The experimental results show a wide variability of the optimal value of \(\theta \) across different datasets, even with the same scaling factor. Consequently, it does not seem possible to suggest a particular choice of the parameter \(\theta \), which, in practice, remains a free parameter.

Table 6 Average CPU time in the upscaling cases of Table 3
Table 7 Average CPU time in the upscaling cases of Tables 1 and 2
Table 8 Average CPU time in downscaling tests of Table 4 (oom denotes “out of memory”)

5.2.2 CPU Time Analysis

The CPU time required by each scaling method is also an important aspect of the quantitative performance evaluation. For this reason, in the previous experiments, besides PSNR and SSIM, we measured the CPU time each method takes to produce the output images. Tables 6, 7, and 8 show the average CPU time values computed, for each scaling factor and dataset, by employing the displayed methods, in upscaling (Tables 6, 7) and downscaling (Table 8).

Regarding VPI, we point out that in Tables 1, 2, 3 and 4 we tested supervised VPI, which is optimally structured to produce the 19 resized images corresponding to unsupervised VPI called with 19 equidistant values of the input parameter \(\theta \). For this reason, in the comparison with the benchmark methods, we report in Tables 6, 7 and 8 the average CPU time that unsupervised VPI takes, providing as input the average values of \(\theta \) reported in Table 5 for each employed dataset and the scale factors 2, 3, 4. For other values of \(\theta \), we did not observe significant variations w.r.t. the displayed results.

Inspecting Tables 6, 7 and 8, we observe the following.

\(\Box \) Concerning upscaling:

The method requiring the least CPU time is BIC. Close behind, we find u-LCI and u-VPI, with CPU times very close to each other. SCN always requires the longest CPU time, by a wide margin. Table 7 shows that this trend is independent of how the input image is generated (i.e., -B, -u, -d, -m).

\(\Box \) Concerning downscaling:

Also in this case, the method requiring the least computation time is BIC, closely followed by d-LCI and d-VPI, which coincide and are much faster when the scale factor is 3, due to (33). In the ranking, \(L_0\) and DPID follow, with much higher computation times than d-VPI, especially on datasets characterized by larger image sizes (such as NY17, NY96, and PEXELS300). In particular, \(L_0\) does not give any output for target images in NY96 and NY17 with scale factors \(s\ge 2\) and \(s\ge 3\), respectively, while it is faster than DPID in the remaining cases.

Table 9 Average performance results of upscaling methods (first column) on the PEXELS300 dataset with input images generated by BIC, d-LCI, d-VPI, DPID, and \(L_0\)
Table 10 Average performance results of downscaling methods (first column) on the BSDS500 dataset with input images generated by BIC, u-LCI, u-VPI, and SCN

5.2.3 Input Image Dependency

In this subsection, we study the dependency of the VPI performance on the way the input images are generated. To this aim, we repeat the previous quantitative analysis, computing the average PSNR and SSIM values for supervised VPI and the benchmark methods, but we vary the scaling method that generates the input images from the target ones in the dataset. More precisely, in downscaling we provide input HR images generated by the BIC, u-LCI, u-VPI, and SCN upscaling methods, while in upscaling we get the input LR images by applying the downscaling methods BIC, d-LCI, d-VPI, DPID, and \(L_0\). We point out that whenever u-VPI or d-VPI is employed to generate the input image, we use the unsupervised mode with the prefixed value \(\theta =0.5\).

As above, we consider the scale factors \(s=2,3,4\) and require for the input images the size \(n_1\times n_2\) determined by (38), where \(N_1\times N_2\) is the size of the target images in the dataset.

Since we have found experimentally that the demo codes of \(L_0\) and SCN have problems in processing large images, we do not consider all datasets for this test, but we focus on the PEXELS300 dataset in upscaling and on the BSDS500 dataset in downscaling. The average performance results are shown in Tables 9 and 10, respectively.

From these tables, we observe the following.

\(\Box \) Concerning upscaling:

  • SCN provides the highest quality measures only in the case of BIC input images, confirming the trend displayed in Table 1.

  • In the case of input images generated by downscaling methods other than BIC, SCN always provides the lowest values, and the best performance is attained by u-VPI, except in the upscaling x4 case with \(L_0\) input images, where BIC presents slightly higher performance values than u-VPI.

  • Similarly to BIC, u-VPI has a more stable behavior with respect to variations of the input image. The quality measures of u-VPI are always higher than those of u-LCI, which behaves better than BIC only with BIC input images.

  • In upscaling (x3), we note the same performance values for d-LCI and d-VPI input images (3rd and 4th columns of Table 9), confirming that in downscaling (:3) both d-LCI and d-VPI generate the same input images.

\(\Box \) Concerning downscaling:

  • For even scale factors (\(s=2,4\)), d-VPI always provides much higher performance values than DPID and \(L_0\), the latter always presenting the lowest quality measures. The highest performance is achieved by d-VPI, followed by d-LCI, except in the case of SCN input images, where BIC holds the record, followed in order by d-VPI, DPID, d-LCI, and \(L_0\).

  • For odd scale factors, it is confirmed that d-VPI reduces to d-LCI, reaching the optimal quality measures in the case of input images generated by BIC, u-LCI, or u-VPI. Nevertheless, for SCN input images, the ranking of the even cases \(s = 2,4\) is confirmed, i.e., the best performance is given by BIC, followed in order by DPID, d-LCI = d-VPI, and \(L_0\).

Finally, for the sake of completeness, in Table 11, we report the average of the optimal \(\theta \) resulting from supervised VPI with input images generated by different methods. As in Sect. 5.2.1, the obtained results provide no new insights for choosing \(\theta \).

Table 11 Average of the optimal values of \(\theta \) resulting from supervised VPI with input images generated by different methods. The average is computed on the BSDS500 dataset for downscaling and on PEXELS300 for upscaling. The downscaling :3 case is missing since it is independent of \(\theta \)

5.3 Qualitative Evaluation

We test VPI and the benchmark methods for scale factors varying from 2 to very large values, both in supervised and unsupervised mode. In this subsection, we give some visual results of the numerous tests performed.

\(\bullet \) Concerning the supervised mode:

we show some examples of performance results in Figs. 7 and 8 for upscaling and in Figs. 9 and 10 for downscaling, with different BIC input images and scale factors 2, 3, 4. In these figures, some Regions of Interest (ROIs) are shown in order to highlight the results at a perceptual level. The visual inspection of these performance results confirms the quantitative evaluation in terms of PSNR and SSIM exhibited in Sect. 5.2. Hence, we deduce that: a) the observable structure of the objects is captured; b) the local contrast and luminance of the input image are preserved; c) small details and most of the salient edges are maintained; d) the presence of ringing and over-smoothing artifacts is very limited; e) the resized image is not noticeably blurred.

\(\bullet \) Concerning the unsupervised mode:

we set the free parameter \(\theta \) equal to 0.5 and take the input images directly from the datasets. Unlike the supervised mode, we cannot compute the PSNR and SSIM quality measures since the target image is missing. Consequently, in the following, we evaluate the performance by human visual perception, taking into account the CPU time (briefly denoted by T) when the results are almost equivalent in terms of perceived quality.

Fig. 7 Examples of supervised upscaling performance results at the scale factors 2 (left), 3 (middle), and 4 (right)

Fig. 8 Examples of supervised upscaling performance results at the scale factors 2 (left), 3 (middle), and 4 (right)

Fig. 9 Examples of supervised downscaling performance results at the scale factors 2 (left), 3 (middle), and 4 (right)

Fig. 10 Examples of supervised downscaling performance results at the scale factors 2 (left), 3 (middle), and 4 (right). For layout reasons, the performance results for both d-LCI and d-VPI are reported, although they coincide for the downscaling factor 3

First, we consider as input two images already displayed for the supervised mode and used there as target images (see Figs. 7 and 9). We show the performance results at the scale factor 2 (upscaling) in Fig. 11 and at the scale factor 4 (downscaling) in Fig. 12. A careful examination of Fig. 11 does not reveal significant perceptual differences among the upscaled images produced by the methods. Regarding CPU time, u-VPI, u-LCI, and BIC are very close, while SCN takes much more processing time. On the other hand, since the input image in Fig. 12 contains many high-frequency details, unlike the BIC-generated input image of Fig. 9, aliasing effects are visually detectable for all methods. In particular, d-LCI and d-VPI present more evident aliasing effects than the other methods, although with a lower CPU time than L\(_0\) and DPID.

Even if avoiding aliasing effects is an important goal of downscaling methods, it is outside the aim of the present paper, which intends to show a different point of view for both upscaling and downscaling by using a specific approximation theory tool in the image processing framework. Consequently, in Fig. 12, as well as in the remaining experiments, we just show the performance result obtained by a pre-filtering combined with d-VPI (denoted as f-d-VPI). We point out that in f-d-VPI the type of filter to employ is selected based on the image features. Our selection includes the following 2-D filters: a) averaging filter (“average”); b) circular averaging filter (“disk”); c) Gaussian filter (“Gaussian”); d) motion filter (“motion”); all implemented in MATLAB, using fspecial to specify the filter type.
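A minimal sketch of the f-d-VPI pre-filtering step is reported below; dvpi_resize is a hypothetical handle for d-VPI, while fspecial and imfilter are the standard MATLAB functions, and the filter parameters are only illustrative:

    h = fspecial('disk', 3);                    % circular averaging filter
    % alternatives: fspecial('average', 3), fspecial('gaussian', 5, 1),
    %               fspecial('motion', 9, 0)
    filtered = imfilter(img, h, 'replicate');   % replicate borders to limit edge artifacts
    lr = dvpi_resize(filtered, s, 0.5);         % then run d-VPI (theta = 0.5)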

As mentioned in Sect. 4, this solution only partially reduces the aliasing effect in Fig. 12. It does not significantly affect the processing time: the CPU time of f-d-VPI remains much smaller than that of L\(_0\) and DPID, and very close to the CPU time of BIC.

To highlight the influence of aliasing, in the sequel we consider different kinds of input images extracted from the PEXELS300 dataset (hence all of size 1800 \(\times \) 1800), and we apply to them unsupervised d-VPI and the benchmark methods with the same downscaling factor.

In Fig. 13, downscaling with scale factor 3 is applied to the images (1,640,882 and 163,064 from PEXELS300) displayed at the top. Some ROIs of the resulting output images are shown in the middle and at the bottom in order to emphasize the aliasing phenomenon. By visually inspecting these ROIs, we can observe a different behavior of d-VPI which, in this case, coincides with d-LCI, the LR image being computed by (37). Indeed, we note that for the input image on the right (163,064) the aliasing effect is not appreciable (see, for instance, the diagonal line in the ROIs), while it becomes visible for the input image on the left (1,640,882). In the latter case, aliasing occurs for all methods to a different extent, but BIC, L\(_0\), and DPID perform better since their downscaled images are affected by aliasing to a lesser extent than that of d-VPI (see the vertical elements of the railing). However, in the image resized by f-d-VPI, the aliasing effect is comparable to that of the other methods, without a significant computational burden: f-d-VPI turns out to be the second fastest method and is competitive with BIC (L\(_0\) and DPID have a greater CPU time).

Fig. 11 An example of unsupervised upscaling performance results at the scale factor 2 on an image extracted from DIV2K (size = 1356 \(\times \) 2040)

Fig. 12 An example of unsupervised downscaling performance results at the scale factor 4 on an image extracted from Urban100 (size = 680 \(\times \) 1024). In f-d-VPI a circular averaging filter with size 3 is employed

Finally, in Fig. 14, we test all downscaling methods at the scale factor 8 on the input image displayed at the top (3472764 from PEXELS300). In this case, d-VPI and d-LCI produce better visual results since the stars in the sky are more adequately preserved in number and shape. BIC and L\(_0\) excessively reduce the number of stars and introduce a blurring effect, while DPID provides a downscaled image where the stars are almost all reshaped and doubled. Moreover, f-d-VPI does not seem to give new insights. Note that aliasing is visible in other areas of the image (see, for example, the mountain ridge area) with almost the same intensity for all methods. Regarding CPU time, also in this case DPID and L\(_0\) are the most expensive.

In conclusion, we point out that the aliasing effect does not always occur at the same scale factor and does not always influence the downscaling performance in the same way. Moreover, in some contexts, even the downscaling methods designed to reduce aliasing can prove inadequate to manage this problem and can introduce distortions even greater than aliasing itself. In these cases, as well as when aliasing is not visible or when the quality of the downscaled images is visually equivalent, d-VPI may prove preferable since it provides a good compromise between quality and CPU time.

Fig. 13 Two input images extracted from PEXELS300 (top) and ROIs of the unsupervised downscaling performance results at the scale factor 3 (size: 1800 \(\times \) 1800) (middle and bottom). In f-d-VPI an average filter is employed

Fig. 14 Examples of unsupervised downscaling performance results at the scale factor 8 on an input image extracted from PEXELS300 (size 1800 \(\times \) 1800). The input image is shown at the same printing size as the resulting images to facilitate the visual comparison. In f-d-VPI an average filter is employed

6 Conclusions

This paper proposes a new image scaling method, VPI, which is based on non-uniform sampling grids and employs the filtered de la Vallée-Poussin type polynomial interpolation at Chebyshev zeros of 1st kind.

The VPI method is simple to implement and highly flexible since it can be applied to resize arbitrary digital images both in upscaling and downscaling by specifying the scale factor or the desired size.

VPI depends on an additional input parameter \(\theta \in [0,1]\) that, if necessary, can be suitably modulated to improve the approximation. In particular, taking \(\theta =0\), VPI reduces to the LCI method, introduced by the authors in [28] and based on classical Lagrange interpolation at the same nodes. Nevertheless, for any \(\theta \in ]0,1]\), VPI generally improves the LCI performance and proves to be more stable than the latter, due to the uniform boundedness of the Lebesgue constants corresponding to de la Vallée-Poussin type interpolation. Exceptions are the downscaling cases with an odd scale factor, when VPI produces the same LR image as LCI which, for any value of \(\theta \), is given by decimation of the original image according to formulae (33) and (37).

The VPI performance has been evaluated using two commonly adopted quality measures, PSNR and SSIM, also measuring the required CPU time. Comparisons with other recent resizing methods (including methods specialized in only upscaling or downscaling) have been carried out on a large number of images belonging to several commonly available datasets and characterized by different contents and sizes, ranging from small to large scale. During the VPI validation procedure, the modulation of the free parameter \(\theta \) has also been observed experimentally. Further, the dependency on the input image has been considered by applying different scaling methods to the target images in the datasets in order to generate the input images.

The experimental results confirm that VPI has a competitive and satisfactory performance, with quality measures generally higher and more stable than those of the benchmark methods. Moreover, VPI proves much faster than the methods specialized in only downscaling or upscaling, with a CPU time close to that required by LCI and imresize, the optimized MATLAB implementation of the bicubic interpolation method (BIC).

At a visual level, VPI captures the objects' visual structure by preserving the salient details, the local contrast, and the luminance of the input image, with well-balanced colors and a limited presence of artifacts.

One limitation of VPI concerns the downscaling performance when HR images have high-frequency details. In downscaling with odd scale factors, if the Nyquist limit is satisfied, we give a theoretical estimate for the MSE, which, in particular, is null if the input image or some crucial pixels of it are “not corrupted.” Nevertheless, even starting from “exact” HR images, in downscaling VPI can suffer from aliasing problems when the frequency content of the image and the required size of the LR image are such as to violate the Nyquist–Shannon theorem. In our experiments, we report cases where aliasing does not occur and cases where it does. In the latter case, we suggest applying an appropriate filter to the input image before running d-VPI. However, according to our experience, different kinds of filters can give better or worse performance results, and further experiments should be done. For this reason, we prefer to leave the aliasing issue and the possible combination of VPI with other methods as problems to be further investigated.