1 Introduction

The image processing techniques by which a low-resolution (LR) image is transformed into a high-resolution (HR) image are known as image super-resolution (ISR). LR images exist in the real world for several reasons, including insufficient budget (and hence the use of ordinary cameras rather than HR equipment) and the limited capacity of image transmission channels (e.g., satellite to ground station). ISR is therefore beneficial to the users of what would otherwise remain LR images. Accordingly, a range of ISR techniques [1,2,3,4,5,6,7,8,9] has been proposed in the literature to convert LR images into their respective HR images. Amongst them, learning-based approaches form one of the most popular families, especially as they have developed alongside contemporary Machine Learning (ML) techniques.

By using a substantial amount of training data, the existing learning-based SR techniques can produce a mapping between the corresponding LR and HR images during the training process. In particular, linear mappings were the first to be tried to model the underlying relationship between LR and HR images [5]. Such techniques are simple to implement but sacrifice accuracy, which is undesirable and may defeat the original purpose of conducting ISR. Consequently, techniques involving nonlinear mappings have been created in an effort to improve accuracy. Contemporary deep learning (DL) techniques such as deep convolutional neural networks (CNNs) have been employed to realise the nonlinear mappings between the LR and HR images [2]. Unfortunately, their superior performance is obtained at the cost of requiring a huge amount of training data (and hardware resources as well). Besides, despite promising accuracy results, the resulting learned mapping is not easy to interpret or explain. Fuzzy rule-based approaches [3], however, are generally interpretable. Such approaches utilise a set of fuzzy rules to generate an interpretable nonlinear mapping between the LR and HR images during their training process.

This has motivated the use of ANFIS [10] to produce promising accuracy results based on the learned fuzzy rules [11]. However, this still relies upon the strong presumption that sufficient training data are available. That is not always the case in real-world applications, where certain images or image regions may be unavailable, or certain information may be invisible or lost. As such, situations where insufficient training data are available create a significant challenge for ISR through data-driven learning. Based on this observation, ANFIS interpolation techniques have been developed to provide a potentially credible solution to learning with limited data overall. This type of approach works by training one ANFIS with sparse data through interpolating two adjacent ANFIS models that have been trained with sufficient data, leading to a desirable, and interpretable, nonlinear mapping for the problem area where sufficient training data are not available [12]. Note that the interpolation here is carried out at the fuzzy rule level, not at the raw data level.

Following the ideas of such an approach, this paper presents an innovative implementation of sparse data-based ISR technique with the support of ANFIS interpolation. More specifically, the implementation process divides the given image training data sets into several data subsets. After that, the divided data subsets are categorised into two major subcategories, containing sufficient and sparse data subsets, respectively. Conventional ANFIS learning is utilised for the former subcategory of the data subsets to learn corresponding mappings, whilst ANFIS interpolation is exploited for the latter subcategory of the data subsets to interpolate the corresponding mappings. Overall, the main contributions of this paper are as follows:

  • A novel image super-resolution approach that works on sparse data, motivated by an extensive review of image super-resolution methods and facilitated by the most recent ANFIS interpolation technique.

  • A complete description of the computational mechanism that implements the proposed approach, supported with an analysis of algorithm complexity.

  • An implemented system that demonstrates the efficacy of the introduced computational mechanism, applied to various natural images.

The rest of this paper is organised as follows. Section 2 presents the problem statement of ISR. Section 3 introduces an overview of the recently developed ANFIS interpolation techniques. Section 4 details the implementation of the training and testing process of the proposed ISR approach via ANFIS interpolation. Experimental evaluation is included and discussed in Sect. 5, demonstrating the efficacy of this work. Finally, Sect. 6 concludes the paper and discusses identified further research.

2 Problem statement

The basic task of ISR is to create an HR image from the provided LR image. LR images contain fewer pixels and hence less detailed information, whilst HR images contain more pixels (i.e., a higher pixel density) and hence more detailed information. Applications of ISR are wide-reaching, including security surveillance [13], medical diagnosis [14, 15], face recognition [16] and reconstruction [17], remote sensing for earth observation [18], astronomical observation [19] and biometric information identification [13]. Such a breadth of applications illustrates the importance and appeal of ISR in the general fields of image processing and computer vision [19].

To obtain high-quality images, it is not always necessary to improve the quality of hardware devices or to purchase expensive ones, because ISR techniques can help to achieve this from a software perspective. According to the type of LR input information, ISR techniques can be subdivided into two main groups, namely single-frame and multi-frame [20,21,22,23]. If there are multiple images with subpixel alignment or multiple observations (as the LR input) of the same scene of interest, then the technique to create an HR image from the multiple LR images is known as multi-frame ISR. In other cases, where only limited LR data are available without multiple images of the same scene, the technique to create an HR image from such a single LR image is known as single-frame ISR.

Nearly two decades ago, ISR research was focused on the frequency domain. By using the Fourier transform or wavelet transform, an LR image is transformed into the corresponding HR image (in the frequency domain). Yet, such a straightforward process takes into account neither prior information nor the degradation process of the image [19]. Frequency domain approaches therefore have rather limited potential to cope with complex situations. Their underlying drawbacks have been addressed by spatial domain-based ISR techniques, and the contemporary spatial domain techniques are subdivided into learning-based and reconstruction-based ISR. The learning-based approaches work by borrowing ideas from data-driven machine learning research. For instance, of relevance to the problems tackled in this work, there exist ISR techniques using fuzzy rule-based approaches. In such work, learned fuzzy rules are manipulated to derive the nonlinear mappings between the LR and corresponding HR images [24]. In contrast, reconstruction-based ISR utilises appropriately designed priors (e.g., edges) within an image reconstruction process to recover the missing details [25].

Note that fuzzy models are constructed using either human expertise (which is directly provided by domain experts, typically represented as if-then production rules) or rules that are induced computationally from data acquired from the problem domain, or a mixture of both. Data-driven models in general, and deep learning models in particular, are constructed from data. A key difference between these two types of model is their inherent interpretability (or, in fact, non-interpretability for DL models). DL models are often referred to as black boxes because, despite their usually superior modelling accuracy, their reasoning is not explainable, unlike that of fuzzy rule-based models. Besides, the computational derivation of a DL model requires a huge amount of data. The particular modelling scheme employed herein is based on ANFIS, which offers a bridge between purely data-driven and purely rule-based approaches, and is capable of working on sparse data.

Given the fundamental starting point of the present ISR problem, namely that sufficient training data are lacking overall, the research reported herein aims to deal with learning-based single-frame ISR. For this type of problem, the key challenge is the lack of multiple LR observations. In other words, the training data that may be used to learn the nonlinear mapper (or ANFIS) are limited overall. Of course, this does not mean that such a limitation in data is universal throughout the entire problem space; otherwise, no data-driven method could work in the first place.

3 ANFIS interpolation

ANFIS interpolation [12, 26] is designed to resolve the problem of sparse data in a target domain (where an unknown image region is to be identified in terms of its underlying physical nature), providing an effective target ANFIS model through rule interpolation, with the support of its two adjacent source ANFIS models (about regions of an understood nature). In general, a target domain can be represented as \(\mathcal {A}_{t}\), whilst the two source domain ANFISs can be represented as \(\mathcal {A}_{s1}\) and \(\mathcal {A}_{s2}\), respectively. The ANFIS interpolation mechanism is derived from the classical approach to Fuzzy Rule Interpolation (FRI) [27,28,29]. An overview of the ANFIS Interpolation approach is given in Fig. 1. To be complete, the common process of ANFIS interpolation is outlined in three key steps, as discussed in the following subsections.

Fig. 1 Outline of ANFIS interpolation (adapted from [12])

3.1 Rule dictionary construction

The purpose of the first step, rule dictionary construction, is to store all the antecedent and consequent parts of the fuzzy rules extracted from the two source ANFISs \(\mathcal {A}_{s1}\) and \(\mathcal {A}_{s2}\). These two models, or their underlying equivalent fuzzy rules, are learned through the conventional ANFIS training process. The stored information will be utilised in the interpolation process for the target domain \(\mathcal {A}_{t}\).

For the present application to ISR, each fuzzy rule \(R_{i}\) (\(i\in \{1,2,\dots ,N\}\)) is assumed to take a very simple form of the general TSK rule representation [30]. In particular, the ith extracted rule can be expressed as follows:

$$\begin{aligned} R_{i}: {\mathrm {if}}\ x\ is\ A_{i}\ {\mathrm {then}}\ z_i= p_{i}x+r_{i} \end{aligned}$$
(1)

where x is an input variable, denoting the LR grey value at a certain location within a given image, with its fuzzy set value represented by \(A_i\); and \(z_i\) is the ith rule’s output (namely, the HR grey value), computed as a linear function of x with two parameters \(p_i\) and \(r_{i}\). As such, ISR is considered as a regression problem with \(p_i\) and \(r_i\) being the regression coefficients, performing the regression from an LR image to an HR image.
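As a minimal sketch of how a set of rules of the form in Eq. (1) can be evaluated, the following Python fragment applies standard first-order TSK inference, in which the crisp output is the firing-strength-weighted average of the individual rule outputs. It assumes triangular membership functions given by three characteristic points (the representation adopted later in Sect. 3.5). The names `membership` and `tsk_output` are hypothetical and are not taken from the original implementation.

```python
from typing import List, Tuple

# A triangular fuzzy set given by its three characteristic points (a0 <= a1 <= a2).
Triangle = Tuple[float, float, float]
# A first-order TSK rule (A_i, p_i, r_i), read as: if x is A_i then z_i = p_i * x + r_i.
Rule = Tuple[Triangle, float, float]


def membership(x: float, tri: Triangle) -> float:
    """Degree to which x belongs to a triangular fuzzy set."""
    a0, a1, a2 = tri
    if x == a1:
        return 1.0  # peak of the triangle (also covers degenerate shoulders)
    if a0 < x < a1:
        return (x - a0) / (a1 - a0)  # rising edge
    if a1 < x < a2:
        return (a2 - x) / (a2 - a1)  # falling edge
    return 0.0


def tsk_output(x: float, rules: List[Rule]) -> float:
    """Firing-strength-weighted average of the rule outputs z_i = p_i * x + r_i."""
    strengths = [membership(x, tri) for tri, _, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    weighted = sum(s * (p * x + r) for s, (_, p, r) in zip(strengths, rules))
    return weighted / total
```

With two illustrative rules whose antecedents overlap at an LR grey value of 64, each firing with strength 0.5, the output is simply the mean of the two rule consequents.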

Note that in general, an LR image may be first represented by its features extracted from it; therefore, the rule antecedent part is to be depicted with multiple attributes, each of which represents a certain image feature. An extension from the above simple form of TSK rules to such multi-antecedent rules is not difficult and the corresponding interpolation methods also exist [12], but this is beyond the scope of this paper.

From the fuzzy rules extracted from the given source ANFISs, the antecedent parts and consequent parts are separated. The resulting antecedent parts and consequent parts are collated separately to generate a rule dictionary (with any necessary reorganisation for easy indexing) such that

$$\begin{aligned} D=\{D_{a}, D_{c}\} \end{aligned}$$
(2)

where \(D_{a}\) is the part of the rule dictionary containing all antecedent parts of the fuzzy rules, expressed by

$$\begin{aligned} D_{a} = \{A_1\ A_2\ \cdots \ A_N\} \end{aligned}$$
(3)

and \(D_{c}\) is the part of the rule dictionary containing all consequent parts of the fuzzy rules, expressed as

$$\begin{aligned} D_{c} = \left[ \begin{array}{cccc} {p_1} & {p_2} & {\cdots } & {p_N} \\ {r_1} & {r_2} & {\cdots } & {r_N} \\ \end{array} \right] \end{aligned}$$
(4)

where each column in Eq. (4) holds the linear coefficients in the consequent part of a certain rule. For instance, \(p_N\) and \(r_N\) represent the linear coefficients of the consequent part of the Nth TSK-type fuzzy rule, as per the general rule expression given in Eq. (1).

The total number of rules is obtained by combining the rules from both source ANFISs, namely ANFIS1 (\(\mathcal {A}_{s1}\)) and ANFIS2 (\(\mathcal {A}_{s2}\)). For instance, if the numbers of rules in \(\mathcal {A}_{s1}\) and \(\mathcal {A}_{s2}\) are \(n_{1}\) and \(n_{2}\), respectively, then the total number of rules in the rule dictionary will be \(N=n_{1}+n_{2}\).
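The construction of the rule dictionary in Eqs. (2) to (4) can be sketched as follows. This is only an illustration under stated assumptions: each source ANFIS is represented as a plain list of rules `(A_i, p_i, r_i)` with triangular antecedents, and the function name `build_rule_dictionary` and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Tuple

Triangle = Tuple[float, float, float]   # characteristic points (a0, a1, a2)
Rule = Tuple[Triangle, float, float]    # (A_i, p_i, r_i) as in Eq. (1)


def build_rule_dictionary(source_1: List[Rule], source_2: List[Rule]) -> Dict[str, list]:
    """Collate antecedents and consequents of two source rule bases."""
    rules = source_1 + source_2                  # N = n1 + n2 rules in total
    d_a = [tri for tri, _, _ in rules]           # Eq. (3): all antecedent parts
    d_c = [[p for _, p, _ in rules],             # Eq. (4): first row holds p_1..p_N
           [r for _, _, r in rules]]             #          second row holds r_1..r_N
    return {"D_a": d_a, "D_c": d_c}
```

Keeping the two parts index-aligned means that the ith antecedent in `D_a` and the ith column of `D_c` always belong to the same rule, which is what the later interpolation step relies on.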

3.2 Intermediate ANFIS creation

Once the rule dictionary is constructed (in the previous step), the next step is to create an intermediate ANFIS. To do this, ANFIS interpolation is applied that works essentially by interpolating a group of fuzzy rules. The starting point is to divide the sparse training data into C clusters using the popular K-Means algorithm, with C being a domain-dependent number that may be specified empirically. However, a more advanced clustering method, e.g., one of those as described in [31, 32] may be used if preferred, particularly for situations where it is desirable to minimise human intervention, so that the required clustering process can be automated (whilst this is beyond the scope of the present work).

Conceptually, let the set of training data (sparse or not) be denoted as \(\{(x,z)\}\). The clusters generated above are used to produce the fuzzy rules, with each rule being interpolated from the centre of a different cluster. From these intermediate rules, an intermediate ANFIS is constructed by simply aggregating them together as per the traditional interpretation of a set of TSK rules as an ANFIS. In implementation, the centre \(c^{(k)}\) of each cluster \(C_k\) is first computed, where \(k\in \{1,\ldots ,C\}\). To create an intermediate ANFIS, the L rule antecedents \(\{A_i\in D_a, i=1,\ldots ,L\}\) closest to \(c^{(k)}\) are chosen. These rule antecedent parts are taken from \(D_{a}\) of the rule dictionary generated in the previous step. This is accomplished by the use of a distance metric, say for simplicity, \(d^{i} = d(A_i, c^{(k)})=|Rep(A_{i})-c^{(k)}|\), where \(Rep(A_i)\) indicates the representative value of the fuzzy set \(A_i\) [28]. Those L rule antecedents \(\{A_i\}\) whose distances \(d^i\) are the smallest are selected, with \(\mathcal {L}\) denoting the set of their indices. To reduce computational complexity, the value of L is chosen to be two [33], unless otherwise stated.
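The selection of the L closest antecedents can be sketched as below. The sketch assumes triangular fuzzy sets and takes the representative value \(Rep(A_i)\) to be the mean of the three characteristic points, which is one common choice in the FRI literature but is an assumption here; the function names are hypothetical.

```python
from typing import List, Tuple

Triangle = Tuple[float, float, float]  # characteristic points (a0, a1, a2)


def rep(tri: Triangle) -> float:
    """Representative value of a triangular fuzzy set (assumed: mean of its points)."""
    return sum(tri) / 3.0


def closest_antecedents(d_a: List[Triangle], centre: float, num_closest: int = 2) -> List[int]:
    """Indices of the antecedents in D_a with the smallest d^i = |Rep(A_i) - c^(k)|."""
    order = sorted(range(len(d_a)), key=lambda i: abs(rep(d_a[i]) - centre))
    return order[:num_closest]
```

The returned list plays the role of the index set \(\mathcal {L}\) for the given cluster centre.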

The next step of intermediate ANFIS generation is to identify the best reconstruction weights for the closest rules which were selected in the previous step. Such a problem is an optimisation problem which can be resolved by computing the following:

$$\begin{aligned} w^{(k)}=\mathop {\mathrm {arg\,min}}\limits _{w^{(k)}}||c^{(k)}-\sum _{i \in \mathcal {L}}{Rep(A_i)w_{i}^{(k)}}||^{2},\ s.t.\ \sum _{i \in \mathcal {L}}{w_{i}^{(k)}}=1 \end{aligned}$$
(5)

where \(w_{i}^{(k)}\) denotes the relative weighting of \(R_i\). The constraint of this optimisation problem is that the weights sum to one, and it can be established that the solution to this problem is:

$$\begin{aligned} w^{(k)}=\frac{G^{-1}\mathbf {1}}{\mathbf {1}^{T}G^{-1}\mathbf {1}} \end{aligned}$$
(6)

where \(\mathbf {1}\) indicates a column vector of ones, \(G=(c^{(k)} \mathbf{1} ^{T}-Y)^{T}(c^{(k)} \mathbf{1} ^{T}-Y)\) denotes the Gram matrix, and Y is the matrix whose columns hold the values of the selected neighbours of \(c^{(k)}\) (i.e., the chosen rule antecedents). To aggregate the information embedded within the kth cluster, the weights \(w^{(k)}\) are applied to both the collated antecedent parts and the consequent parts. This practice follows the traditional FRI approaches as discussed in [28, 34], with the result being represented as:

$$\begin{aligned} R_{k}: {\mathrm {if}}\ x\ is\ A_{k},\ {\mathrm {then}}\ z_k=p_{k}x+r_k \end{aligned}$$
(7)

where \(k=1,2, \ldots ,C\), with the following parameters describing the intermediate, interpolated ANFIS:

$$\begin{aligned} A_{k} = \sum _{i \in \mathcal {L}}{w_{i}^{(k)}A_{i}}, \ p_{k} = \sum _{i \in \mathcal {L}}{w_{i}^{(k)}p_{i}}, \ r_{k} = \sum _{i \in \mathcal {L}}{w_{i}^{(k)}r_{i}} \ \end{aligned}$$
(8)

with \(k=1,2, \ldots ,C\).
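For the scalar-input, two-neighbour case (\(L=2\)) used in this work, the weight computation and the rule interpolation of Eq. (8) can be sketched as follows. In this particular setting the sum-to-one constraint together with a zero reconstruction residual determines the two weights, so the sketch solves these two linear equations directly rather than forming \(G^{-1}\) explicitly; this shortcut, the mean-of-points choice for \(Rep\), and all function names are assumptions made for illustration only.

```python
from typing import Tuple

Triangle = Tuple[float, float, float]   # characteristic points (a0, a1, a2)
Rule = Tuple[Triangle, float, float]    # (A_i, p_i, r_i)


def rep(tri: Triangle) -> float:
    """Representative value of a triangular fuzzy set (assumed: mean of its points)."""
    return sum(tri) / 3.0


def two_rule_weights(centre: float, rep_1: float, rep_2: float) -> Tuple[float, float]:
    """Weights with w1 + w2 = 1 minimising |centre - (w1 * rep_1 + w2 * rep_2)|."""
    if rep_1 == rep_2:
        return 0.5, 0.5  # every split gives the same residual; use the symmetric one
    w1 = (centre - rep_2) / (rep_1 - rep_2)  # makes the residual exactly zero
    return w1, 1.0 - w1


def interpolate_rule(centre: float, rule_1: Rule, rule_2: Rule) -> Rule:
    """Apply the same weights to antecedent points and consequent coefficients."""
    (a_1, p_1, r_1), (a_2, p_2, r_2) = rule_1, rule_2
    w1, w2 = two_rule_weights(centre, rep(a_1), rep(a_2))
    antecedent = tuple(w1 * u + w2 * v for u, v in zip(a_1, a_2))
    return antecedent, w1 * p_1 + w2 * p_2, w1 * r_1 + w2 * r_2
```

When the cluster centre lies between the two representative values, both weights are in [0, 1] and the representative value of the interpolated antecedent coincides with the centre. If the centre lies outside that interval, one weight becomes negative (extrapolation), in which case the resulting fuzzy set may need to be checked for validity.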

3.3 ANFIS fine-tuning

The next and final step is to fine-tune the previously generated intermediate, interpolated ANFIS. That is, the interpolated ANFIS is treated as the initial input to the fine-tuning process in order to obtain the final ANFIS for the target domain. This fine-tuning process makes direct use of the standard ANFIS training technique as discussed in [10]. Importantly, the challenge that would otherwise arise in fine-tuning an ANFIS with limited training data is resolved, because the initial setup of the network to be fine-tuned is provided by the interpolated intermediate ANFIS.

3.4 Algorithm summary

The process of ANFIS interpolation as outlined in the above three steps can be summarised as given in Algorithm 1, for easy reference.

Algorithm 1 ANFIS interpolation

3.5 Complexity analysis

As shown in Algorithm 1, ANFIS interpolation comprises three phases, namely: (1) rule dictionary construction, (2) intermediate ANFIS creation, and (3) fine-tuning. Note that computationally, the last phase (fine-tuning) is the same as in traditional ANFIS training. Thus, for complexity analysis, only the first two steps are explored. In the proposed implementation, triangular membership functions defined by their respective three characteristic points [35] are employed for fuzzy set representation, owing to their popularity and simplicity in the existing FRI literature.

The main job of the first step in ANFIS interpolation is to extract the rule antecedent parts (called premise parameters in the literature) \(D_a\) and the rule consequent parts (parameters) \(D_c\) of the fuzzy rules from the two well-trained source ANFISs, namely \(\mathcal {A}_{s1}\) and \(\mathcal {A}_{s2}\). In the present application for ISR, an ANFIS can be represented as a 5-layer network [6]. Thus, the antecedent parts are obtained at the first layer, whilst the consequent parts of the rules are obtained at the fourth layer. The antecedent parameters are essentially (triangular) fuzzy sets; hence, each contains three sub-parameters, \(a_0\), \(a_1\), \(a_2\) for instance. The total number of antecedent parameters is therefore 3N. The number of parameters for the consequent parts is 2N. Hence, the time complexity to extract one single rule from the source ANFISs is O(5N). Similarly, the time complexity to compute all rules of the rule dictionary from the source ANFISs is \(N \times O(5N) = O(5N^2)\).

The second step in ANFIS interpolation is a bit more complex, consisting of three sub-stages that are repeated C times: (a) computing the Euclidean distances and sorting the results, with the time complexity of the two operations being O(N) and \(O(N^2)\), respectively; (b) computing the weights for each rule chosen, with a complexity of O(L); and (c) computing the weighted average for each rule, with a complexity of O(5N). Hence, the estimated time complexity for the second step is \(C \times [O(N) + O(N^2) + O(L) + O(5N)] = O(CN^2)\).

Collectively, the time complexity of the first two steps of the proposed ANFIS interpolation (the third, fine-tuning step being the same as in standard ANFIS training) is estimated as \(O(5N^2) + O(CN^2) = O(CN^2)\) (since normally \(C>5\)). In a nutshell, because neither N nor C is a large number, the proposed implementation is generally practical.

4 Implementation

This section describes the implementation of both the training and the testing phases of the proposed sparse data-based ISR algorithm. The overall flowchart is depicted in Fig. 2.

Fig. 2 Flowchart for implementation of proposed ISR, reflecting both training and testing phases

4.1 Implementation of training phase

For a given application problem, a number of steps need to be followed in order to construct a set of training images. If the given problem is for the analysis of natural images (as dealt with in part of the experimental investigation later), then such steps are: (i) collecting sharp natural images of various types, including animals, people, plants, buildings, etc., as HR images (typically in Bitmap format); (ii) using a certain scale factor s to downsample the training HR images to form the corresponding LR images; and (iii) running bi-cubic interpolation with a predefined scale size to scale up the downsampled LR images, such that the LR and HR images have the same size, whilst the LR images remain of lower resolution.

The results from the above steps form the required training data in terms of pixel pairs of LR-HR images. Such data are partitioned into source and target domains, with the source domains containing sufficient data and the target domains having sparse data. In particular, the existing data set is partitioned into several sub-datasets of sufficient and sparse training data, again in terms of image pixel pairs. Each subset of sufficient data is fed to the standard ANFIS training process [10] (which is a simple procedure) to derive the required source domain ANFIS. For the sparse sub-datasets (each of which lacks sufficient training data), the same standard ANFIS learning process cannot be expected to produce an accurate network. Consequently, to improve the performance of the sparse data-based training process, ANFIS interpolation is employed to generate the required ANFISs. In a nutshell, standard ANFISs and interpolated ANFISs are learned for the image datasets with sufficient and those with sparse data, respectively.

From those well-trained ANFISs, the two ANFISs that are in the nearest neighbourhood of a given sparse data subset are selected to act as the source ANFISs, in an effort to interpolate the target ANFIS that will cover the sparse data and similar data points not experienced before. The underlying neighbourhood relationship between data sets is calculated on the basis of the topological locations of their image pixels. To calculate the distance between the centres of two subsets of data, the Euclidean distance metric (or any other if preferred) is employed. From this, the ANFIS in a certain target domain can then be computed through the process of ANFIS interpolation as previously discussed, using the two nearest source ANFISs. That is, with respect to the rule dictionary, the two nearest antecedents from \(D_a\) are selected to perform ANFIS interpolation. Note that two nearest neighbours are used here in line with common practice in the literature, to minimise computational cost (in addition to minimising the number of source ANFISs). Mathematically, however, more neighbouring ANFISs may be used to perform the interpolation in exactly the same way.

To differentiate between sufficient and sparse data sets, a naive way is to employ a manually set threshold: if the cardinality of a subset is greater than the threshold, it is regarded as one that contains sufficient data. However, this adversely affects the automation level of the approach and also makes it sensitive to such a manually defined value. In this work, to increase the automation level, the following approach is used instead to classify any data subset as either a sufficient or a sparse one. Given a number of data subsets, compute the cardinality \(n_i\) of the ith subset for each i. After that, calculate the average \(n_a\) of these cardinalities. If the cardinality of the ith subset satisfies \(n_i < \alpha n_a\), then this subset is deemed to be a sparse data subset; otherwise, it is deemed to be a sufficient one. Here, \(\alpha\) is a pre-determined small coefficient that may be empirically adjusted.
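The cardinality-based rule for separating sufficient from sparse subsets can be sketched as below. The function name is hypothetical, and the value of \(\alpha\) in any call is purely illustrative, since the text only states that it is a small, empirically adjusted coefficient.

```python
from typing import List, Tuple


def split_sufficient_sparse(cardinalities: List[int], alpha: float) -> Tuple[List[int], List[int]]:
    """Return (indices of sufficient subsets, indices of sparse subsets)."""
    n_a = sum(cardinalities) / len(cardinalities)   # average cardinality n_a
    threshold = alpha * n_a
    sparse = [i for i, n in enumerate(cardinalities) if n < threshold]
    sufficient = [i for i, n in enumerate(cardinalities) if n >= threshold]
    return sufficient, sparse
```

Because the threshold scales with the average subset size, the same \(\alpha\) can be reused across data sets of different overall sizes.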

4.2 Implementation of testing phase

ANFIS interpolation for ISR works on raw pixel values. The range of the pixels in an LR image is fuzzified with respect to a qualitative quantity space defined by a certain number of fuzzy numbers [35] at the input of each ANFIS network. In this work, for illustration of the underlying ideas, the number of fuzzy values is set to three. That is, an LR image can be re-expressed using three categorical pixel values: small, medium and large. Without loss of generality, the illustrative implementation is presented around three trained ANFISs: two source networks \(\mathcal {A}_{s1}\) and \(\mathcal {A}_{s2}\), and an interpolated target ANFIS \(\mathcal {A}_{t}\). This is sufficient to explain the proposed ideas since any other target ANFISs can be derived by following the same approach.

To illustrate the implementation that tests an interpolated ANFIS, suppose that small and large pixel values are fed into the two source ANFISs \(\mathcal {A}_{s1}\) and \(\mathcal {A}_{s2}\), respectively, whilst \(\mathcal {A}_{t}\) is fed with those pixel values within the medium range. From this, the ANFIS interpolated from the other two networks, supported with the sparse data subset of medium pixels, takes novel LR medium pixel values as input and translates them into the respective HR pixel values. The results of this process are then converted into the conventional representation of an HR image, supported with post-processing that is applied to further reduce noise.

The implementation of the testing phase is intended to facilitate the analysis and comparison of the proposed approach with potential competitors. Given the extremely limited literature on the application of FRI to the problem of ISR, three models are considered in this initial work: Model 1 is the reference model that has been trained with sufficient data using the standard ANFIS learning mechanism in the target domain (note that this is of course for evaluation purposes; in reality such domains do not have sufficient data for learning the network); Model 2 is one that has been trained using just sparse data with the standard ANFIS learning mechanism (where the otherwise sufficient training data are deliberately reduced to a rather sparse dataset); and Model 3 is the one that implements the proposed ANFIS interpolation-based approach for ISR, namely, an ANFIS obtained by interpolating the two nearest neighbouring well-trained ANFISs. These are summarised in Table 1.

Table 1 Three implemented models for evaluation

Note that the aforementioned three models are given the same information on both source domains (1 and 2) but different information content on the target domain (which is to be identified). Significant differences therefore exist in the implementation of these models: Model 1 is the reference model, performing the LR to HR nonlinear mapping with an ANFIS well trained using a sufficient amount of data in the target domain without the need of any interpolation. Model 2 performs the mapping using an ANFIS trained with the sparse data in the target domain, without involving interpolation. Model 3 implements the proposed approach, performing the nonlinear mapping from an LR image to an HR image in the target domain that contains only sparse data, using the ANFIS interpolated from two neighbouring source ANFISs.

As described previously, during the training phase, a given training dataset is partitioned into a certain number of clusters, with the centre of each cluster identified. Based on such a partition, an ANFIS is either interpolated or trained (depending on whether the data set is sparse or sufficient) per cluster (and its centre). In the testing phase, these trained ANFISs act as the mappers that translate an LR pixel representation into its corresponding HR pixel value. That is, when a testing image X is presented, the learned ANFISs simply serve as a nonlinear mapper, working through the following steps.

From the newly presented LR image, partition its pixels with respect to the clustered subsets returned by the training phase. This is implemented by computing the topological distance (using the Euclidean distance metric) between an input LR pixel and the individual cluster centres. Then, the ANFIS whose corresponding cluster centre has the smallest distance to the testing LR pixel value is chosen. Each pixel value of the input image is therefore fed through the chosen ANFIS, and the outcome is regarded as the raw SR pixel value at the same topological location as that of the LR pixel being mapped. Repeating this process for each of the input pixels of the entire LR image completes the nonlinear translation of the LR image to the HR one, leading to a raw reconstructed SR image. Such a raw image may contain additional noise caused by the inference mechanism. Thus, a denoising filter such as Non-Local Means (NLM) [36] can be exploited to reduce the noise effects. To further refine the SR image, a popular post-processing technique, named Iterative Back Projection (IBP) [37], is employed. This completes the required ISR process, obtaining an HR image that is of higher resolution than the original LR image. The entire process of the proposed implementation is summarised in Algorithm 2.

Algorithm 2 Proposed ISR via ANFIS interpolation
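The per-pixel mapping of the testing phase can be sketched as follows. The sketch assumes that each trained or interpolated ANFIS is available as a plain callable from an LR pixel value to an HR pixel value, and that cluster centres are scalar grey values; the names `nearest_centre` and `map_image` are hypothetical. The NLM denoising and IBP refinement stages are not included.

```python
from typing import Callable, List, Sequence

# Stand-in for a trained or interpolated ANFIS: LR pixel value -> raw SR pixel value.
Mapper = Callable[[float], float]


def nearest_centre(value: float, centres: Sequence[float]) -> int:
    """Index of the cluster centre closest to the given LR pixel value."""
    return min(range(len(centres)), key=lambda k: abs(value - centres[k]))


def map_image(lr_image: List[List[float]],
              centres: Sequence[float],
              mappers: Sequence[Mapper]) -> List[List[float]]:
    """Feed every LR pixel through the mapper of its nearest cluster centre."""
    return [[mappers[nearest_centre(px, centres)](px) for px in row]
            for row in lr_image]
```

The output keeps the spatial layout of the (already upscaled) LR input, so each raw SR value sits at the same location as the LR pixel it was computed from.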

5 Experimental evaluation

This section provides the experimental results on the evaluation of the proposed approach. Two common performance criteria are considered for performance analysis.

5.1 Performance criteria

Amongst typical performance metrics for ISR, peak-signal-to-noise ratio (PSNR) and Structural SIMilarity (SSIM) are the two most commonly adopted, and are therefore used herein. Generally speaking, the larger the values of both PSNR and SSIM, the better the quality of the reconstructed HR images.

5.1.1 Peak-signal-to-noise ratio (PSNR)

As its name indicates, this metric calculates the peak-signal-to-noise ratio between a given pair of images. It is based on the conventional mean square error (MSE). In fact, both PSNR and MSE can be used as performance metrics for ISR. In particular, MSE is the squared error between the ground truth (original) HR image \(\mathbf {Y}\) and the reconstructed HR image \(\hat{\mathbf {Y}}\), averaged over all pixels, which is expressed as follows:

$$\begin{aligned} MSE=\frac{\parallel \mathbf {Y}-\hat{\mathbf {Y}} \parallel _{F}^{2}}{MN} \end{aligned}$$
(9)

where \(\parallel . \parallel _{F}\) denotes the Frobenius norm [38] of a matrix, with the length and width of an image being denoted by M and N, respectively. Since the ratio between the peak signal power and the MSE can vary over a very wide range, a logarithmic decibel scale is employed to achieve a consistent range of values (for easy illustration of the relative performance comparison), as follows:

$$\begin{aligned} PSNR=10\log _{10}(\frac{V_{max}^{2}}{MSE}) \end{aligned}$$
(10)

where \(V_{max}\) denotes the maximal pixel value of any image. In this work, it is taken as 255.
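Eqs. (9) and (10) translate directly into code. The following sketch assumes that images are given as equally sized nested lists of grey values; the function names are hypothetical. Identical images have zero MSE, for which the PSNR is returned as infinity.

```python
import math
from typing import List

Image = List[List[float]]


def mse(y: Image, y_hat: Image) -> float:
    """Mean squared error of Eq. (9): squared Frobenius norm divided by M * N."""
    rows, cols = len(y), len(y[0])
    total = sum((y[i][j] - y_hat[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def psnr(y: Image, y_hat: Image, v_max: float = 255.0) -> float:
    """Peak signal-to-noise ratio of Eq. (10), in decibels."""
    err = mse(y, y_hat)
    if err == 0.0:
        return math.inf  # identical images: no noise
    return 10.0 * math.log10(v_max ** 2 / err)
```

For 8-bit images, an MSE of 25 (an average error of five grey levels) corresponds to a PSNR of roughly 34 dB.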

5.1.2 Structural SIMilarity (SSIM)

Again, as hinted by its name, SSIM is used to calculate the similarities between the ground truth (original) HR image \(\mathbf {Y}\) and the reconstructed (estimated) HR image \(\hat{\mathbf {Y}}\), which is expressed in the following:

$$\begin{aligned} SSIM=\frac{4\mu _{\mathbf {Y}}\mu _{\hat{\mathbf {Y}}}\sigma _{\mathbf {Y},\hat{\mathbf {Y}}}}{(\mu _{\mathbf {Y}}^{2}+\mu _{\hat{\mathbf {Y}}}^{2})(\sigma _{\mathbf {Y}}^{2}+\sigma _{\hat{\mathbf {Y}}}^{2})} \end{aligned}$$
(11)

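where \(\mu _{\mathbf {Y}}\) and \(\mu _{\hat{\mathbf {Y}}}\) are the mean values, and \(\sigma _{\mathbf {Y}}\) and \(\sigma _{\hat{\mathbf {Y}}}\) are the corresponding standard deviation values, of the pixel values within images \(\mathbf {Y}\) and \(\hat{\mathbf {Y}}\), respectively, and \(\sigma _{\mathbf {Y},\hat{\mathbf {Y}}}\) is the covariance between the pixel values of the two images. In practice, SSIM values for a reconstructed image typically lie between 0 and 1, with 1 indicating identical images; a negative value can arise only when the two images are negatively correlated.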
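A direct transcription of Eq. (11) is sketched below, assuming single-channel images of identical size and statistics computed globally over the whole image. The name `ssim_global` is illustrative. Widely used SSIM implementations typically evaluate the index over local windows and add small stabilising constants, so their values will generally differ from those of this simplified form.

```python
import numpy as np

def ssim_global(y, y_hat):
    """Eq. (11) with global (whole-image) means, variances and covariance."""
    y = np.asarray(y, dtype=float).ravel()
    y_hat = np.asarray(y_hat, dtype=float).ravel()
    mu_y, mu_h = y.mean(), y_hat.mean()
    var_y, var_h = y.var(), y_hat.var()          # sigma^2 terms
    cov = np.mean((y - mu_y) * (y_hat - mu_h))   # sigma_{Y, Y_hat}
    denom = (mu_y ** 2 + mu_h ** 2) * (var_y + var_h)
    if denom == 0:
        # Eq. (11) has no stabilising constants, so it is undefined here.
        raise ValueError("Eq. (11) is undefined for constant or all-zero images")
    return 4.0 * mu_y * mu_h * cov / denom
```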

5.2 Experimental results

To evaluate each compared ANFIS model, test images that are commonly used in the literature for assessing the quality of an ISR solution mechanism are employed here. Particularly, the dataset used in running the experimental studies consists of natural images involving different objects such as humans, animals, aeroplanes, flowers, etc. The images are publicly available and can be obtained from [39]. Experimental results obtained with both sparse data-based and sufficient data-based models are included in the experimental evaluation.

Recall that this research aims to produce a novel and working algorithm for sparse data-based ISR. The experimental analysis is therefore designed to consist of two distinct sets of experiments on ISR. The first set concerns the situation where full training data are available, and the second concerns sparse data. This helps reflect the potential benefits of applying the sparse data-based ANFIS interpolation approach should no full data be available. It also supports the comparison of the performance of the proposed implementation with that of the ISR algorithms in the literature. For this purpose, in the experimental investigations, although full data sets are given, a large portion of them is deliberately deleted in order to generate sparse data subsets. In so doing, the interpolated ANFISs will have the best possible reference models to compare their performance against. In both (full and sparse) experimental settings, the following common parameters are used to ensure a fair comparison: (i) Original (ground truth) image size (\(\mathbf {Y}\)) = \(256\times 256\) each; (ii) Scale factor = 2; and (iii) Number of subsets of pixel pairs, \(K=3\) (clustered using K-Means for easy illustration).

5.2.1 Experiments with full data

This part of the experimental evaluation is designed for the experiments that are involved in the situation where all data subsets are covered by sufficient training data. In particular, the standard ANFIS model trained with sufficient data is compared with bicubic interpolation as well as with two other existing ISR techniques, namely fuzzy rule-based ISR [3] and sparse representation based ISR [4]. Such results form the best-case scenario and hence are considered as an ideal (or reference) case study. Table 2 presents the quantitative results for this set of experiments, where SD denotes Standard Deviation and those highlighted in bold represent the best outcomes obtained across the different methods employed per image. Qualitatively, Fig. 3 illustrates the resulting images, with their respective patches shown at the top right corner of each image.

Table 2 Quantitative results of full data experiments with scale factor being 2
Fig. 3 Qualitative results of full data experiments with scale factor being 2

As shown in Table 2, the average PSNR and SSIM values over all ten test images for this implementation are both better than those achievable by the other compared methods. Particularly, the proposed ANFIS implementation outperforms the others on more than half of the individual test images, in terms of both PSNR and SSIM. Considering SSIM alone, the ANFIS implementation performs the best on eight out of the ten images, and there is just a minor performance variation from the best for the remaining two images (hat and butterfly). Moreover, the lower SD values that the present approach leads to indicate the robustness of the algorithm. Overall, it is evident that the proposed ANFIS implementation outperforms the existing techniques for the challenging application of ISR, provided that sufficient data are available for training. Of course, this is not the key contribution of this work, which aims to introduce a working method for situations where only sparse data are available. This is to be shown next.

5.2.2 Experiments with sparse data

The main objective of this research is to cope with situations where only sparse training data are available for certain regions of the given LR images. It is therefore time to consider and contrast the three types of model presented in Table 1. The results from the previous subsection, obtained under the condition of sufficient data, are for the reference model, demonstrating that Model 1 offers promising results against existing ISR techniques in the literature. To continue the present experimental evaluation, the performance of the proposed approach, Model 3, which is implemented with sparse data-based ANFIS interpolation, is compared with this reference model as well as with the standard ANFIS model, Model 2, which is trained with sparse data following the conventional ANFIS learning mechanism, without involving rule interpolation.

Note that, as indicated previously, for ease of illustration, the number of data subsets K for each image considered is set to three, presuming that sufficient training data are available for subsets 1 and 3 and that only sparse data are available for subset 2. Note also that, to create a challenging situation for all models considered, a large portion (about \(98\%\)) of the original sufficient data in subset 2 is purposefully and randomly removed. Post-processing techniques generally play an important role in enhancing the performance of ISR, and most of the existing ISR techniques incorporate them in their algorithms. As indicated previously, post-processing is also employed in all experimental setups herein, equally applied to every individual model.
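The construction of the sparse training set described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `sparsify_subset`, its parameters and the fixed random seed are all assumptions, and the cluster labels are taken to be zero-based, so that subset 2 corresponds to label 1.

```python
import numpy as np

def sparsify_subset(lr_values, hr_values, labels,
                    sparse_label=1, removal_ratio=0.98, seed=0):
    """Randomly drop about `removal_ratio` of the pairs in one subset.

    Pairs belonging to the other subsets are kept in full. The names and
    the seed are illustrative assumptions, not taken from the paper.
    """
    lr = np.asarray(lr_values)
    hr = np.asarray(hr_values)
    labels = np.asarray(labels)
    idx = np.flatnonzero(labels == sparse_label)
    keep_mask = labels != sparse_label
    if idx.size > 0:
        rng = np.random.default_rng(seed)
        # Keep at least one pair so that the subset does not vanish entirely.
        n_keep = max(1, int(round(idx.size * (1.0 - removal_ratio))))
        kept = rng.choice(idx, size=n_keep, replace=False)
        keep_mask[kept] = True
    return lr[keep_mask], hr[keep_mask], labels[keep_mask]
```

Because the same boolean mask is applied to the LR values, the HR values and the labels, the correspondence within each retained training pair is preserved.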

Figure 4 shows the qualitative results with detailed patches illustrated in the upper right corner of each image. Table 3 lists the quantitative results from the experiments regarding both PSNR and SSIM measures. Importantly, the difference between the average values of Model 1 (reference model) and Model 3 (proposed model) is very minor in terms of either PSNR or SSIM. It can also be observed from the table that the proposed model greatly outperforms Model 2, the standard ANFIS model trained without interpolation. These results demonstrate that, for the problem of ISR with sparse data, the proposed approach using ANFIS interpolation offers a robust solution.

Fig. 4 Qualitative results of sparse data experiments with scale factor being 2

Table 3 Quantitative results of sparse data experiments with scale factor being 2

6 Conclusion

This paper has presented a novel approach for sparse data-based single frame image super-resolution, by exploiting ANFIS interpolation techniques. Comparative experimental evaluation has been carried out against different existing ISR methods. It demonstrates that for images involving sparse training data, the results obtained from the proposed approach are on par with those of the standard ANFIS that is trained with full data (which of course is not necessarily available in a real-world setting).

Whilst the proposed approach is shown to offer significant potential in dealing with challenging ISR problems, it is presently assumed to handle static images only. For general ISR problems, e.g., those involving video sequences, it is highly desirable to develop a dynamically learned ANFIS model. Work to address this issue is on-going, by learning from the method reported in [40] where the underlying fuzzy rule models are represented in conventional Mamdani form, instead of in TSK form (of which ANFIS may be seen as a dialect). How other SR techniques, particularly those developed in the field of deep learning [41], may benefit from adapting the present work to help reduce their demand for a huge amount of training data remains an active area of research.