1 Introduction

Image super-resolution (ISR) is a subfield of computer vision and image processing concerned with transforming low-resolution (LR) images into high-resolution (HR) ones. HR images contain more pixels and hence show more detailed information, whereas LR images show less detail and may miss considerable information owing to a lower pixel count or pixel density. In certain situations, images obtained from different cameras may not be clear, and detailed information may be difficult to extract due to blurring or noise. In addition, high-quality cameras can be expensive, so not everyone can afford them or thereby gain direct access to HR images. Indeed, LR images, or images containing regions of sparse data, are a common outcome when capturing scenes with conventional cameras.

The problem of ISR has garnered significant attention from the image processing and computer vision community [1]. Given its importance, ISR is widely exploited in various application areas, such as medical diagnosis [2], security surveillance [3], remote sensing for earth observation [4, 5], astronomical observation [6], face recognition [7,8,9], reconstruction [10], vehicle identification [11], and Martian rock type classification [12], among others. High-quality HR images are desirable in almost every such application. Fortunately, companies and organisations can acquire such images without always needing to purchase expensive image-capturing equipment, as ISR techniques can be used to enhance images in software.

Previously, the main focus of ISR techniques was on the frequency domain, with the wavelet or Fourier transform being the major tools used to convert LR images into HR ones. However, such techniques typically ignored both the degradation process and the prior spatial information about the images concerned [1], restricting their ability to cope with complex situations. To address these limitations, contemporary ISR techniques based on the spatial domain have been developed, including learning-based and reconstruction-based methods [13]. The working mechanism of learning-based approaches is similar to data-driven machine learning. For instance, fuzzy rule learning-based techniques have been used to solve ISR problems, wherein the learned fuzzy rules are exploited to derive non-linear mappings between LR images and their corresponding HR images [14]. On the other hand, reconstruction-based ISR methods utilise appropriately designed priors, such as edges, to recover missing details during the image reconstruction process [15, 16].

To help address challenging ISR problems, this paper proposes a novel lightweight FRFS-based ANFIS [17] interpolation approach that substantially extends the most recently developed ANFIS interpolation technique [13]. A pre-processing step is introduced that consists of feature extraction (FE) and fuzzy rough set-based feature selection (FRFS) mechanisms [18,19,20]. In particular, FRFS is adopted to enable the overall ISR system to work in uncertain and imprecise domains while maintaining interpretability. It yields a computationally efficient technique without compromising performance, inspired by the observation that many algorithms work on meta-level features of images rather than their pixels, as pixel-based processing generally consumes more time. The existing ANFIS interpolation solution forms the core for dealing with problems where certain images or image regions are missing and hence, there are insufficient training data to develop a functioning ANFIS with existing ANFIS-learning techniques. This tackles a common yet important problem in data-driven ISR.

In implementation, two adjacent ANFIS models are trained with sufficient data if and when such data are available, so that an ANFIS model addressing a certain region of sparse data can be constructed in between them. As a result, a novel FRFS-ANFISI approach is established here, which works with features instead of pixels and employs only the most important features extracted from the images, making the learned model lightweight. To evaluate the working efficacy of the proposed approach, a number of performance metrics are employed, including PSNR and SSIM, assessing the results on the following SR benchmark datasets: Set5 [21], Set14 [22], BSD100 [23], and Urban100 [24]. Promising experimental outcomes are achieved, reflecting the performance of FRFS-ANFISI both quantitatively and qualitatively.

The remainder of the paper is organised as follows. Section 2 discusses the system architecture of the proposed approach. Section 3 details the implementational specifications. Section 4 presents an experimental evaluation of the proposed techniques. Finally, Section 5 concludes the paper and points out interesting further work.

2 System architecture

The proposed FRFS-supported ANFIS interpolation approach for addressing the ISR problem is depicted in Fig. 1. Its implementation process consists of three phases: feature selection, training, and testing.

Fig. 1 Framework of the proposed approach, involving the three phases of feature selection, training and testing

2.1 Selection of extracted features

In computer vision, a popular image processing technique is to break down an image into local regions of interest and then describe the image using certain characteristics or features of the individual regions. Such a technique helps to reduce the complexity of the actual image data while exploiting their existing characteristics for further processing. Typically, a keypoint detector is used to detect salient regions of an image consistently, irrespective of the various transformations (e.g., scale, rotation) applied to the image, while a keypoint descriptor is used to extract the most distinctive and useful information from the already identified salient regions of that image [25]. There are many mature techniques for building feature detectors and descriptors in the literature (e.g., [25,26,27,28,29,30]). In this paper, BRISK [25] and KAZE [28] are adopted for the proposed approach owing to their popularity and observed suitability. As these methods are standard in the literature, only a brief introduction is provided here for completeness; more details about them can be found in the literature (as referenced).

  • BRISK features: Binary robust invariant scalable keypoints (BRISK) is a technique for detecting (typically corner-like) interest points in a given image. It is both scale-invariant and rotation-invariant, unlike many other (corner) feature detection techniques (e.g., FAST [31] and ORB [32]). A comprehensive evaluation on benchmark datasets reveals its adaptiveness and high-quality performance, comparable to state-of-the-art feature extraction algorithms such as SURF [33] and SIFT [29].

  • KAZE features: The extraction of this type of feature aims to reduce noise, retain object boundaries, and ultimately achieve superior localisation accuracy [28]. It is a two-dimensional multi-scale technique used to automatically detect features of interest at different scale levels or resolutions. KAZE can outperform state-of-the-art techniques such as SURF and SIFT, although it may be less computationally efficient.

Not all features extracted from a given image are important; some may even be irrelevant, noisy or misleading, which adversely affects the results of subsequent processing. Thus, it is crucial to identify unnecessary features and remove them from the input data. Various feature selection (FS) techniques have been proposed [34], but in many cases it is not feasible to acquire an optimal feature subset through the required exhaustive search. When dealing with uncertain and ambiguous information, fuzzy-rough set (FR) theory has been shown to be a suitable approach [35]. Therefore, in the proposed approach, FR feature selection (FRFS) techniques [18] are employed, recognising that fuzzy sets can handle vagueness while rough sets can handle indiscernibility.

FRFS analyses only the patterns hidden in the data and requires no additional information, such as external expert knowledge or threshold values. It relies upon a minimal knowledge representation of the underlying data. While many nature-inspired optimisation algorithms can serve as subset search mechanisms in FRFS, the approach proposed herein uses particle swarm optimisation (PSO) and the genetic algorithm (GA), owing to their demonstrated effectiveness [19, 20]. Both algorithms and their corresponding encodings in FRFS are summarised in the following.

2.1.1 PSO-based fuzzy rough feature selection (PSO-FRFS)

Being a stochastic algorithm, PSO is commonly used for solving optimisation problems [36], where a group of particles (representing potential solutions, each having a position and a velocity) move through the search space to seek the optimal solution. The movement is influenced by each particle’s own experience and the experience of the group. At each iteration, the position of each particle is updated with respect to its velocity. The velocity of each particle is in turn updated on the basis of: the current velocity itself, the difference between its position and the best position it has visited, and the difference between its position and the best position of the group. The process continues until a stopping criterion is met or a maximum number of iterations is reached.

PSO can be employed in FRFS to optimise the search for the best subset of features within the given feature space [19]. Without loss of generality, consider the feature space derivable from a given image, described as the set of all feature subsets. Each feature subset can then be regarded as a position, or a point, in this space. There are \(2^N\) possible subsets of N features, differing in both the subset size (number of elements) and the elements contained. The ideal feature subset is the one with the fewest features that does not compromise the performance of subsequent processing with the selected subset.

Introducing the particle swarm into this feature space, with each particle occupying one position, every particle moves through the space with the aim of exploring it and finding the optimal position. To do so, each particle seeks its local best and the global best position by iteratively updating its current position and communicating with the other particles. During this iterative process, the swarm is directed towards the desired optimal solution, thereby identifying the best subset of features within the given feature space [19]. This exploration ability of the particles significantly improves computational efficiency, in terms of both time and memory, when selecting an optimal subset of features in FRFS. Of course, the underlying fuzzy-rough feature dependency measures are not altered, retaining the mathematical rigour of the generic FRFS approach.
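
To make the above process concrete, the following is a minimal sketch of a binary PSO wrapper for feature-subset search. It is illustrative only: the true FRFS fitness is the fuzzy-rough dependency degree of the decision feature on the candidate subset, whereas here a simple correlation-based stand-in (subset_fitness) is used purely so the sketch runs end to end; the inertia weight w is likewise an assumed value, not one prescribed by the proposed approach.

```python
import numpy as np

rng = np.random.default_rng(0)

def subset_fitness(mask, X, y):
    """Stand-in for the fuzzy-rough dependency degree: mean absolute correlation
    of the selected features with the target, minus a small penalty on subset
    size (FRFS prefers smaller subsets)."""
    if mask.sum() == 0:
        return 0.0
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in np.flatnonzero(mask)]
    return float(np.mean(corrs)) - 0.01 * mask.sum() / mask.size

def pso_feature_select(X, y, n_particles=100, n_iter=50, c1=1.0, c2=2.0, w=0.7):
    """Binary PSO over feature subsets; c1, c2 and swarm size follow Sect. 3.1, w is assumed."""
    n_feat = X.shape[1]
    vel = rng.normal(scale=0.1, size=(n_particles, n_feat))
    pos = (rng.random((n_particles, n_feat)) > 0.5).astype(float)   # binary positions
    pbest = pos.copy()
    pbest_fit = np.array([subset_fitness(p, X, y) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, n_feat))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = (rng.random((n_particles, n_feat)) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
        fit = np.array([subset_fitness(p, X, y) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return pbest[pbest_fit.argmax()].astype(bool)

# Toy usage: 200 samples described by 16 extracted features.
X = rng.normal(size=(200, 16))
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=200)
print(np.flatnonzero(pso_feature_select(X, y)))
```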

2.1.2 GA search-based fuzzy rough feature selection (GA-FRFS)

The genetic algorithm (GA) is another commonly used population-based computational method for solving search and optimisation problems. It implements a meta-heuristic process inspired by natural selection, employing a chromosome representation together with biologically inspired operators such as selection, crossover, mutation and fitness evaluation [20, 37]. More specifically, the GA begins with the random generation of an initial population \(P_0\), where each individual I represents a potential solution to the problem at hand. A fitness function f is defined to quantify the quality of each individual. Parents are selected from the previous generation based on a selection probability P(I), usually proportional to f(I). The crossover operator C is then applied to pairs of parents, producing offspring that inherit genetic information. A mutation operator M introduces small, random changes to some individuals. The resulting offspring, along with surviving individuals from the previous generation, form the new generation \(P_t\). This process iterates for a predetermined number of generations T. The algorithm concludes by identifying the best individual(s) in the final population \(P_T\) based on the fitness function.

Utilising a GA to implement FRFS involves randomly initialising the chromosome population, followed by the calculation of the fitness of each chromosome (which encodes a feature subset) and the recording of the best solutions, both for individual chromosomes and over the entire population, based on their fitness values. Then, a crossover probability is applied to these local and global bests to generate offspring, which are in turn subject to a mutation probability. The resulting offspring are stored as the new population, and this process iterates until a termination criterion is reached [37]. Once again, this application of a GA does not change the underlying measurement of features or feature subsets, but helps accelerate the search for the optimal feature subset.
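
A corresponding GA-based subset search can be sketched in the same spirit, reusing the stand-in fitness function from the PSO sketch above. The population size, generation count, crossover and mutation probabilities mirror the values later given in Section 3.1, but ga_feature_select is a hypothetical helper rather than the exact GA-FRFS implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def ga_feature_select(X, y, fitness, pop_size=20, n_gen=20, p_cross=0.6, p_mut=0.033):
    """GA over bit-mask chromosomes; parameter values mirror Sect. 3.1."""
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) > 0.5                 # chromosomes = feature bit masks
    for _ in range(n_gen):
        fit = np.array([fitness(ind, X, y) for ind in pop])
        probs = fit - fit.min() + 1e-9
        probs /= probs.sum()
        parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]   # fitness-proportional selection
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):                    # one-point crossover on parent pairs
            if rng.random() < p_cross:
                cut = rng.integers(1, n_feat)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        children ^= rng.random(children.shape) < p_mut          # bit-flip mutation
        pop = children
    fit = np.array([fitness(ind, X, y) for ind in pop])
    return pop[fit.argmax()]

# Usage, reusing the stand-in fitness from the PSO sketch above:
# best_mask = ga_feature_select(X, y, subset_fitness)
```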

2.2 ANFIS interpolation

Recall from Fig. 1 that the proposed FRFS-ANFISI approach involves a development process consisting of three phases: feature selection, training, and testing. The feature selection phase is performed first, followed by the training and testing phases. Algorithms 1 and 2 provide step-by-step instructions for the training and testing phases, respectively. The common techniques facilitating these two phases are detailed below.

Algorithm 1 Training phase of the proposed approach

Algorithm 2 Testing phase of the proposed approach

The ANFIS interpolation technique, whose original ideas are described in [15], aims to overcome the issue of sparse data in the target domain (where the less familiar image regions lie). It facilitates the construction of an effective target ANFIS model using rule interpolation, based on two neighbouring source ANFIS models that represent regions near the target domain whose nature is substantially better known. For completeness, a brief introduction to the underlying ANFIS model is given first, followed by an outline of the general ANFIS interpolation process, which is divided into three stages as described subsequently.

2.2.1 Outline of ANFIS

ANFIS [17] denotes a type of fuzzy inference system that is implemented within the framework of an adaptive network. An ANFIS is characterised by its network structure and the corresponding parameter-learning mechanism. In general, an ANFIS structure contains five layers. An example of a two-input, one-output system is illustrated here, with each input variable of the system described by two fuzzy sets. In this case, the system’s rule base contains 4 fuzzy if-then rules of the so-called TSK type [38], as expressed below:

Rule1: If \(x_1\) is \(A_{1}\) and \(x_2\) is \(A_{3}\), then \(y_{1}=p_{1}x_1+q_{1}x_2+r_{1}\)

Rule2: If \(x_1\) is \(A_{1}\) and \(x_2\) is \(A_{4}\), then \(y_{2}=p_{2}x_1+q_{2}x_2+r_{2}\)

Rule3: If \(x_1\) is \(A_{2}\) and \(x_2\) is \(A_{3}\), then \(y_{3}=p_{3}x_1+q_{3}x_2+r_{3}\)

Rule4: If \(x_1\) is \(A_{2}\) and \(x_2\) is \(A_{4}\), then \(y_{4}=p_{4}x_1+q_{4}x_2+r_{4}\)

Structurally, such an ANFIS can be illustrated as shown in Fig. 2 [15], where the square nodes in the first and fourth layers are adaptive nodes with modifiable parameters, and the circle nodes in the remaining layers carry out fixed operations without parameters. In the first layer, each square node is defined by a membership function \(\mu _{A_{i}}(x)\), where \(i \in \{1, \cdots , 4\}\), x is the input variable and \(A_{i}\) is a fuzzy set defining an imprecise value of x. The most popularly applied membership function is the triangular one, owing to its simplicity, which is defined by

$$\begin{aligned} \mu _{A}(x)=\left\{ \begin{array}{ll} k_{1} x+b_{1} & a_{0} \le x \le a_{1} \\ k_{2} x+b_{2} & a_{1} \le x \le a_{2} \\ 0 & \text {otherwise} \end{array} \right. \end{aligned}$$
(1)

where \(k_{1}=1/(a_1-a_0)\), \(b_{1}=-a_0/(a_1-a_0)\), \(k_{2}=1/(a_1-a_2)\), and \(b_{2}=-a_2/(a_1-a_2)\) are named premise parameters, with \(\{a_{0}, a_{1}, a_{2}\}\) being the three vertices of the triangular membership function.

Fig. 2 Example of an ANFIS encoding four rules

The second layer (Layer 2) multiplies the incoming membership values of the input variables and outputs the product: \(w_{i}=\mu _{A_{i}}(x_1)\times \mu _{A_{j}}(x_2)\) (\(i \in \{1, 2\}\), \(j \in \{3, 4\}\)). The output of this layer, \(w_{i}\), stands for the firing strength of a certain rule. Layer 3 then normalises each rule’s firing strength by computing the relative proportion of a given rule’s firing strength to the total of the N rules’ firing strengths, \(\bar{w_{i}}=w_{i}/{\sum _{j=1}^N w_j}\) (here \(N=4\), \(i \in \{1, \cdots , 4\}\)). In the fourth layer, each node is a square node with a linear function \(\bar{w_{i}}y_{i}=\bar{w_{i}}(p_{i}x_1+q_{i}x_2+r_{i})\), where \(p_{i},q_{i},r_{i}\) are referred to as consequent parameters. Finally, Layer 5 is the output layer, computing the overall output in response to the current input, i.e., \(y=\sum _{i}{\bar{w_{i}}y_{i}}\).
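
As an illustration of these five layers, the following is a minimal sketch of a forward pass through the two-input, four-rule ANFIS above; the membership-function vertices and consequent coefficients are arbitrary placeholder values rather than learned parameters.

```python
import numpy as np

def tri_mf(x, a0, a1, a2):
    """Triangular membership function with vertices a0 <= a1 <= a2 (cf. Eq. 1)."""
    if a0 <= x <= a1:
        return (x - a0) / (a1 - a0)
    if a1 <= x <= a2:
        return (a2 - x) / (a2 - a1)
    return 0.0

def anfis_forward(x1, x2, mfs1, mfs2, consequents):
    # Layer 1: fuzzify each input with its two membership functions.
    mu1 = [tri_mf(x1, *v) for v in mfs1]                     # A1, A2
    mu2 = [tri_mf(x2, *v) for v in mfs2]                     # A3, A4
    # Layer 2: rule firing strengths as products of antecedent memberships.
    w = np.array([mu1[i] * mu2[j] for i in range(2) for j in range(2)])
    # Layer 3: normalise the firing strengths.
    w_bar = w / w.sum() if w.sum() > 0 else w
    # Layer 4: rule-wise first-order TSK outputs y_i = p_i*x1 + q_i*x2 + r_i.
    y = np.array([p * x1 + q * x2 + r for (p, q, r) in consequents])
    # Layer 5: weighted sum gives the crisp output.
    return float(np.dot(w_bar, y))

mfs1 = [(0.0, 0.3, 0.7), (0.3, 0.7, 1.0)]                    # fuzzy sets A1, A2 for x1
mfs2 = [(0.0, 0.4, 0.8), (0.2, 0.6, 1.0)]                    # fuzzy sets A3, A4 for x2
consequents = [(1.0, 0.5, 0.1), (0.8, 0.2, 0.0),
               (0.3, 1.1, -0.1), (0.5, 0.5, 0.2)]            # (p_i, q_i, r_i) per rule
print(anfis_forward(0.45, 0.55, mfs1, mfs2, consequents))
```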

2.2.2 Rule dictionary construction

Suppose that, for a given cluster of data, the two source domain ANFIS models and the target domain model are denoted as \(\mathcal {A}_{s1}\), \(\mathcal {A}_{s2}\), and \(\mathcal {A}_{t}\), respectively. The two source ANFISs each contain fuzzy rules learned by following the conventional ANFIS training process, given that the relevant regions are well known (i.e., with sufficient training data). The task of constructing a rule dictionary is to store the antecedent parts and consequent parts of the corresponding fuzzy rules. The interpolation process is then applied, exploiting the information in the rule dictionary to learn \(\mathcal {A}_{t}\) in the target domain.

LR images can be represented by the features extracted from them, and these features form the rule antecedent parts, with multiple attributes corresponding to specific features of the image. For the present ISR application, fuzzy rules \(R_{i}\), \(i\in \{1,2,\dots , N\}\), are of the first-order TSK form [38], such that

$$\begin{aligned} R_{i}: if\ y\ is\ A_{i}\ then\ z_i= p_{i}y+r_{i} \end{aligned}$$
(2)

where y is an input feature selected by FRFS (using either PSO or GA to optimise the search), representing an LR image element at a particular location, and its fuzzy set value is represented by \(A_i\); \(z_i\) is the consequent of this rule (signifying the corresponding HR element), computed as a linear function of y with the two parameters \(p_i\) and \(r_i\). Thus, ISR is treated as a regression problem, with \(p_i\) and \(r_i\) being regression coefficients that enable the regression operation transforming an LR image into an HR one.

To construct a rule dictionary, the antecedent and consequent parts are extracted from the fuzzy rules retrieved from \(\mathcal {A}_{s1}\) and \(\mathcal {A}_{s2}\). The resulting antecedents and consequent parts are then separated and arranged for easy indexing, such that

$$\begin{aligned} D=\{D_{a}, D_{c}\} \end{aligned}$$
(3)

where \(D_{a}\) and \(D_{c}\) represent the antecedent parts and the consequent parts within the rule dictionary, respectively. More specifically, \(D_{a}\) containing all antecedents of the fuzzy rules is represented by

$$\begin{aligned} D_{a} = \{A_1\ A_2\ \cdots \ A_N\} \end{aligned}$$
(4)

and \(D_{c}\) containing consequent parts of the fuzzy rules is represented by:

$$\begin{aligned} D_{c} = \left[ \begin{array}{cccc} p_1 & p_2 & \cdots & p_N \\ r_1 & r_2 & \cdots & r_N \end{array} \right] \end{aligned}$$
(5)

where each column represents the linear coefficients in the consequent part of a certain rule.
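
The assembly of D from the two source models can be sketched as follows; the rule values shown are hypothetical, and in practice they would be unpacked from the trained \(\mathcal {A}_{s1}\) and \(\mathcal {A}_{s2}\) objects rather than written by hand.

```python
import numpy as np

# Hypothetical rules from the two source models: (antecedent vertices, p_i, r_i).
source1_rules = [((0.0, 0.2, 0.4), 1.1, 0.05), ((0.3, 0.5, 0.7), 0.9, -0.02)]
source2_rules = [((0.6, 0.8, 1.0), 1.3, 0.10), ((0.7, 0.9, 1.1), 0.8, 0.04)]

all_rules = source1_rules + source2_rules
D_a = [ant for (ant, _, _) in all_rules]                   # antecedent fuzzy sets, Eq. (4)
D_c = np.array([[p for (_, p, _) in all_rules],            # consequent coefficients, Eq. (5):
                [r for (_, _, r) in all_rules]])           # one column per rule
```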

2.2.3 Intermediate ANFIS creation

After constructing the rule dictionary, the second stage involves creating an intermediate ANFIS. The initial step in this stage involves the use of the K-means algorithm (although a more advanced clustering method, e.g., one of those described in [39, 40], may be used if preferred) to divide the sparse training data into C clusters.

Generally, let the set of training data (sparse or not) be denoted as \((x, z)\). By generating clusters in this initial step, it becomes possible to create fuzzy rules interpolated at the centres of the distinct clusters. The intermediate rules are aggregated to construct an intermediate ANFIS, following the conventional interpretation of a set of TSK rules as an ANFIS. During implementation, the first procedure is to calculate the centre \(c^{(k)}\) of each cluster \(C_k\), where \(k\in \{1,\dots ,C\}\). To construct an intermediate ANFIS, the L nearest rule antecedents \({A_i\in D_a, i=1,\dots ,L}\) are selected on the basis of their proximity to \(c^{(k)}\). These rule antecedents are taken from \(D_{a}\) of the rule dictionary generated in the previous stage. To achieve this, a distance metric is employed, which, for the sake of simplicity, may be implemented as \(d^{i} = d(A_i, c^{(k)})=|Rep(A_{i})-c^{(k)}|\). Here, \(Rep(A_i)\) represents the representative value of the fuzzy set \(A_i\) [41]. The antecedents \({A_i}\) of the L rules with the smallest distances \(d^i\) are chosen. To simplify the computational process, the value of L can be set to the minimum, namely two [42], unless specified otherwise.

The next step of intermediate ANFIS generation is to identify the best reconstruction weights for the closest rules selected in the previous step. This is an optimisation problem, which can be formulated as follows:

$$\begin{aligned} w^{(k)}=\min \limits _{w^{(k)}}||c^{(k)}-\sum _{i \in \mathcal {L}}{Rep(A_i)w_{i}^{(k)}}||^{2},\ s.t.\ \sum _{i \in \mathcal {L}}{w_{i}^{(k)}}=1 \end{aligned}$$
(6)

where \(w_{i}^{(k)}\) denotes the relative weight of \(R_i\). This optimisation problem is subject to the constraint that the sum of all weights equals one. It can be shown that the solution to this problem is as follows:

$$\begin{aligned} w^{(k)}=\frac{G^{-1}\textbf{1}}{\textbf{1}^{T}G^{-1}\textbf{1}} \end{aligned}$$
(7)

where \(\textbf{1}\) indicates a column vector of ones, \(G=(c^{(k)}\textbf{1}^{T}-Y)^{T}(c^{(k)}\textbf{1}^{T}-Y)\) denotes the well-defined Gram matrix, and the chosen rule antecedents form the columns of Y.

In order to emphasise the information contained within the \(k^{th}\) cluster, the corresponding weight \(w^{(k)}\) is employed to modify both the antecedent and consequent parts concerned. This approach follows the general methodology adopted by typical FRI methods (e.g., [41, 43]), and can be expressed as follows:

$$\begin{aligned} R_{k}: if\ y\ is\ A_{k},\ then\ z_k=p_{k}y+r_k \end{aligned}$$
(8)

where \(k=1,2,\cdots ,C\), with the parameters that describe the intermediate interpolated ANFIS being computed by

$$\begin{aligned} A_{k} = \sum _{i \in \mathcal {L}}{w_{i}^{(k)}A_{i}}, \ p_{k} = \sum _{i \in \mathcal {L}}{w_{i}^{(k)}p_{i}}, \ r_{k} = \sum _{i \in \mathcal {L}}{w_{i}^{(k)}r_{i}} \ \end{aligned}$$
(9)
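
A compact sketch of this stage, covering the nearest-rule selection, the weight computation of (7) and the parameter interpolation of (9), is given below. Assumptions are flagged in the comments: antecedents are triangular fuzzy sets stored as vertex triples, Rep(A_i) is taken as the average of the three vertices, and a pseudo-inverse replaces \(G^{-1}\) so the sketch remains numerically stable when G is (near-)singular in this one-dimensional illustration.

```python
import numpy as np

def rep(ant):
    """Representative value of a triangular fuzzy set, taken here as the vertex average."""
    return float(np.mean(ant))

def interpolate_rule(c_k, D_a, D_c, L=2):
    reps = np.array([rep(a) for a in D_a])
    nearest = np.argsort(np.abs(reps - c_k))[:L]           # the L rules closest to the cluster centre
    Y = reps[nearest][None, :]                             # chosen Rep values as the columns of Y
    diff = c_k - Y                                         # c_k * 1^T - Y
    G = diff.T @ diff                                      # Gram matrix of Eq. (7)
    ones = np.ones((L, 1))
    G_inv = np.linalg.pinv(G)                              # pseudo-inverse in place of G^{-1}
    w = (G_inv @ ones) / (ones.T @ G_inv @ ones)           # reconstruction weights, summing to one
    w = w.ravel()
    A_k = sum(w_i * np.asarray(D_a[j]) for w_i, j in zip(w, nearest))   # interpolated antecedent
    p_k, r_k = (D_c[:, nearest] * w).sum(axis=1)                        # interpolated consequent, Eq. (9)
    return A_k, p_k, r_k

# One interpolated rule for a hypothetical cluster centre, reusing D_a and D_c from above.
D_a = [(0.0, 0.2, 0.4), (0.3, 0.5, 0.7), (0.6, 0.8, 1.0), (0.7, 0.9, 1.1)]
D_c = np.array([[1.1, 0.9, 1.3, 0.8], [0.05, -0.02, 0.10, 0.04]])
print(interpolate_rule(0.55, D_a, D_c))
```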

2.2.4 ANFIS fine-tuning

The final stage of the ANFIS generation process involves fine-tuning the intermediate ANFIS generated through interpolation. At this stage, the interpolated ANFIS is used as the initial input for the fine-tuning process, which ultimately produces the final ANFIS on the target domain. The process is implemented by utilising the standard ANFIS training technique [17]. This approach has an important advantage over directly learning an ANFIS model from data, helping to overcome the challenge of obtaining a working ANFIS with limited training data, because the interpolated intermediate ANFIS serves as the initial setup for the expected network.

3 Implementational specifications

This section specifies the implementation of the proposed approach, for each of the feature selection, training, and testing phases.

3.1 Feature selection phase

To enable informative feature selection, a supervised learning strategy is adopted here. Each image in a given HR image dataset \(I = \{1, 2, 3, ..., n\}\) is first resized to a consistent standard size of, say, \(m \times m\) pixels. Those HR images are then scaled down by a scale factor s, using bicubic interpolation, to generate their corresponding LR images. After this, each LR image is resized back to \(m \times m\); the enlarged LR image therefore lacks detailed information due to its low resolution. As such, two paired image datasets result, one of LR images and the other of HR images. Following the common practice in ISR, in preparation for subsequent processing, the resulting LR and HR images are transformed from the RGB space to YCbCr, where each image comprises the three components Y, Cb and Cr.
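
The LR/HR pair preparation just described can be sketched as follows with Pillow; the particular library, the values of m and s, and the file name are illustrative choices rather than those mandated above.

```python
from PIL import Image

def make_pair(path, m=256, s=4):
    hr = Image.open(path).convert("RGB").resize((m, m), Image.BICUBIC)   # standardised HR image
    lr = hr.resize((m // s, m // s), Image.BICUBIC)                      # downscale by factor s
    lr = lr.resize((m, m), Image.BICUBIC)                                # enlarge back: detail is lost
    hr_y = hr.convert("YCbCr").split()[0]                                # Y (luminance) components used
    lr_y = lr.convert("YCbCr").split()[0]                                # for feature extraction
    return lr_y, hr_y

# Example call on an illustrative file name:
# lr_y, hr_y = make_pair("training_image_001.png")
```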

Once the data are prepared as above, BRISK and KAZE features are extracted from the Y component of each image. Based on these features, two different implementations can be made, depending on which optimisation procedure is preferred to support the FRFS process. Nevertheless, independently of whether PSO-FRFS or GA-FRFS is utilised, a number of parameters must be initialised with respect to the ISR application. The generic ones are specified as follows, with a short code rendering of the two fuzzy operators given after the list:

  • T-Norm: Łukasiewicz \((max(x+y-1, 0))\);

  • Implicator: Łukasiewicz \((min(1-x+y,1))\);

  • Dependency measure: Weak Gamma.
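
For concreteness, the two Łukasiewicz operators listed above are written out below; they are the connectives fed into the fuzzy-rough dependency calculation, while the (weak gamma) dependency measure itself is omitted here.

```python
def lukasiewicz_t_norm(x, y):
    """T-norm: max(x + y - 1, 0)."""
    return max(x + y - 1.0, 0.0)

def lukasiewicz_implicator(x, y):
    """Implicator: min(1 - x + y, 1)."""
    return min(1.0 - x + y, 1.0)
```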

Further detailed parameters required in order to run the PSO-FRFS and GA-FRFS algorithms are specified below. These are set by following common practice in the literature, that is, through empirical studies, while considering the practical applications of the algorithms.

  • Parameters for PSO-FRFS: These include the acceleration coefficient weighting each particle’s individual best solution, \(c_1 = 1\), and that weighting the global best solution shared amongst all particles, \(c_2 = 2\). The maximum number of generations to evaluate is set such that \(Maxgeneration = 50\), and the number of particles in the swarm is set to 100. The report frequency is set to Maxgeneration, which means that the results are reported after every 50 generations.

  • Parameters for GA-FRFS: In this algorithm, the population size, which indicates the number of candidate attribute subsets, is set to 20, and the maximum number of generations is also set to 20. The crossover probability C is set to 0.6, whereas the mutation probability M is set to 0.033. The report frequency is set to 20.

3.2 Training phase

The important features selected by FRFS are denoted as \(D_{lr}\) and \(D_{hr}\), for those extracted from the LR and HR images, respectively. They act as the input to the training phase. Suppose that the K-means algorithm is employed on the pre-processed dataset \(D_{lr}\) to divide it into Q clusters. For instance, consider \(Q = 3\): \(D_{lr}\) is then divided into three categories of feature values, say, ‘Small’ \((D_{lr}^s)\), ‘Large’ \((D_{lr}^l)\) and ‘Medium’ \((D_{lr}^m)\), for source domain 1, source domain 2 and the target domain, respectively. From these, the source models \(ANFIS_1\) and \(ANFIS_2\) can then be trained with the \(D_{lr}^s\) and \(D_{lr}^l\) datasets, assuming that sufficient training data is available to enable the use of the conventional ANFIS learning procedure.

Given the source models, the target model \(ANFIS_t\) can subsequently be interpolated from these two source ones, guided by the information contained within \(D_{lr}^m\). Of course, an implicit assumption is made here that there is insufficient data to perform conventional training of \(ANFIS_t\) directly, unlike the situations for the learning of \(ANFIS_1\) and \(ANFIS_2\); otherwise, the ANFIS interpolation procedure would be unnecessary. The step-by-step training process is provided in Alg. 1.
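
The data partitioning that precedes this training can be sketched as follows; scikit-learn supplies the K-means step, while train_anfis is a hypothetical placeholder for the conventional ANFIS learning procedure and is therefore only shown in comments.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
D_lr = rng.random((600, 1))                              # selected LR feature values (illustrative)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(D_lr)
order = np.argsort(km.cluster_centers_.ravel())          # rank clusters by centre value
small, medium, large = (D_lr[km.labels_ == order[i]] for i in range(3))

# Source domains: sufficient data, trained conventionally (cf. Sect. 2.2.1), e.g.
#   anfis_1 = train_anfis(small, ...); anfis_2 = train_anfis(large, ...)

# Target domain: keep only ~5% of the medium cluster to emulate sparseness, then
# interpolate ANFIS_t between the two source models (cf. Sects. 2.2.2-2.2.4).
keep = rng.choice(len(medium), size=max(1, len(medium) // 20), replace=False)
sparse_medium = medium[keep]
```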

3.3 Test phase

In this final phase of the proposed FRFS-ANFISI approach, a test image (\(Y^t\)) is taken as input to be converted into its corresponding HR image (Y). This phase is implemented in several steps, as follows. First, following the common practice in the ISR literature, the given target RGB image \(Y^t\) is transformed into its counterpart in the YCbCr space. Then, its illuminance form, namely the Y component, is extracted. Next, Y is transformed into its corresponding LR image \(Y^l\) using a specified scale factor s (with the original serving as the standard for comparison in the test). From this, either PSO-FRFS or GA-FRFS is applied to the resulting LR image to yield the selected features, using the same specification as in the training phase. Finally, all trained ANFIS models \(\{ A_i \}\) are loaded, together with their corresponding cluster centres \(\{C_i\}\) returned by the training phase.

Continuing with the illustration from the training phase, suppose that source domain 1 is for small-valued features, the target domain is for medium-valued features, and source domain 2 is for large-valued features. Then, in the test phase, the ‘Small’, ‘Medium’ and ‘Large’ feature values of the test image (\(Y^t\)) are fed to source domain 1, the target domain, and source domain 2, respectively. Before reconstructing the eventual output image, post-processing is performed. For this purpose, a non-local means (NLM) filter is employed to remove certain noise effects, whereas the iterative back propagation (IBP) algorithm is utilised to refine the fine details of the image. After that, the final HR image is reconstructed for qualitative and quantitative analyses. The above steps of the test phase are summarised in Alg. 2.
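
One plausible reading of this post-processing step is sketched below with scikit-image: NLM denoising followed by a few back-projection-style refinement passes that push the reconstructed estimate towards consistency with the observed LR input. The filter strength, iteration count and step size are illustrative assumptions, not the settings used in the reported experiments.

```python
import numpy as np
from skimage.restoration import denoise_nl_means
from skimage.transform import resize

def postprocess(hr_estimate, lr_observed, scale=4, n_iter=5, step=0.5):
    """NLM denoising followed by back-projection-style refinement passes.
    Assumes lr_observed has shape (hr.shape[0]//scale, hr.shape[1]//scale)."""
    hr = denoise_nl_means(hr_estimate, h=0.02, patch_size=5, patch_distance=6)
    lr_shape = (hr.shape[0] // scale, hr.shape[1] // scale)
    for _ in range(n_iter):
        simulated_lr = resize(hr, lr_shape, order=3, anti_aliasing=True)   # re-degrade the estimate
        error = resize(lr_observed - simulated_lr, hr.shape, order=3)      # upscale the residual
        hr = np.clip(hr + step * error, 0.0, 1.0)                          # refine fine details
    return hr
```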

4 Experimental evaluation

The above specification has been realised in the following experimental investigations, in an effort to assess and reflect on the potential of the present approach.

4.1 Datasets

The experimental studies are divided into training and test phases. Following the common practice of existing research, the datasets employed for training in this study contain the standard set of images of people, animals, flowers, and other natural objects [13]. In addition, the standard ISR benchmark BSD200 [23, 44] images are also used together with natural images to create a variety of training datasets for the ANFIS models, as done in the most recent work [13]. In total, the training data for the experimental analysis contain 275 images. Amongst them, 200 are from the BSD200 benchmark dataset, whilst the remaining 75 are natural images. For illustrative purposes, examples of the training images are shown in Fig. 3a.

To evaluate the performance of the proposed approach, standard benchmark datasets for ISR such as Set5 [21], Set14 [22], BSD100 [23] and Urban100 [24] are employed and visually depicted in Fig. 3b-e. As can be seen, the test images are generally rather different from those used for training.

Fig. 3 Visualisation of benchmark super-resolution datasets for the training and test phases

4.2 Experimental setup

The experiments were performed on a Dell laptop having Intel(R) Core(TM) i5-10300H CPU @ 2.5GHz (8 CPUs) and memory (RAM) of 8 GB, running the Windows 10 operating system. Other tools and software used throughout the experiments include MATLAB version R2020a and WEKA version 3.7.2.

In the experimental investigation, three different models are implemented and run for comparative evaluation. In particular, Model 1 (the reference model) is trained by employing ANFIS with sufficient target data \((D_{lr}^m)\). To train Model 2 (the ablation model), ANFIS without interpolation is employed on sparse data. For Model 3 (the proposed model), ANFIS interpolation is employed on the sparse data. To examine the impact of sparseness in the target domain, the training data in the target domain are created by randomly deleting 95% of the data in \(D_{lr}^m\).

4.3 Performance evaluation metrics

The popular performance evaluation metrics used for ISR problems are the peak signal-to-noise ratio (PSNR) and the structure similarity (SSIM). These are adopted here, and their details are summarised below for completeness. Generally, the larger the values of PSNR and SSIM, the better the quality of the reconstructed HR image. Furthermore, an experimental analysis of computational complexity is also reported later.

4.3.1 Peak signal to noise ratio (PSNR)

PSNR computes the peak signal to noise ratio between a given pair of images, derived from mean square error (MSE) between the ground truth (original HR) image \(\textbf{Y}\) and the reconstructed (or estimated) HR image \(\hat{\textbf{Y}}\), as follows:

$$\begin{aligned} MSE=\frac{\parallel \textbf{Y}-\hat{\textbf{Y}} \parallel _{F}^{2}}{MN} \end{aligned}$$
(10)

where \(\parallel . \parallel _{F}\) denotes the Frobenius norm [45] of a matrix, and M and N represent the length and width of an image, respectively. The raw ratio between the signal peak and the MSE can be quite wide-ranging; to limit its values to a range that facilitates performance comparison, the following logarithmic (decibel) form is used:

$$\begin{aligned} PSNR=10\log _{10}(\frac{V_{max}^{2}}{MSE}) \end{aligned}$$
(11)

where \(V_{max}\) denotes the maximal possible value of an image element.

4.3.2 Structure similarity (SSIM)

This metric computes the similarity between the ground truth image \(\textbf{Y}\) and the reconstructed HR image \(\hat{\textbf{Y}}\), as follows:

$$\begin{aligned} SSIM=\frac{4\mu _{\textbf{Y}}\mu _{\hat{\textbf{Y}}}\sigma _{\textbf{Y}\hat{\textbf{Y}}}}{(\mu _{\textbf{Y}}^{2}+\mu _{\hat{\textbf{Y}}}^{2})(\sigma _{\textbf{Y}}^{2}+\sigma _{\hat{\textbf{Y}}}^{2})} \end{aligned}$$
(12)

where \(\mu _{\textbf{Y}}\) and \(\mu _{\hat{\textbf{Y}}}\) are the mean values; \(\sigma _\textbf{Y}\) and \(\sigma _{\hat{\textbf{Y}}}\) are the corresponding standard deviations of the pixel values within images \(\textbf{Y}\) and \(\hat{\textbf{Y}}\), respectively; and \(\sigma _{\textbf{Y}\hat{\textbf{Y}}}\) is the covariance of \(\textbf{Y}\) and \(\hat{\textbf{Y}}\). The values of SSIM range between -1 and 1, with a value of 1 indicating perfect similarity and a value of -1 indicating complete dissimilarity.
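
Both metrics can be transcribed directly from (10)-(12); the sketch below assumes grey-scale images with values in [0, 1] (so \(V_{max}=1\)) and uses the global, constant-free SSIM form given above rather than the windowed variant with stability constants found in some toolboxes.

```python
import numpy as np

def psnr(y, y_hat, v_max=1.0):
    """Eqs. (10)-(11): MSE via the Frobenius norm, then the decibel form."""
    mse = np.mean((y - y_hat) ** 2)
    return 10.0 * np.log10(v_max ** 2 / mse)

def ssim_global(y, y_hat):
    """Eq. (12): global structure similarity without stability constants."""
    mu_y, mu_h = y.mean(), y_hat.mean()
    cov = np.mean((y - mu_y) * (y_hat - mu_h))
    return (4 * mu_y * mu_h * cov) / ((mu_y**2 + mu_h**2) * (y.var() + y_hat.var()))
```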

4.4 Experimental results

The experimental investigations are focussed on the evaluation of model performance, both quantitatively and qualitatively, supported with specific examinations of learning epoch variation and computational complexity.

Table 1 Average PSNR and SSIM results on Set5, Set14, BSD100 and Urban100 datasets

4.4.1 Quantitative evaluation

Table 1 shows the PSNR and SSIM measures. Note that there are 16 sub-experiments in total, four per dataset. The naming scheme used is basically self-explanatory. For example, regarding experiment number 6, the name ‘BRISK_GA_401’ indicates that features are extracted by BRISK and selected by the GA-FRFS algorithm, with the last three digits signifying that the 401 (out of 1600) most important features are selected.

Overall, it can be observed from the average PSNR and SSIM results (in Table 1) that the outcomes achieved by the proposed Model 3 (which is trained on sparse data but supported with ANFIS interpolation) are either the same as or almost equivalent to those obtained by Model 1 (the reference model trained with full data). Whilst it is expected that good performance can be attained using ANFIS interpolation, it is a positive surprise and an extremely promising outcome that Model 3 can meet the performance of Model 1.

Particularly, PSO-FRFS offers more promising results than GA-FRFS. Indeed, using features extracted by BRISK, the PSNR and SSIM results of Model 3 (by employing PSO-FRFS) are equivalent to those of Model 1. Using KAZE-extracted features, the (PSNR or SSIM) performances of both PSO-FRFS and GA-FRFS are quite similar on certain datasets (especially, BSD100 and Urban100), whereas PSO-FRFS has a slight edge in some of the other datasets (Set5 and Set14). An additional interesting observation is that, while the KAZE features seem to be more descriptive in depicting the information contained within the images than BRISK ones, PSO-FRFS working with BRISK features can also produce the second-best results using Model 3 in a number of experiments (see case numbers 5, 9 and 13) on most datasets (Set14, BSD100 and Urban100). The collection of the quantitative evaluation results shows that the proposed approach is effective, especially when utilising KAZE-extracted features with PSO-FRFS.

4.4.2 Qualitative evaluation

As examples, the visual representations of the baboon image of Set14, ‘image-002’ of BSD100 and ‘image-096’ of Urban100 are presented in Figs. 4, 5 and 6, respectively. Note that the (P/S) figures under each of the ISR-returned HR images stand for the PSNR (P) and SSIM (S) values achieved by the corresponding model concerned.

Fig. 4 Super-resolution results for the baboon image of Set14, obtained by running PSO-FRFS or GA-FRFS with BRISK and KAZE features

Looking at Fig. 4 more closely, it can be seen that, except for the combination of BRISK-extracted features and GA-FRFS, Model 3 produces visual images of the baboon with little distortion relative to those produced by Model 1, clearly beating those obtainable by Model 2. This shows the potential of applying ANFIS interpolation supported by feature selection, conforming to the quantitative findings discussed above.

Fig. 5 Super-resolution results for ‘image002’ of BSD100, obtained by running PSO-FRFS or GA-FRFS with BRISK and KAZE features

In general, Fig. 5 illustrates a similar pattern to Fig. 4. Interestingly, the results from the combination of BRISK with PSO-FRFS are very similar to those from the combination of KAZE feature extraction with PSO-FRFS. These outcomes once again reflect the quantitative performance measures and indicate the significance of PSO-FRFS, which appears to rely less upon the choice of any specific feature extraction tool.

The achievements reflected by Figs. 4 and 5 are further reinforced by Fig. 6. In particular, it can be seen from the visual results that the images returned by Model 2 contain certain missing information and noise. Fortunately, the visual results of Model 3 are almost the same as those returned by the reference model (as quantitatively confirmed by their corresponding PSNR and SSIM measures), demonstrating the great potential of the present approach.

Fig. 6 Super-resolution results for ‘image096’ of Urban100, obtained by running PSO-FRFS or GA-FRFS with BRISK and KAZE features

4.4.3 Computational complexity

Computational complexity is investigated here empirically. For instance, Fig. 7 shows the time consumed in dealing with individual test images from the Set5 dataset. In particular, Fig. 7a presents the average time (in seconds) spent by PSO-FRFS using BRISK or KAZE features, and Fig. 7b does so for GA-FRFS. It is evident from these figures that using features extracted by KAZE consumes less time than using those extracted by BRISK, irrespective of how FRFS is implemented. Interestingly, the time consumption is almost the same for both FRFS implementations. Very importantly, either way, the time consumption is practically lightweight, considering that the processing machine used is of rather low capacity and is not specifically devised for complicated image analysis tasks.

Fig. 7 Average time cost for ISR on Set5 [21]

Examining the results in more detail, the highest time consumed is under 5 seconds, for both the child and bird images under the combination of BRISK and GA-FRFS. However, the run-time cost of GA-FRFS varies more, whilst PSO-FRFS behaves more consistently across the different combinations. This is because GA-FRFS has to handle a larger number of features (25% of the total) than PSO-FRFS (which only utilises approximately 5% of the original features). It is not surprising that Model 2 consumes less time than the other two, because no interpolation is employed whilst the data concerned are more restricted and hence less pattern-matching-based inference is carried out. Unfortunately, Model 2 is unable to perform as well as Model 3 (which, as indicated earlier, matches the performance of the reference model) with respect to either of the performance indices employed.

To reveal the time performance more explicitly, Table 2 lists the average computational test time of the different models running on the Set5, Set14, BSD100, and Urban100 benchmark datasets. It can be seen that Model 2 outperforms Model 1 and Model 3 (which is, again, expected as stated above), while Model 3 produces the second-best results from this viewpoint. Since Model 3 runs ANFIS interpolation despite involving sparse data, it consumes a little more time to perform the required task.

Note that the time consumption by (the proposed) Model 3 is generally less than that by (the reference) Model 1. Indeed, Model 3 systematically outperforms the reference model across all experiments except for two specific cases (namely, numbers 8 and 10, where Model 1 implemented with the combination of KAZE and GA-FRFS runs a bit faster than Model 3). This may reflect the exceptional situations in which random behaviours of the relevant models have gone beyond their statistically expected range. Particularly, the test time for the combination of KAZE and PSO-FRFS is the lowest for the Set5 and Set14 datasets, whilst that for GA-FRFS using KAZE features performs faster for complex datasets such as BSD100 and Urban100. These results imply that GA-FRFS is more suitable for complex datasets whereas PSO-FRFS is more computationally efficient for comparatively smaller datasets. While offering flexibility in developing different useful models, these results also provide an empirical methodological guideline for implementational choices of what method to use in practical problem-solving.

Table 2 Average time results on Set5, Set14, BSD100 and Urban100 by different methods

5 Conclusion

This paper has presented a novel approach for extending the recent ANFIS interpolation techniques to help perform image super-resolution (ISR) tasks. It is supported by efficient fuzzy rough feature selection (FRFS) algorithms that use just the most important features extracted from a given low-resolution image. The paper has introduced two types of technique to reinforce the potential of FRFS, one employing PSO and the other utilising GA to improve the search efficiency of FRFS. Both can work with different feature extraction methods, as illustrated with the use of two popular mechanisms in the literature (namely, KAZE and BRISK). Together, they form a useful set of techniques to handle challenging ISR problems, with a flexibility of choice of the most appropriate solution for a given type of image. The resulting implementations are particularly helpful in addressing the difficult problems where there is insufficient or missing data for training.

Whilst the results are very promising, the present work also opens up an avenue for further studies. For instance, an initial attempt has been made to provide a methodological guideline for the overall system design, based on the present experimental investigations. However, a theoretical examination of why PSO-FRFS is computationally more efficient than GA-FRFS for smaller datasets, and vice versa for complex datasets, requires further study. One particular interest is, in addition to the current computational evaluation, to perform a more rigorous analysis of the different combinations of implementation methods, by revealing their underlying mathematical properties. Also, the proposed FRFS-ANFISI approach is herein implemented for images of small to medium scale. How it may be effectively developed to suit large-scale application problems, such as Martian image super-resolution [12], remains active research. Another point to note is that, recently, the Context-Layered Recurrent Pi-Sigma Neural Network (CLRPSNN) [46] and the Memory Recurrent Elman Neural Network (MRENN) [47] have been proposed for the identification of non-linear systems. These networks are powerful; e.g., facilitated with a context layer, the joint effort of back propagation and the Lyapunov-stability method makes CLRPSNN stable and faster to tune, while MRENN contains a self-feedback feature. These significant characteristics may offer useful means to provide an alternative system modelling approach to the present use of ANFIS. This forms an interesting piece of future work.