Introduction

Remote sensing images contain real-time information and are used to produce accurate depictions in a variety of fields, such as military monitoring, meteorology, and the detection of environmental change [1,2,3,4,5]. Remote sensing image interpretation is a challenging task: researchers are continually working to improve the quality of advanced algorithms and software to comprehend and accurately detect regions of interest. The constraints on remote sensing image enhancement frequently include factors beyond human control, and the task is also challenging because of the presence of many dense regions and the limitations of human vision [6]. A segmentation approach may deal with the problem of precisely delineating a region in a remote sensing image; the primary goal of segmentation is to provide a more detailed and informative visual [7]. Conventional image processing and classification approaches mainly consist of visual assessment and statistically based classification algorithms. Visual analysis is the most basic and fundamental classification method, with the advantages of simplicity, flexibility, and ease of extraction of spatial information. However, when dealing with complex images, visual analysis suffers from time-consuming procedures and erroneous outcomes [8]. The statistical classification approach primarily uses the spectral characteristics of ground objects as categorization features, which frequently results in low classification accuracy [9].

With the rapid growth of big data innovation and advances in artificial intelligence, deep learning is capable of mining the features of complex remote sensing images via a multi-layer network, resulting in efficient and precise image classification [10,11,12,13]. The deployment of deep learning technology for remote sensing images is currently a highly topical subject in image analysis. To segment hyperspectral images, a multi-scale super-pixel segmentation approach was employed, and a weighted multi-scale spatial-spectral kernel was used in conjunction with the original spatial-spectral kernel to carry out the classification of land resources [14]. Wang et al. also worked on high-resolution remote sensing image classification [15]: integrating a fully connected conditional random field and a support vector machine yields a high-resolution remote sensing image classification framework. Zhao et al. implemented a change detection technique based on the Res-UNet model and introduced the concept of semantic segmentation to the field of remote sensing change detection [16]. Chen B et al. developed a remote sensing image analysis model based on a multi-level feature accumulation framework that improves deep feature extraction and upsampling feature combination, which is important for high-resolution remote sensing image analysis [17]. However, it should be emphasised that recently captured remote sensing images tend to be complex and high dimensional, and the framework structure of existing analysis methods is insufficient to satisfy the need for efficient computation and analysis [4, 18].

To address the aforementioned challenges, this paper presents a new remote sensing image analysis and classification approach that combines feature-based fuzzy C-means (FBFCM) and extreme learning machine (ELM) classifiers. The FBFCM-based technique assigns to each cluster a label pixel value equal to the peak of the gray-level occurrence counts within that cluster. This peak is obtained from each cluster's sub-histogram, allowing the threshold value to be determined from the density of the pixels present.

In this paper, the proposed FBFCM-ELM classifier is integrated into an image analysis framework, which ensures an effective improvement in analysis performance and yields generated images of higher visual quality that support better interpretation of remote sensing images. Furthermore, the resulting segmented image is translated to an RGB color space model to show realistic and clearly apparent segmented regions in the remote sensing image.

The rest of the paper is organized as follows: “Related work” provides a detailed description of conventional clustering-based approaches related to remote sensing image segmentation and the extreme learning machine. “Basic theory” describes the three important intensity-, edge-, and entropy-based clustering methods, and also explains the ELM network model in detail. The proposed FBFCM-ELM technique is described in “Proposed model structure”. The performance metrics used are discussed in “Performance metric”. The simulation results obtained and an ablation study on the application of the proposed FBFCM-ELM to remote sensing images are given in “Experiments”. Finally, “Conclusions” concludes the paper.

Related work

Clustering-based image segmentation

Image segmentation is the process of splitting a digital image into various areas or clusters, each of which is composed of a set of pixels. Segmentation simplifies and modifies the representation of an image, making it more intelligible and easier to comprehend. In remote sensing images, segmentation is primarily used to locate items of interest and boundaries such as edges and contours.

In the literature, a number of clustering-based image segmentation methods have been presented, viz. regional growth [19], watershed transformation [20], active contour models [21], mean shift [22], graph cut [23], spectral clustering [24], Markov random fields [25], neural networks [26], fuzzy clustering [27], and fuzzy C-means [28]. It can be inferred that the accuracy of object detection in clustering-based segmentation methods depends on the efficiency of cluster formation [29,30,31].

Fuzzy C-means (FCM) based clustering performs segmentation of remote sensing images only on the basis of pixel gray-level values [1, 19, 20, 32,33,34,35]; it does not take geographical or spatial information into account when segmenting the images. Thus, FCM-based segmentation is not able to retain the complete information in the segmented image, and it also fails to locate the actual positions of the objects present in the captured image [36,37,38,39,40,41]. This paper presents a novel remote sensing image segmentation method called feature-based fuzzy C-means (FBFCM), which uses an efficient clustering approach to retain the maximum information and detail present in the original remote sensing image by appropriately exploiting its important features, including intensity, entropy, and edges.

Extreme learning machine

The extreme learning machine (ELM) is a feed-forward neural network with a single hidden layer [42]. In contrast to classic learning techniques based on back propagation (BP) neural networks and support vector machines (SVMs) [43, 44], the input weights and biases of the hidden layer are randomly initialised, and the output weights of the ELM can be evaluated directly. Additionally, the ELM solves the quadratic programming problem more efficiently than the standard SVM and gradient-based algorithms; the ELM was initially formulated as a regularised least-squares computation. The ELM offers the benefits of fast training and strong generalisability and can yield effective outcomes through one-time training. The literature shows that researchers have employed the ELM for image segmentation, feature clustering, regression analysis, binary classification, three-dimensional shape recognition, and multi-class classification [45,46,47]. A remote sensing image with fine details may be obtained by employing a learning technique that classifies the low-resolution and low-contrast regions inside the captured image. In the proposed method, this classification task is performed using the ELM, as shown in Fig. 1.

Fig. 1

Architecture composed of a feature based FCM for feature extraction and an ELM for fast learning

Basic theory

Clustering approaches

To enhance the details of low-resolution remote sensing images with low brightness, contrast, and lighting, clustering is performed on three important features, viz. intensity, edges, and entropy.

Intensity-based clustering

Intensity-based clustering exploits the information of the various shades that distinguish objects, roads, minerals, and rivers, and is highly valuable for target recognition. Remote sensing images are often of poor quality, making it difficult to distinguish between distinct objects. In this study, we employed 8-bit images with intensities ranging over 0 to 255, whose varying shades can distinguish different areas in the remote sensing image.

Edges information based clustering

A key problem for segmentation in remote sensing images is edge preservation. In general, edges carry high-frequency components that are easily compromised by a clustering procedure; in the proposed approach, edge information is preserved while clustering is performed. Edges indicate the direction of, and changes noticed in, the remote sensing image. The Canny edge detector is used to achieve a low error rate, well-localized edge points, and detection of true edge points inside the remote sensing image. True edge detection is essential in the case of poor-resolution remote sensing images: low light can cause edge discontinuities, which may be difficult to notice, and in this case high edge quality proves useful.
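As an illustration, the gradient-magnitude edge map below is a simplified stand-in for the Canny detector (which additionally performs Gaussian smoothing, non-maximum suppression, and hysteresis thresholding); the function name and the threshold value are illustrative choices, not part of the proposed method.

```python
import numpy as np

def sobel_edge_map(img, threshold=0.25):
    """Gradient-magnitude edge map: a simplified stand-in for Canny.

    `img` is a 2-D grayscale array; pixels whose normalized gradient
    magnitude exceeds `threshold` are marked as edge pixels.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros(img.shape, dtype=float)
    gy = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = pad[i:i + 3, j:j + 3]   # 3x3 neighbourhood of pixel (i, j)
            gx[i, j] = np.sum(win * kx)
            gy[i, j] = np.sum(win * ky)
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-12              # normalize magnitude to [0, 1]
    return mag > threshold                # boolean edge map

# A vertical step edge is detected along the intensity discontinuity.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
edges = sobel_edge_map(step)
```

A production implementation would use a full Canny detector; the sketch only shows how an edge feature map for clustering can be derived from local gradients.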

Entropy-based clustering

It is common to observe that after performing a clustering operation, the amount of information contained in a remote sensing image decreases. The information in the remote sensing image is represented mathematically in terms of entropy. The entropy of a cluster region is given by

$$\begin{aligned} E=\sum _{j=1}^{N}p_j\log _2\left( {1 \over p_j}\right) , \end{aligned}$$
(1)

where \(p_j\) is the probability of occurrence of jth gray level in a cluster.
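Equation (1) can be evaluated directly from the gray levels assigned to a cluster; the sketch below is a minimal NumPy implementation (the helper name is ours).

```python
import numpy as np

def cluster_entropy(gray_values):
    """Shannon entropy (Eq. 1) of the gray levels inside one cluster.

    p_j is the empirical probability of the j-th gray level; levels
    that do not occur contribute nothing to the sum.
    """
    values = np.asarray(gray_values).ravel()
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

# A cluster with a single gray level carries zero information,
# while four equally likely levels give log2(4) = 2 bits.
e_flat = cluster_entropy([7, 7, 7, 7])
e_uniform = cluster_entropy([0, 1, 2, 3])
```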

ELM network model

This section provides a theoretical analysis of the basic ELM model. The ELM was developed to train the single-layer feed-forward neural network (SLFNN), the most common artificial neural network structure. A typical SLFNN has three layers, viz. an input layer, a hidden layer, and an output layer. The parameters of the hidden neurons, which map the input data to a \(\psi _{h_n}\)-dimensional feature space, are randomly generated.

Let \(\Psi (\textrm{input})\in \mathbb {R}^{1\times \psi _{h_n}}\) be the output vector of the hidden layer w.r.t. the input, and let the output weight matrix \(W \in \mathbb {R}^{\psi _{h_n}\times \psi _o}\) connect the hidden layer with the final output layer. Mathematically, the output of the ELM network model is expressed as

$$\begin{aligned} \mathcal {F}(\alpha _i)=\Psi (\alpha _i)W, \quad i=1,2,3,\dots ,N . \end{aligned}$$
(2)

The weights of the ELM network model are computed so as to minimize the sum of squared prediction errors with reference to the desired values, i.e.,

$$\begin{aligned}&\min _{W\in \psi _{h_n}\times \psi _o} \frac{1}{2}\Vert W\Vert ^2+\frac{\kappa }{2}\Vert \textrm{error}_i\Vert ^2 \nonumber \\&\qquad \text {such that~} \Psi (\alpha _i)W=O_i^\textrm{T}-\textrm{error}_i^\textrm{T}, \quad i=1,\dots ,N , \end{aligned}$$
(3)

where \(\textrm{error}_i\) is the error vector w.r.t ith training pattern and \(\kappa \) is the penalty coefficient on the training error.

The optimization function of the ELM can be expressed as

$$\begin{aligned} \min _{W\in \psi _{h_n}\times \psi _o} \frac{1}{2}\Vert W\Vert ^2+\frac{\kappa }{2}\Vert O-\Psi W\Vert ^2 , \end{aligned}$$
(4)

where \(\Psi =\left[ \Psi (\alpha _1)^\textrm{T},\Psi (\alpha _2)^\textrm{T},\dots , \Psi (\alpha _n)^\textrm{T} \right] ^\textrm{T}\). Thus, for the given N training input data, the ELM optimization can be solved using the Moore-Penrose (MP) generalized inverse matrix as

$$\begin{aligned} \Psi W=O \implies W= \Psi ^{\dagger }O, \end{aligned}$$
(5)

where \(\Psi ^{\dagger }\) is the MP generalized inverse of the matrix \(\Psi \).
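A minimal sketch of Eqs. (2)-(5) is given below. It uses a sigmoid activation and the regularized closed-form solution \(W=(\Psi ^\textrm{T}\Psi + I/\kappa )^{-1}\Psi ^\textrm{T}O\) implied by Eq. (4), which approaches the Moore-Penrose solution of Eq. (5) as \(\kappa \) grows; the hidden-layer size, activation choice, and toy data are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, O, hidden=50, kappa=1e3):
    """Train a single-hidden-layer ELM (Eqs. 2-5).

    Hidden weights and biases are random and never updated; only the
    output weights W are solved in closed form via ridge regression,
    the regularized counterpart of W = Psi^+ O.
    """
    A = rng.normal(size=(X.shape[1], hidden))   # random input weights
    b = rng.normal(size=hidden)                 # random biases
    Psi = 1.0 / (1.0 + np.exp(-(X @ A + b)))    # hidden-layer outputs
    W = np.linalg.solve(Psi.T @ Psi + np.eye(hidden) / kappa, Psi.T @ O)
    return A, b, W

def elm_predict(X, A, b, W):
    Psi = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return Psi @ W                              # Eq. (2): F = Psi W

# Toy two-class problem: points left/right of the axis x0 = 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)
A, b, W = elm_fit(X[:100], y[:100])
pred = (elm_predict(X[100:], A, b, W) > 0.5).astype(float)
accuracy = float((pred == y[100:]).mean())
```

The design choice that makes the ELM fast is visible here: training is a single linear solve, with no iterative weight updates.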

Fig. 2

Flowchart of proposed FBFCM based segmentation

Proposed model structure

The clustering-based remote sensing image segmentation technique distinguishes between cluster centers and cluster boundaries, which is critical for acquiring high-quality discriminant data points for classification while retaining the maximum possible information and detail present in the original image. In this section, the FBFCM-based remote sensing image segmentation method is first proposed; it executes clustering with respect to three important attributes of remote sensing images, viz. intensity, entropy, and edges.

Feature based fuzzy C-means (FBFCM)

In this section, FBFCM is presented; its flowchart is depicted in Fig. 2. With regard to the three important features (intensity, entropy, and edges), the proposed FBFCM approach achieves efficient clustering. Algorithm 1 depicts the steps involved in the proposed FBFCM approach, applied to each attribute independently.


FBFCM Algorithm

The first step in FBFCM is to initialize grayscale cluster centres for the input image based on the chosen attribute. The next step is to repeatedly split the grayscale values into distinct clusters based on the membership function values that minimize the objective function at its optimal value, as specified in Eq. 8.

These clustered sections provide considerable information in the form of grayscale values that can vary from underexposure to overexposure. The goal of segmenting remote sensing images is to efficiently distinguish or identify items such as land, sea, and other objects. These goals are readily attained by employing the FBFCM technique, which produces clusters covering all admissible grayscale pixel values. The FBFCM approach requires the identification of each cluster's sub-histogram; thus, the brightness level of each cluster is determined by the peak value of that cluster's sub-histogram, which meets the fundamental requirement of minimizing the objective function given by

$$\begin{aligned} J_{m}&=\sum \limits _{i=1}^{N} \sum \limits _{j=1}^{C} \omega _{ij}^{m}{\Big \Vert \zeta _{i} -c_{j}\Big \Vert }^2 , \end{aligned}$$
(8)

where m is any real number greater than 1, \(\omega _{ij}\) is the degree of membership of \(\zeta _{i}\) in cluster j, \(\zeta _{i}\) is the ith d-dimensional measured datum, and \(c_j\) is the d-dimensional centre of the cluster. The label assigned to each cluster is determined adaptively from the cluster's pixel values. Thus, each cluster is labelled by its local maximum, allowing the maximal information existing in the image to be retained while also segmenting the image in such a way that objects may be easily identified.
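The standard FCM updates that minimize Eq. (8) alternate between membership and centre computations; the 1-D sketch below shows the idea on gray-level data (the cluster count, deterministic initialization, and iteration budget are illustrative choices, not the paper's settings).

```python
import numpy as np

def fuzzy_cmeans_1d(x, C=2, m=2.0, iters=50):
    """Standard FCM on scalar (gray-level) data, minimizing Eq. (8).

    Alternates the membership update
        w_ij = 1 / sum_k (|x_i - c_j| / |x_i - c_k|)^(2/(m-1))
    with the weighted centre update
        c_j = sum_i w_ij^m x_i / sum_i w_ij^m.
    """
    x = np.asarray(x, dtype=float).ravel()
    c = np.linspace(x.min(), x.max(), C)            # spread initial centres
    for _ in range(iters):
        d = np.abs(x[:, None] - c[None, :]) + 1e-9  # |zeta_i - c_j|
        w = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)),
                         axis=2)
        c = (w ** m).T @ x / np.sum(w ** m, axis=0)
    labels = np.argmin(np.abs(x[:, None] - c[None, :]), axis=1)
    return labels, c

# Two well-separated gray-level populations end up in two clusters
# whose centres sit at the population values.
gray = np.concatenate([np.full(50, 40.0), np.full(50, 200.0)])
labels, centers = fuzzy_cmeans_1d(gray)
```

In the full FBFCM method the same update loop runs per feature (intensity, edge, entropy), and each resulting cluster is then labelled by the peak of its sub-histogram.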

The resulting image is then transformed into a three-channel color space to form a color image whose information is clearly visible and identifiable, and which can be used efficiently in further image analysis applications such as determining the boundaries or shapes of objects and classifying the different content present in the image.

FBFCM-ELM

The feature-based clustering algorithm built on the attributes of remote sensing images mentioned in “Feature based fuzzy C-means (FBFCM)” can achieve efficient results by integrating the ELM classifier with FBFCM, but the selected subset of clusters is still redundant. To further improve efficiency and accuracy, the appropriate attributes of the clusters are the next factor that must be considered. In this situation, correctly performed feature extraction is a major way to improve efficiency after applying FBFCM. Based on the above idea, this paper further optimises and improves the previous classification model and proposes a framework fusing FBFCM and ELM, which is shown in Fig. 1.

Each set of features is clustered and classified independently. In the proposed FBFCM-ELM approach, FBFCM clusters pixel-wise features into foreground and background, edge and non-edge, as well as homogeneous and non-homogeneous clusters; these two classes are used for training and testing the ELM. FBFCM employs a natural interpretation of the data points to be grouped: based on the intensity levels, the background pixels are separated from the foreground pixels, and two clusters are generated. Half of the pixels in each cluster are used to train the ELM, while the other half are used to test it. The ELM, like other classifiers, maps the data points (features) into a higher-dimensional space and develops an optimal decision boundary with the greatest margin from the data points, which yields the most accurate classification results. FBFCM is likewise employed to construct two clusters, of edge pixels and of others, depending on the pixel-wise values of the edges in the remote sensing images; these clusters are used to train and test the ELM to achieve better outcomes. The ELM outputs are brought together to create an entirely new image exhibiting a high proportion of detail. The same technique is applied to the remote sensing image's entropy pixels: clustering and classification of pixels based on entropy variation are performed, and the image is reconstructed. Finally, the pixel-wise values of these three remote sensing image attributes (intensity, edge, and entropy) are combined in the three color channels to form a new image with increased visual perception abilities.
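The fusion step at the end of the pipeline can be sketched as a simple channel stacking; the assignment of the intensity, edge, and entropy results to the R, G, and B channels respectively is our assumption for illustration, since the text does not fix a particular mapping.

```python
import numpy as np

def fuse_feature_maps(intensity_map, edge_map, entropy_map):
    """Stack three per-feature segmented maps into one RGB image.

    Each input is a 2-D array scaled to [0, 1]; the fused uint8 image
    places the intensity, edge, and entropy results in the R, G, and B
    channels (an illustrative channel assignment).
    """
    rgb = np.stack([intensity_map, edge_map, entropy_map], axis=-1)
    return (np.clip(rgb, 0.0, 1.0) * 255).astype(np.uint8)

fused = fuse_feature_maps(np.ones((4, 4)),
                          np.zeros((4, 4)),
                          np.full((4, 4), 0.5))
```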

Performance metric

The quality of the enhanced images produced by the proposed FBFCM method and FBFCM-ELM has been examined using commonly accepted quantitative measures, which are presented below.

Metrics with reference to FBFCM

Computation of signal to noise ratio (SNR)

One of the most effective quantitative measurements for determining image or signal quality is the signal-to-noise ratio (SNR). It is calculated as the ratio of the average signal power to the average noise power [48] and is mathematically represented as

$$\begin{aligned} \textrm{SNR} \, \text{(in } \text{ dB) } = 10{\log _{10}}\left[ \frac{{{P_{{\mathop \textrm{signal}\nolimits } }}}}{{{P_\textrm{noise}}}}\right] , \end{aligned}$$
(9)

where \(P_\textrm{signal}\) is the average signal power and \(P_\textrm{noise}\) is the noise variance. When the SNR is greater than one (0 dB), the power of the signal exceeds the power of the noise, resulting in less distortion and less noise-induced interference.

Computation of mean square error (MSE)

MSE is defined as the expectation of the squared error, i.e. the average of the squares of the errors. This error is the difference between the desired quantity \(I(i,j)\) and the estimated quantity \(\hat{I}(i,j)\) for each pixel of an image of size \(N \times N\) [49], given by

$$\begin{aligned} \textrm{MSE} =\frac{\sum \nolimits _{i=1}^{N} \sum \nolimits _{j=1}^{N} {{{\Bigg |{I(i,j) - \hat{I}(i,j)} \Bigg |}^2}}}{N^2}. \end{aligned}$$
(10)

The MSE measures how close the estimate is to the desired value; its optimal value is zero.

Computation of peak signal-to-noise ratio (PSNR)

PSNR is the ratio of the highest possible signal strength to the strength of the noise that impacts the quality of its representation. It is usually expressed in decibels (dB) and is given by

$$\begin{aligned} \textrm{PSNR} = 10{\log _{10}}\left( \frac{{\textrm{MAX}^{2}}}{\textrm{MSE}}\right) , \end{aligned}$$
(11)

where MAX is the highest pixel intensity value, which for an 8-bit image is 255.

Computation of mean absolute error (MAE)

MAE computes the mean absolute difference between two images, i.e. the sample image (I) and the enhanced image \((\hat{I})\) [32]. It is given as

$$\begin{aligned} \textrm{MAE} = {{\sum \nolimits _{i=1}^{M} \sum \nolimits _{j=1}^{N} {\Bigg |\hat{I}(i,j) - I(i,j)} \Bigg |}\over {M \times N}}. \end{aligned}$$
(12)
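Equations (10)-(12) translate directly into code; the sketch below computes MSE, PSNR, and MAE between a reference and an estimated image (the helper name and example values are illustrative).

```python
import numpy as np

def reference_metrics(ref, est, max_val=255.0):
    """MSE (Eq. 10), PSNR in dB (Eq. 11), and MAE (Eq. 12) between a
    reference image and an estimated image of the same shape."""
    ref = np.asarray(ref, dtype=float)
    est = np.asarray(est, dtype=float)
    err = ref - est
    mse = float(np.mean(err ** 2))
    psnr = float(10.0 * np.log10(max_val ** 2 / mse)) if mse > 0 else float("inf")
    mae = float(np.mean(np.abs(err)))
    return mse, psnr, mae

# A single pixel off by 10 gray levels in a 2x2 image:
ref = np.array([[100.0, 100.0], [100.0, 100.0]])
est = np.array([[100.0, 100.0], [100.0, 110.0]])
mse, psnr, mae = reference_metrics(ref, est)
# mse = 25.0, mae = 2.5, psnr = 10*log10(255^2/25) ~ 34.15 dB
```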

Computation of cube root mean enhancement measure (CRME)

Since a color image is a multispectral signal, it is reasonable to extend the usual grayscale contrast measurements so that they may be used to quantify multidimensional color image contrast. To be more explicit, contrast evaluations may be done not only within each color plane but also across color planes. The cross-plane contrast represents the differences between color planes and captures the color variation and structural variations caused by distinct color components. Based on this concept, researchers created the color/cube root mean enhancement (CRME) measure to quantify the relative difference between the color cube centre and all of its neighbours in the current color cube, which is stated as [50]

$$\begin{aligned} \textrm{CRME}={1000\over k_{1}k_{2}}\sqrt[3]{\sum _{i=1}^{k_{1}} \sum _{j=1}^{k_{2}} \left( { \log \left| I_{i,j}-{I_{c1}+I_{c2}+\cdots +I_{cn}\over n}\right| \over \log \left| I_{i,j}+{I_{c1}+I_{c2}+\cdots +I_{cn}\over n}\right| }\right) ^\alpha }, \end{aligned}$$
(13)

where the image is divided into \(k_{1}\times k_{2}\) blocks, \(I_{i,j}\) is the centre pixel intensity in block (i, j), \({I_{c1}+I_{c2}+\cdots +I_{cn}\over n}\) is the average intensity in block (i, j), and n is the total number of pixels within each block.

Computation of color image quality index (CIQI)

The color image quality index (CIQI) combines the chrominance information with the sharpness and contrast of the color image and is expressed as [18]

$$\begin{aligned} \textrm{CIQI}=&c_{1} \times \textrm{CIQI}\_{}\textrm{colorfulness}+c_{2}\nonumber \\&\quad \times \textrm{CIQI}\_{}\textrm{sharpness} +c_{3} \times \textrm{CIQI}\_{}\textrm{contrast} \end{aligned}$$
(14)

where

$$\begin{aligned}{} & {} \textrm{CIQI}\_{}\textrm{colorfulness}=\frac{\sqrt{\sigma _{\alpha }^{2} +\sigma _{\beta }^{2} } +0.3\sqrt{\mu _{\alpha }^{2} +\mu _{\beta }^{2} }}{85.59},\\{} & {} \textrm{CIQI}\_{}\textrm{sharpness}=1-{\left( 1-\frac{(\mathrm {tep_{estimated}}-\mathrm {tep_{proposed}} )}{\mathrm {tep_{proposed}}} \right) }^{0.2},\\{} & {} \textrm{CIQI}\_{}\textrm{contrast}=\text {max} \left( {\textrm{local}}\_{}\textrm{contrast}=\frac{\sum \nolimits _{i=9}^{15} {\mathrm {Bond_{i}}}}{\sum \nolimits _{i=1}^{8} {\mathrm {Bond_{i}} }}\right) \end{aligned}$$

and \(\sigma _{\alpha }^2, \sigma _{\beta }^2, \mu _{\alpha }, \mu _{\beta }\) represent the variances and mean values along the two opponent color axes. \(\mathrm {tep_{estimated}}\) denotes the number of edge pixels estimated; \(\mathrm {tep_{proposed}}\) denotes the number of edge pixels counted using the proposed FBFCM-ELM; \(\textrm{Bond}_i\) is the ith coefficient of the 15 bands of the 8\(\times \)8 blocks of DCT coefficients; and \(c_1, c_2, c_3\) are weighting coefficients.

Metrics with reference to FBFCM-ELM

Accuracy, F1 score, and G-mean are popular measures for assessing the performance of a classifier in machine learning.

Accuracy

Accuracy gives the percentage of test data correctly classified. Mathematically, accuracy is expressed as

$$\begin{aligned} \text {Accuracy} = \frac{(\text {TP} + \text {TN})}{(\text {TP} + \text {FN} + \text {FP} + \text {TN})}, \end{aligned}$$
(15)

where TP (true positive) and TN (true negative) respectively denote the number of samples of the positive class and the negative class that are correctly classified, and FN (false negative) and FP (false positive) respectively denote the number of samples of the positive class and the negative class that are incorrectly classified.

Precision (P), recall (R), G-mean and F1 score

Precision is obtained by dividing the number of true positive predicted pixels by the total number of positive pixels predicted by the model; it represents how close the detected boundaries are to the reference boundaries. Recall, the proportion of correctly recognized reference boundaries, is calculated by dividing the number of true positive predicted pixels by the total number of positive pixels that should have been detected. The F1 score is the harmonic mean of precision and recall. The G-mean, or geometric mean, is the square root of the product of precision and recall. These are stated mathematically as

$$\begin{aligned} \text {F}1 = \frac{2}{(\text {Precision}^{- 1} + \text {Recall}^{- 1})}, \end{aligned}$$
(16)

where

$$\begin{aligned} \begin{array}{l}\text {Precision} = \frac{\text {TP}}{(\text {TP} + \text {FP})} \\ \text {Recall} = \frac{\text {TP}}{(\text {TP} + \text {FN})} \\ G\text {-mean} = \sqrt{(\text {Precision} \times \text {Recall})} \end{array}. \end{aligned}$$
(17)
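Equations (15)-(17) computed from the four confusion counts might look as follows; the function name is illustrative, and note that, following the paper's definition, the G-mean is taken over precision and recall.

```python
def classification_metrics(TP, TN, FP, FN):
    """Accuracy, precision, recall, F1, and G-mean from binary
    confusion counts (Eqs. 15-17)."""
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f1 = 2.0 / (1.0 / precision + 1.0 / recall)   # harmonic mean (Eq. 16)
    g_mean = (precision * recall) ** 0.5           # geometric mean (Eq. 17)
    return accuracy, precision, recall, f1, g_mean

# 80 true positives, 90 true negatives, 10 false positives, 20 false negatives.
acc, p, r, f1, g = classification_metrics(80, 90, 10, 20)
# acc = 0.85, p = 80/90, r = 0.8
```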

Other derived metrics are MCC, kappa, and AUC.

MCC

The Matthews correlation coefficient (MCC) is widely used in image processing as a performance metric. The MCC value is expressed mathematically as

$$\begin{aligned} \text {MCC} =\displaystyle \frac{\text {TP} \times \text {TN}-\text {FP} \times \text {FN}}{\sqrt{(\text {TP} + \text {FP})(\text {TP} + \text {FN})(\text {TN} + \text {FP})(\text {TN} + \text {FN})}}. \end{aligned}$$
(18)

The MCC value lies in the range \([-1,1]\). If any of the sums TP + FN, TP + FP, TN + FP, or TN + FN is zero, the MCC is undefined.

Kappa

Kappa is a metric that measures the strength of inter-rater agreement in categorical data analysis. It is mathematically expressed as

$$\begin{aligned} \kappa = \frac{\text {p}_0 - \text {p}_{\text {e}}}{1 - \text {p}_{\text {e}}}, \end{aligned}$$
(19)

where \(p_0= \frac{(\textrm{TP} + \textrm{TN})}{(\textrm{TP} + \textrm{FN} + \textrm{FP} + \textrm{TN})}\) and \(p_e = p_{.1} p_1 + p_{.2} p_2\), with \(p_{.1} p_1 = \frac{(\textrm{TP} + \textrm{FN}) (\textrm{TP} + \textrm{FP})}{(\textrm{TP} + \textrm{FN} + \textrm{FP} + \textrm{TN})^2}\) and \(p_{.2} p_2 = \frac{(\textrm{FP} + \textrm{TN}) (\textrm{FN} + \textrm{TN}) }{(\textrm{TP} + \textrm{FN} + \textrm{FP} + \textrm{TN})^2}\).

When the disagreement cells are zero, kappa attains its maximum value of one. Kappa is negative when the counts in the disagreement cells exceed the counts in the agreement cells.
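Equations (18) and (19) can be computed from the same confusion counts; a minimal sketch (the function name is ours). As noted above, the MCC is undefined when any marginal sum is zero, which here would surface as a ZeroDivisionError.

```python
def mcc_and_kappa(TP, TN, FP, FN):
    """Matthews correlation coefficient (Eq. 18) and Cohen's kappa
    (Eq. 19) from binary confusion counts."""
    n = TP + TN + FP + FN
    denom = ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)) ** 0.5
    mcc = (TP * TN - FP * FN) / denom          # undefined if any factor is 0
    p0 = (TP + TN) / n                          # observed agreement
    pe = ((TP + FN) * (TP + FP) + (FP + TN) * (FN + TN)) / n ** 2
    kappa = (p0 - pe) / (1.0 - pe)              # chance-corrected agreement
    return mcc, kappa

mcc, kappa = mcc_and_kappa(80, 90, 10, 20)
```

A perfect classifier (FP = FN = 0) yields MCC = kappa = 1, matching the bounds stated above.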

AUC

AUC is another prominent statistic for assessing classifier performance. The AUC is the area under the ROC (receiver operating characteristic) curve, obtained by plotting the false positive rate (FPR = FP/(FP + TN)) on the x-axis against the true positive rate (TPR = TP/(TP + FN)) on the y-axis.
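Rather than integrating the plotted ROC curve, the AUC can equivalently be computed with the Mann-Whitney rank statistic: the fraction of positive-negative pairs in which the positive sample scores higher, counting ties as one half. A sketch (the helper name is ours):

```python
import numpy as np

def auc_score(labels, scores):
    """AUC via the rank (Mann-Whitney) formulation: the probability that
    a random positive sample scores higher than a random negative one,
    with ties counted as one half."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()   # positive outranks negative
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# One positive-negative pair is misordered out of four: AUC = 3/4.
auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```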

Experiments

TensorFlow, a framework focused on high-performance numerical computing, was used on Google Colab to carry out distributed image processing. The adoption of Google Colab, a fast, flexible, and accessible platform, remarkably decreases the time needed to train our ELM on massive remote sensing images. The datasets used to assess the efficacy of the proposed FBFCM approach are described in this section. Following that, the training environment, built on TensorFlow and the Google Colab platform, is explained. The third part of this section presents an ablation study of the proposed FBFCM-ELM implementation stages. Finally, the proposed approach is evaluated in terms of training duration and classification accuracy.

Dataset

The EuroSAT collection of remote sensing images was used for this investigation; it covers vast geographical areas and contains many different dynamics of land cover (viz. mountain, urban, water, plant, soil, and road). Each image set is made up of a number of remote sensing images that depict objects retrieved from classified satellite imagery, as shown in Fig. 3a. The dataset is separated into two sets: the first is used to train our FBFCM-ELM to predict the correct classification, and the second is used to assess the trained network's accuracy. After configuring the FBFCM-ELM model, Google Colab is set up for training. The experimental study is performed on a set of 12,650 images, and results for five arbitrarily chosen satellite images are displayed in Fig. 3 to validate the efficiency of the suggested technique.

Fig. 3

Remote sensing segmented images obtained using FBFCM along with K-NN, RF, DT, EVC, SVM classifier and proposed FBFCM-ELM

Methodology used

The simulation was carried out using MATLAB 2022b on a 64-bit Intel Core i3 CPU, and the findings are presented here. To demonstrate the efficacy of the proposed FBFCM-ELM approach, it was examined on different types of remote sensing images and compared to traditional and state-of-the-art methods. Different assessment parameters are computed to illustrate and compare the quantitative performance of each method. Signal-to-noise ratio (SNR), mean squared error (MSE), peak signal-to-noise ratio (PSNR), mean absolute error (MAE), cube root mean enhancement (CRME), and color image quality index (CIQI) are the performance metrics obtained, summarized in Tables 1, 2, 3, 4, 5 and 6 respectively.

Fig. 4

Remote sensing image segmentation of the test images, proposed FBFCM-ELM images in gray and RGB colorspace

Results analysis

The first row of Fig. 3 shows five arbitrary low-contrast remote sensing test images selected from NASA's Earth Observatory dataset. The images acquired using feature-based fuzzy C-means with K-nearest neighbor (FBFCM-K-NN) are shown in the second row of Fig. 3, while those obtained with feature-based fuzzy C-means with random forest (FBFCM-RF) are shown in the subsequent row. The FBFCM-K-NN and FBFCM-RF images clearly demonstrate that, due to variations in the average intensity of the test images, the results are either over- or under-enhanced. The increase in dynamic range and the smoothing process are the sources of the artifacts in the images produced by FBFCM-K-NN.

The fourth row of Fig. 3 shows the enhanced and segmented remote sensing images resulting from the feature-based fuzzy C-means decision tree (FBFCM-DT) method. As seen in Fig. 3, the FBFCM-DT method is unable to produce good segmentation for the large dynamic range of low-resolution remote sensing images.

The next rows of Fig. 3 illustrate the results of the feature-based fuzzy C-means ensemble voting classifier (FBFCM-EVC) and of the feature-based fuzzy C-means support vector machine (FBFCM-SVM). The segmented images improve greatly compared with the FBFCM-DT approach, although these algorithms result in over-segmentation, as can be seen in the segmented images.

Table 1 Performance metric SNR values
Table 2 Performance metric MSE values
Table 3 Performance metric PSNR values
Table 4 Performance metric MAE values
Table 5 Performance metric CRME values
Table 6 Performance metric CIQI values

The images obtained with the proposed FBFCM-ELM method are shown in the last row of Fig. 3 and are significantly better than those of FBFCM-SVM and the other methods depicted there. The images obtained using the proposed method have a higher brightness level in the high-intensity regions than the other methods, whereas the other methods raise the intensity values in the low-intensity regions. Compared with the other approaches, these images are enhanced and feature well-defined homogeneous segmented regions.

Fig. 5

Comparison of a MSE values and b MAE values using proposed method with state of art methods

Quantitative analysis

The enhancement and segmentation assessment metrics SNR, MSE, PSNR, MAE, CRME, and CIQI are reported in Tables 1, 2, 3, 4, 5 and 6 respectively. These six quantitative performance indicators compare the performance of the proposed and traditional segmentation approaches. The low MSE value of the proposed method, shown in Table 2, indicates that the clustered remote sensing image obtained using the proposed FBFCM-ELM method is considerably closer to the reference image than those of conventional techniques, as seen in Fig. 5a. Similarly, two more metrics used to measure the enhancement of remote sensing images are the cube root mean enhancement (CRME) and the color image quality index (CIQI); the remote sensing images formed with the proposed method lie within the quality range of standard remote sensing images.

The proposed method achieves signal-to-noise ratios of 67.19 dB, 59.83 dB, 57.39 dB, 62.08 dB, and 59.81 dB for five distinct sample test images, values greater than those of the FBFCM-K-NN, FBFCM-DT, FBFCM-RF, FBFCM-EVC and FBFCM-SVM methods listed in Table 1 and clearly depicted in Fig. 6a.

Similarly, Tables 3 and 4 show the higher PSNR and lower MAE values of the proposed FBFCM-ELM technique compared with the previously mentioned segmentation and classification strategies. The lower MAE values in Fig. 5b and the higher PSNR values in Fig. 6b demonstrate that the resultant image is significantly closer to the reference remote sensing image.
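Assuming the standard definitions of these error and signal-ratio metrics (the paper does not reproduce the formulas), they can be sketched in NumPy as follows:

```python
import numpy as np

def mse(ref, out):
    """Mean squared error between reference and output images."""
    return np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)

def mae(ref, out):
    """Mean absolute error between reference and output images."""
    return np.mean(np.abs(ref.astype(np.float64) - out.astype(np.float64)))

def psnr(ref, out, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given peak pixel value."""
    m = mse(ref, out)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def snr(ref, out):
    """Signal-to-noise ratio in dB: signal power over error power."""
    ref = ref.astype(np.float64)
    err = ref - out.astype(np.float64)
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2))
```

Lower MSE/MAE and higher SNR/PSNR against the reference image indicate a closer match, which is the comparison reported in Tables 1, 2, 3 and 4.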

It is therefore apparent that the proposed approach outperforms the other conventional strategies. Image enhancement based on entropy, edges, and intensity is performed to improve the visibility of the segmented regions even further. Each image is segmented individually using the FBFCM-ELM technique, and the enhanced segment images are then fused in the RGB color domain. As illustrated in Fig. 4, the resulting fused remote sensing image retains more information, edges, and intensity. The first column of Fig. 4 depicts the test images, the second column the enhanced segmented remote sensing images produced by the proposed method, and the last column the fused segmented images, which retain maximum detail, exhibit enhanced homogeneous regions, and are well suited to detecting shapes. The resulting fused images can be applied across a wide range of applications.
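The RGB fusion step could be sketched as below. Since the paper does not specify its exact fusion rule, this illustrative sketch assumes a simple per-pixel maximum-intensity selection over the enhanced segment images; the function name `fuse_segments` is hypothetical:

```python
import numpy as np

def fuse_segments(segments):
    """Fuse a list of enhanced RGB segment images of shape (H, W, 3)
    by keeping, per pixel, the segment with the highest mean intensity."""
    stack = np.stack([s.astype(np.float64) for s in segments])  # (N, H, W, 3)
    intensity = stack.mean(axis=-1)                             # (N, H, W)
    best = np.argmax(intensity, axis=0)                         # (H, W)
    h, w = best.shape
    # Gather the winning segment's RGB value at every pixel.
    fused = stack[best, np.arange(h)[:, None], np.arange(w)[None, :]]
    return fused.astype(np.uint8)
```

A maximum-selection rule favours the brightest, most strongly enhanced segment at each pixel; weighted-average rules are a common alternative when smoother transitions between regions are preferred.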

The research presents a novel approach that segments remote sensing images on the basis of entropy, intensity, and edges into clusters with homogeneous areas, retaining the content of the original remote sensing image so that its appearance remains comparable to its natural one.

Fig. 6

Comparison of (a) SNR values and (b) PSNR values of the proposed method with state-of-the-art methods

Table 7 Performance metric computed after changing the classifier
Table 8 Computed parameters under different activation functions employed in the proposed FBFCM-ELM

Evaluation of the proposed FBFCM-ELM approach

The performance of the proposed FBFCM-ELM, employed to carry out the remote sensing image classification, is confirmed in terms of accuracy through comparison with five classifier models combined with the FBFCM feature extractor, viz. K-nearest neighbor (K-NN) [35], random forests (RF) [51], decision tree (DT) [52], ensemble voting classifier (EVC) [17], and support vector machine (SVM) [43]. Table 7 presents the seven parameters used to evaluate the classification of remote sensing images: overall accuracy (OA), recall, precision, F1 score, MCC, Kappa, and AUC. The results demonstrate that the proposed method offers the best classification accuracy, with an OA of 99.89% and a Kappa of 0.67. The proposed FBFCM-ELM algorithm attains the best values in comparison with the other state-of-the-art methods mentioned.
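For reference, the two headline values in Table 7, overall accuracy and Cohen's Kappa, follow from their standard definitions; a minimal NumPy sketch (with illustrative labels, not the paper's data) is:

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return np.mean(y_true == y_pred)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    po = np.mean(y_true == y_pred)  # observed agreement
    # Chance agreement: product of marginal class frequencies, summed.
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return (po - pe) / (1.0 - pe)
```

Because Kappa discounts chance agreement, a very high OA can coexist with a much lower Kappa when the class distribution is strongly imbalanced.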

Ablation studies of the proposed FBFCM-ELM model

The ablation studies in the present article are carried out on the proposed FBFCM-ELM model by adjusting the values of key parameters and subsystems. Three distinct variations of the parameters employed in the proposed framework are considered: (1) changing the classifier in the FBFCM-ELM model; (2) varying the number of neurons in the ELM classifier of the FBFCM-ELM model; and (3) changing the activation function in the ELM classifier of the FBFCM-ELM model. All of these variations are implemented using the FBFCM model as the feature extractor and the ELM as the classifier. The goal of this extensive investigation is to identify and select only those settings that lead the model to yield the best, or optimum, results. The compiled results for these variations are presented below:

Table 9 Varying the number of neurons in the ELM classifier of proposed FBFCM-ELM
Table 10 Computed results obtained from two ablation studies

Changing the classifier used in the FBFCM-ELM model

The proposed FBFCM model serves as the feature extractor, and Table 7 aggregates and quantifies the classification accuracy obtained when employing different classifiers. The greatest degree of accuracy is attained when ELM is employed as the classifier.

Changing the activation function in the ELM classifier of the FBFCM-ELM model

This investigation selects the optimal activation function from a pool of prospective functions. The results of this analysis are compiled in Table 8. The leaky_relu function is determined to be the most reliable, with an accuracy of 99.89%.

Changing the number of neurons in ELM classifier

As the ELM is ultimately chosen as the classifier for the proposed design, the number of neurons in this single-hidden-layer feed-forward neural network is varied, as illustrated in Table 9. An accuracy of 99.89% is attained with 1024 neurons and remains unaltered as the number of neurons increases further.
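The two ELM-side ablation knobs, hidden-neuron count and activation function, can be made concrete with a minimal extreme learning machine sketch: random, fixed input weights and a closed-form least-squares solution for the output weights. This is a generic ELM under standard assumptions, not the paper's exact implementation:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU activation, the function found most reliable in Table 8."""
    return np.where(x > 0, x, alpha * x)

class ELM:
    """Minimal extreme learning machine: a single-hidden-layer feed-forward
    network whose input weights are random and never trained, and whose
    output weights are solved in closed form via the pseudo-inverse."""

    def __init__(self, n_hidden=1024, activation=leaky_relu, seed=0):
        self.n_hidden = n_hidden
        self.activation = activation
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return self.activation(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Random input weights and biases stay fixed in an ELM.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        T = np.eye(int(y.max()) + 1)[y]        # one-hot targets
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T      # closed-form output weights
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```

Varying `n_hidden` and `activation` here mirrors the ablations in Tables 8 and 9: beyond some width the hidden representation is rich enough that adding neurons no longer changes the accuracy, consistent with the plateau reported at 1024 neurons.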

It is clearly evident from the results in Tables 7, 8 and 9 that the FBFCM-ELM combination yields the best performance when the appropriate settings are selected. The best parameters chosen in the aforementioned ablation studies are listed in Table 10. In consideration of these facts, it can be concluded that the proposed FBFCM-ELM combination provides an outstanding accuracy of 99.89% on the images of the NASA Earth Observatory dataset.

Conclusions

This paper thoroughly investigates an important remote sensing image segmentation method. In the present work, several transfer learning models are tested in conjunction with a series of classifiers, namely ELM, SVM, RF, DT, K-NN and the ensemble voting classifier. A detailed performance evaluation is carried out through ablation studies by computing the testing accuracy of the feature extraction and classification tasks. The entire work is executed in three phases. The unique combination of FBFCM with the ELM classifier proves to be the best in terms of testing accuracy (99.89%) for the classification of remote sensing images from the NASA Earth Observatory dataset. This shows that the proposed FBFCM-ELM architecture is not only the best among those selected but is also fit to exhibit real-time capabilities. The proposed technique might also be applied in biological signal processing for accurate localization and detection.