1 Introduction

A W-operator is an image transformation that is locally defined inside a window W and is invariant to translations (Benalcázar et al. 2015). Given a window W that defines a neighborhood of each pixel to be processed, a W-operator labels each pixel based only on the values observed within that neighborhood. The simplest case is when the window is reduced to a single pixel (Gonzalez and Faisal 2019; Hirata and Papakostas 2021). The machine learning-based approach to designing W-operators consists of estimating the W-operator from collections of input–output image pairs, called training data, that describe the result of the desired transformation. In general terms, learning takes place from patterns or configurations collected from the input images of the training set and the corresponding values of the points to be analyzed in the ideal images (Hirata and Papakostas 2021; Barrera et al. 2022). The collected patterns determine an estimate of the probability of occurrence of each (configuration, output value) pair, which is used to define the W-operator. In this way, a W-operator is represented as a decision table formed by patterns, called observation vectors, and their corresponding estimated labels (Guevara et al. 2019).

All the possible observation vectors collected through a window must be represented in the table and must have an associated output value, even those configurations that do not appear in the training images, since they may later be present in images other than the training ones. In this case, the operator must be able to assign a value to them, that is to say, it must be able to generalize (Benalcázar et al. 2012). However, completing the table with all the possible configuration vectors would require large amounts of training images, which in practice are finite and limited. Moreover, increasing the window size leads to an exponential increase of the search space. For example, with 256 gray levels and a \(3\times 3\) window, the size of the search space, i.e. the decision table, is equal to \(256^{3\times 3}=256^{9}\approx 4.7\times 10^{21}\). The limited number of training images, together with the exponential growth of the search space as the window size increases, accentuates the generalization problem.
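To make the growth of the search space concrete, the following minimal sketch computes the number of distinct observation vectors for a square window. The function name `table_size` is a hypothetical helper introduced here for illustration only.

```python
def table_size(levels: int, side: int) -> int:
    """Number of distinct observation vectors for a side x side window
    whose pixels take one of `levels` gray levels."""
    return levels ** (side * side)


# Window sizes taken from the experiments described later in the paper.
for side in (3, 5, 7):
    size = table_size(256, side)
    # Report the order of magnitude instead of the full integer.
    print(f"{side}x{side} window: about 10^{len(str(size)) - 1} entries")
```

Even for the smallest window, the table has far more entries than any realistic training set can populate, which is why unseen vectors must be handled explicitly.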

Several techniques have been proposed to address the generalization problem. Hirata Junior et al. (2002) and Chlapinski and Ciota (2009) use pyramidal multiresolution and aperture to restrict the spatial domain of the windows and the range of gray levels, in order to reduce the search space of the designed tables. They apply their proposal to deblurring, while Hirata Jr et al. (2015) apply the same techniques to eye segmentation in human faces. Other solutions are presented in Benalcázar et al. (2014) and Benalcázar et al. (2015), where W-operators are designed using aperture and feedforward neural networks that model the conditional probability of each observation vector.

On the other hand, in Comas et al. (2014) membership functions are implemented to represent knowledge in a mathematical language based on fuzzy set theory. Fuzzy sets (FS) handle the imprecision and vagueness of human understanding systems and provide a framework to describe, analyze, and understand vague and uncertain events. FS give a theoretical framework to model gray level images, because of their imprecision, and to predict unknown values. The imprecision is due to the ambiguity in the gray levels, which is generated in the process of capturing the image, and to the spatial ambiguity caused by the imprecision at the boundaries of objects or the edges in the image. In this way, each region of the image can be modeled as a fuzzy set where the membership function assigns to each pixel a membership degree in the range [0,1] to resolve the ambiguity in the image scene (Acharya and Ray 2005). FS theory is applied in different areas related to image processing. In Huang and Wang (1995), Cheng et al. (1997), Chaira and Ray (2004), Aja-Fernández et al. (2015) and Mahajan et al. (2021), membership functions play an important role in finding one or more appropriate threshold values for image segmentation, determining the relationship of a pixel with its membership region. Clustering is another important task in which FS theory has achieved good performance, for example in the fuzzy C-means algorithm (FCM), where pixels belong to various clusters with varying membership degrees. This algorithm has been modified in Zhang and Chen (2004), Yang et al. (2009), Sing et al. (2015) and Adhikari et al. (2015) to improve its robustness in segmentation.

In this paper, we present a solution to address the generalization problem encountered in the automatic design of W-operators for grayscale images using membership functions. Our method introduces several novel aspects:

  • The capability to determine the optimal dimension of the W-operator and the set of weights assigned to each pixel in the image for distinguishing a discrete target set of classes.

  • Membership functions are utilized to assign membership degrees to each observation vector not within the domain of the W-operator. Consequently, classes are assigned based on their membership degrees.

  • The choice of the type of membership functions employed depends on the nature of the data; that is, any type can be used depending on the dataset.

  • We propose the application of this methodology to brain MRI. Nevertheless, it can be applied to any type of image where the segmentation of regions of interest is desired.

The remaining sections of this paper are structured as follows. Section 2 provides an overview of W-operators and membership functions. Section 3 introduces the proposed methodology. Section 4 demonstrates an application of the methodology in brain MRI segmentation. The results and discussion are presented in Sect. 5 and a comparison between the proposed method and other MRI segmentation methods is developed in Sect. 6. Finally, conclusions are drawn in Sect. 7.

2 Theoretical framework

In this section, we present some theoretical definitions that provide background and describe the proposed approach.

2.1 W-operators

Digital images can be represented by a function \(f:E\rightarrow L\), where \(E=Z^2\) and \(L=\{0,\ldots ,l-1\}\) denotes the set of gray levels of the image. For binary images \(l=2\) and for grayscale images \(l=256\) (Hirata and Papakostas 2021). If the set of all images defined on E with gray levels in L is denoted as \(L^E\), an image operator is any mapping of the form \(\Psi :L^E\rightarrow L^E\) (Montagner et al. 2016).

W-operators are a particular case of image operators that label each pixel of the image based only on the values observed within the window neighborhood W. More information about the functional properties of W-operators can be found in Benalcázar et al. (2012, 2015).

The automatic design of W-operators consists of two stages: a training stage and a testing stage. In the training stage, the W-operator, denoted \(\mathbf {\Psi }\), is designed. To this end, a set of pairs of training images (O, I) is considered, where \(O:E\subset Z^2\rightarrow \{0,1,\ldots ,255\}\) represents the observed images, \(I:E\subset Z^2\rightarrow \{1,\ldots ,c\}\) represents the ideal images and c is the number of classes. The W-operator \(\mathbf {\Psi }\) is defined as a classifier

$$\begin{aligned} \mathbf {\Psi }:{\varvec{X}}\rightarrow \{1,\ldots ,c\}, \end{aligned}$$
(1)

which maps an observation vector \(X=(x_1,\ldots ,x_k)\) to one of the labels or classes of the set \(\{1,\ldots ,c\}\). An observation vector \(X=(x_1,\ldots ,x_k)\) is composed of k values \(x_i\in L\), where each value is defined by \(x_i=f(t+w_i)\), with t the pixel being processed and \(w_i\) the i-th point of the window W (Montagner et al. 2016).

The window W is translated pixel by pixel through the images O and I simultaneously (Benalcázar et al. 2012). The values of the window W inside the image O generate the observation vector X (see Fig. 1), and the value of the central pixel of the window W in the image I gives the label or class. The window W is translated through the entire set of training images. Each time an observation vector appears, the frequency of the corresponding label in the table is increased. Finally, the label of each configuration vector is estimated as the one with the highest frequency. Since large amounts of training images are not always available, it is difficult to obtain a complete configuration table. Figure 2 shows, as an example, the construction of this configuration table.

Fig. 1 Visualization of an observation vector made up of the values within a window

Fig. 2 Design of a W-operator using a window W and a pair of training images
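The training procedure described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: images are NumPy arrays, border pixels where the window does not fit are skipped (the text does not specify border handling), and `build_table` is a hypothetical name.

```python
from collections import Counter, defaultdict

import numpy as np


def build_table(observed, ideal, side=3):
    """Slide a side x side window over each (O, I) pair, count how often each
    label occurs for each observation vector, and keep the most frequent one."""
    r = side // 2
    counts = defaultdict(Counter)
    for O, I in zip(observed, ideal):
        rows, cols = O.shape
        # Assumption: only positions where the window fits entirely are used.
        for y in range(r, rows - r):
            for x in range(r, cols - r):
                window = O[y - r:y + r + 1, x - r:x + r + 1]
                X = tuple(int(v) for v in window.ravel())
                counts[X][int(I[y, x])] += 1
    # Label of each configuration = label with the highest frequency.
    return {X: c.most_common(1)[0][0] for X, c in counts.items()}
```

The resulting dictionary contains only the vectors that actually occurred in the training images, which makes the incompleteness of the table explicit: any vector that is not a key has no label.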

Finally, in the testing stage, the error of the designed W-operator is estimated and its predictive capacity is evaluated.

2.2 Membership functions

Fuzzy logic is a tool that allows knowledge to be represented in a mathematical language through fuzzy set theory and membership functions (Comas et al. 2014). FS theory provides a framework for describing, analyzing and interpreting vague and uncertain events, i.e., for modeling the imprecision and vagueness existing in human understanding systems. A grayscale image has ambiguity related to the image acquisition process and spatial ambiguity caused by the imprecision in the objects’ boundaries. Each region in the image can be modeled as a FS defined by a membership function, assigning to each pixel a membership degree in the range [0,1] (Acharya and Ray 2005). Formally, a membership function for a fuzzy set F included in X (the universe of discourse) is a map \(u_F:X\rightarrow [0,1]\), where each element of X is mapped to a value between 0 and 1. This value quantifies the grade of membership of the element of X in the fuzzy set F.

In the context of the automatic design of W-operators, a membership function is defined for each class using the training images. These functions assign membership degrees to each element of an image within a window, giving rise to a fuzzy window or fuzzy observation vector. Thus, each vector not present in the table designed in the training stage is assigned a label according to its membership degrees (Robalino et al. 2020). In this way, membership functions address the generalization problem present in the automatic design of W-operators for grayscale images.

Fig. 3 Gaussian membership functions for classes 1 and 2 based on a pair of training images

3 Proposed methodology

Due to the lack of training images, some vectors will not be found in the configuration table. This lack of predictive capacity of the designed W-operator for configuration vectors not registered in the training stage is called the generalization problem. This problem increases the error of the W-operator, leading to poor results. We propose the use of membership functions to compensate for the lack of information caused by the null or low frequency of observation vectors in the table.

3.1 Definition of membership functions

The membership functions \(u_{j}\), for all \(j \in \{1,\ldots ,c\}\), are chosen according to each problem and to the nature and type of the data. The construction of membership functions that adequately capture the meanings of the variables has been addressed by several authors (Mendel and Wu 2010; Klir and Yuan 1995). Membership functions can be represented in multiple ways. Due to their mathematical simplicity, the most common are triangular, trapezoidal, Gaussian, sigmoidal and gamma functions, among others (Medaglia et al. 2002). Conceptually, there are two approaches to determine the membership function associated with a set. The first approach is based on expert knowledge and the second uses a collection of data to design the function. The latter approach is used in the automatic design of W-operators to define the membership functions from a set of training images. Figure 3 shows an example of the definition of Gaussian membership functions for two classes from a pair of training images.

In Sect. 4.2 an example of how to determine membership functions based on the problem is introduced.

3.2 Design of W-operators by membership functions

The representation of a W-operator is a large decision table where all the possible configurations or observation vectors collected through a window must be represented. Each observation vector must have an associated output value, even those configurations that do not appear in the training images, since they may later be present in images other than the training ones. In this case, the operator must be able to assign a value to them, that is to say, it must be able to generalize. In this paper, we propose the use of membership functions to solve the generalization problem present in the automatic design of W-operators for gray level images. The membership functions assign membership degrees to each observation vector not present in the domain of the W-operator.

Each observation vector is searched for in the table generated in the training stage. If the vector appears in the table, the corresponding label is assigned. Otherwise, the membership functions are used to assign the label. Let \(R_{j}\) be the gray-level range of class j, i.e., \(R_{j} = [g_{j_{min}},g_{j_{max}}]\), where \(g_{j_{min}}\) and \(g_{j_{max}}\) are the minimum and maximum gray levels of class j, for all \(j \in \{1, \ldots , c \}\). To calculate the membership degree of the vector X to a class j, the membership functions are first selected according to the central pixel \(x_{central}\). If the value of \(x_{central}\) belongs to \(R_{j}\), the membership function \(U_{j}\) is applied to the observation vector X. If the value of \(x_{central}\) belongs to the ranges of two or more classes, the membership functions of those classes are applied to each of the gray values of the observation vector, and the average of the membership degrees is calculated for each class as shown in the following equation:

$$\begin{aligned} U_j (X)=\frac{\sum _{i=1}^{k} u_j (x_i)}{k} \end{aligned}$$
(2)

where k is the size of the observation vector X. The assignment of a label to the observation vector will depend on the maximum of the degrees of membership of the analyzed classes.
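The labeling rule can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes Gaussian membership functions of the standard form \(\exp (-(x-m)^2/(2\sigma ^2))\), a row-major flattened window of odd side so that the central pixel is the middle element, and a fallback to all classes when the central pixel lies in no range (a case the text does not cover). The names `label_vector`, `ranges` and `params` are hypothetical.

```python
import numpy as np


def gaussian(x, m, sigma):
    """Standard Gaussian membership function (assumed form)."""
    return np.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))


def label_vector(X, table, ranges, params):
    """Return the table label of X if present; otherwise use Eq. (2).

    ranges: {class j: (g_min, g_max)}, params: {class j: (mean, sigma)}.
    """
    key = tuple(int(v) for v in X)
    if key in table:
        return table[key]
    x = np.asarray(X, dtype=float)
    central = x[len(x) // 2]
    # Candidate classes: those whose gray-level range contains the central pixel.
    candidates = [j for j, (lo, hi) in ranges.items() if lo <= central <= hi]
    if not candidates:
        # Assumption: consider every class if no range contains the central pixel.
        candidates = list(ranges)
    # Eq. (2): average membership degree of the k gray values for each class.
    degrees = {j: float(np.mean(gaussian(x, *params[j]))) for j in candidates}
    # The label is the class with the maximum membership degree.
    return max(degrees, key=degrees.get)
```

When only one class range contains the central pixel, the maximum is taken over a single candidate, so that class is returned directly.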

In the next section we apply our proposed approach to a MRI segmentation problem to test its performance.

4 Application to MRI segmentation

The proposed design of W-operators using membership functions was applied to the segmentation of magnetic resonance images (MRI). One of the advantages of MRI is its ability to discriminate various types of tissues for subsequent quantification and, thus, to help in the diagnosis of different pathologies. Segmentation of these types of images is a constant requirement in medical science (Meschino et al. 2008). However, one of the main difficulties when working with these images is the overlap between the gray-level ranges of the tissues, which generates fuzzy boundaries between them. This problem increases when MRIs are corrupted by noise, which is unavoidable in these kinds of images. Therefore, the use of membership functions for tissue segmentation in MRI is a good example of the application of our proposal. Figure 4 shows the diagram of the application of the proposed method.

Fig. 4 Diagram of the proposed method

4.1 Materials and methods

Simulated 3D images from the Montreal Neurological Institute, McGill University were used (Kwan et al. 1996; Montréal Neurological Institute 2007). From the database, 50 images of size \(271\times 181\), weighted in T2 (TR = 3300 ms, TE = 35 ms, 120 ms) were selected. These images contain white Gaussian noise levels at 0, 1, 3, 5, 7, and 9\(\%\). The selection criterion was the broad presence of all four tissue types in the simulated brain MRI images.

The most common MRI sequences are T1- and T2-weighted scans. In T1 images the contrast and brightness of the image are predominantly determined by the T1 properties of tissue, whereas in T2 images they are predominantly determined by the T2 properties of tissue. Although T2-weighted images were used, the proposed method can be applied, without loss of generality, to T1 images. For the latter type of images, there are widely recognized processing tools that serve as gold standards in the neuroimaging research field, such as Computational Anatomy Toolbox for SPM (CAT) (Gaser et al. 2022) and FreeSurfer (Fischl 2012).

We define different experiments to compare the performance of the W-operators. The technical details are described in the following list:

  • Images: the noise levels used were 0, 1, 3, 5, 7 and 9\(\%\).

  • Dataset partition: the dataset is split into training images and test images. Three partitions were used, with percentages 80-20, 70-30 and 50-50.

  • Window sizes: five dimensions were used, \(3\times 3\), \(5\times 5\), \(7\times 7\), \(11\times 11\) and \(15\times 15\).

  • Membership functions: given the nature of the training images, the membership function utilized was the Gaussian membership function.

4.2 Definition of membership functions

In our experiment, we consider four classes \((c=4)\) since each pixel will be classified into one of the following classes: background, white matter, cerebrospinal fluid, and gray matter.

Gaussian membership functions were chosen based on the shape of the histograms of the different classes, as can be seen in Fig. 5a–c. These types of membership functions were selected not only because of their capability to adapt to the shape of these specific histograms, but also because of their versatility and simplicity.

To define the Gaussian membership function of each class, the mean and the standard deviation of the gray levels in its range must be calculated (Comas et al. 2014), where \(R_{b}\), \(R_{wm}\), \(R_{cf}\) and \(R_{gm}\) denote the gray-level ranges of the background, white matter, cerebrospinal fluid, and gray matter, respectively. Figure 5d–f shows these functions and how they cover the frequency distribution of the gray levels for each class. The gray matter is shown in red, the cerebrospinal fluid in blue, the white matter in yellow and the background in black. The background function is not visible because its mean is equal to 0.

Fig. 5 Histograms and Gaussian membership functions using the 80-20 dataset partition for the different noise levels. a, d \(0\%\). b, e \(5\%\). c, f \(9\%\)

For each image \(O_i\) and each gray-level range \(R_j\), the mean \(m_{ij}\) and the standard deviation \(\sigma _{ij}\) are determined. Once the parameters \(m_{ij}\) and \(\sigma _{ij}\) have been estimated, their averages over the N training images are calculated for each class \(j\in \{1,\ldots ,c\}\) using the following equations:

$$\begin{aligned} {\overline{m}}_j=\sum _{i=1}^{N} \frac{m_{ij}}{N} \quad {\overline{\sigma }}_j=\sum _{i=1}^{N} \frac{\sigma _{ij}}{N}. \end{aligned}$$
(3)

The Gaussian membership functions for each class are defined using the parameters \({\overline{m}}_j\) and \({\overline{\sigma }}_j\). Therefore, four membership functions \(U_b\), \(U_{wm}\), \(U_{cf}\) and \(U_{gm}\) are established for the background, white matter, cerebrospinal fluid, and gray matter, respectively.
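A minimal sketch of this parameter estimation is given below. It assumes that every class is present in every training image (consistent with the selection criterion stated above), that the pixels of class j in \(O_i\) are identified through the ideal image \(I_i\), and that the Gaussian membership function has the standard form \(\exp (-(x-\overline{m}_j)^2/(2\overline{\sigma }_j^2))\). The small `eps` guard for a zero standard deviation is an addition made here, not part of the paper, and the function names are hypothetical.

```python
import numpy as np


def class_parameters(observed, ideal, classes):
    """Eq. (3): average the per-image mean and standard deviation of each class."""
    params = {}
    for j in classes:
        # m_ij and sigma_ij for every training pair (O_i, I_i).
        means = [O[I == j].mean() for O, I in zip(observed, ideal)]
        stds = [O[I == j].std() for O, I in zip(observed, ideal)]
        params[j] = (float(np.mean(means)), float(np.mean(stds)))
    return params


def gaussian_membership(x, m, sigma, eps=1e-6):
    """Gaussian membership degree in [0, 1]; eps avoids division by zero
    (e.g. a noise-free background whose gray levels are all equal)."""
    sigma = max(sigma, eps)
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))
```

With these two helpers, the membership function of a class j is obtained by calling `gaussian_membership(x, *params[j])`, where `params` is the dictionary returned by `class_parameters`.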

The proposed approach is presented in the pseudocode given in Algorithm 1.

Algorithm 1 Proposed algorithm to design W-operators

5 Results and discussion

In this section we present the results of the robustness analysis performed in order to validate the W-operators defined by membership functions. The metrics used to evaluate the performance of the operators were classification error, sensitivity, and specificity, all calculated from the values of the average confusion matrix of each experiment. These metrics are calculated with the following equations (Sokolova and Lapalme 2009):

$$\begin{aligned} {\text {Classification Error}} = \frac{{\text {FP}}+{\text {FN}}}{{\text {TP}}+{\text {FN}}+{\text {FP}}+{\text {TN}}} \end{aligned}$$
(4)

$$\begin{aligned} {\text {Sensitivity}} = \frac{{\text {TP}}}{{\text {TP}}+{\text {FN}}} \end{aligned}$$
(5)

$$\begin{aligned} {\text {Specificity}} = \frac{{\text {TN}}}{{\text {TN}}+{\text {FP}}} \end{aligned}$$
(6)

where \({\text {TP}}\) corresponds to true positives, \({\text {TN}}\) to true negatives, \({\text {FP}}\) to false positives, and \({\text {FN}}\) to false negatives.
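As a hedged illustration of Eqs. (4) to (6), the sketch below derives the four counts for one class from a multi-class confusion matrix in a one-vs-rest fashion and then evaluates the three metrics. The one-vs-rest reduction and the convention that rows are true classes and columns are predicted classes are assumptions made here; the text only states that the metrics come from the average confusion matrix.

```python
import numpy as np


def one_vs_rest_counts(cm, j):
    """TP, TN, FP, FN of class index j from a c x c confusion matrix
    (assumed layout: rows = true class, columns = predicted class)."""
    cm = np.asarray(cm)
    tp = cm[j, j]
    fn = cm[j, :].sum() - tp
    fp = cm[:, j].sum() - tp
    tn = cm.sum() - tp - fn - fp
    return int(tp), int(tn), int(fp), int(fn)


def classification_metrics(tp, tn, fp, fn):
    """Classification error, sensitivity and specificity (Eqs. 4-6)."""
    return {
        "error": (fp + fn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

For example, `classification_metrics(*one_vs_rest_counts(cm, 0))` gives the metrics of the first class of a confusion matrix `cm`.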

As an example of the results obtained in the experiments, some tables are shown. In Table 1 the results of the W-operator, using windows of sizes \(3\times 3\), \(5\times 5\), \(7\times 7\), \(11\times 11\) and \(15\times 15\), are presented. Those results are obtained using the partition 80-20. The gray matter is represented with red color, the cerebrospinal fluid with blue, the white matter with yellow and the background with black.

Table 1 Results of the W-operator using \(3\times 3\), \(5\times 5\), \(7\times 7\), \(11\times 11\) and \(15\times 15\) windows and the partition 80-20

Tables 2, 3 and 4 show the classification error, the sensitivity and the specificity using the different partitions for the noise levels \(0\%\) and \(9\%\). The results show that the smallest windows provide the best segmentations. The best results are obtained with windows of size \(3\times 3\) and \(5\times 5\). As might be expected, the lower the noise level, the better the segmentation results.

Table 2 Classification error, sensitivity and specificity for W-operator using the partition 80-20
Table 3 Classification error, sensitivity and specificity for W-operator using the partition 70-30
Table 4 Classification error, sensitivity and specificity for W-operator using the partition 50-50

Figures 6, 7 and 8 show the classification error, the sensitivity and the specificity of the W-operators designed with windows of sizes \(3\times 3\), \(5\times 5\), \(7\times 7\), \(11\times 11\) and \(15\times 15\), using the different sets of training and test images with different noise levels. The graphs show that as the window size grows, the error increases while the sensitivity and the specificity decrease.

Fig. 6 Classification error of the W-operators designed for the different partitions. a 80-20. b 70-30. c 50-50

Fig. 7 Sensitivity of the W-operators designed for the different partitions. a 80-20. b 70-30. c 50-50

Fig. 8 Specificity of the W-operators designed for the different partitions. a 80-20. b 70-30. c 50-50

For a window of size \(3\times 3\), the error is less than \(3\%\) in all the partitions used when working with images with noise from 0 to \(5\%\). The error is less than \(6.2\%\) when the noise increases to \(9\%\). The partition 80-20 proved to be the best, despite the percentage of error.

When the noise level increases, there is a greater overlap between the ranges of each region or class, as can be observed in the histograms of the training images (see Fig. 5). The greatest overlap occurs between the white matter (yellow line) and the gray matter (red line) classes, while the overlap of the cerebrospinal fluid class (blue line) is slight.

At this point, it is important to know the generalization capacity (GC) of the proposed method. To achieve this goal, it is essential to calculate the total number of observation vectors (total \(\#\) of X) obtained from the set of test images, and the number of observation vectors that were labeled by the membership functions (\(\#\) X labeled by MF). Subsequently, these values are substituted into the following equation:

$$\begin{aligned} \text {GC} = \frac{\# X \text { labeled by MF }}{\text {total } \# \text {of X}}\times 100\%. \end{aligned}$$
(7)

Equation 7 was applied to calculate the generalization percentage of the proposed method in MRI with 0\(\%\) and 9\(\%\) noise. The total number of observation vectors obtained from the set of test images is equal to 962,125. In the case of the application of the W-operator designed using Gaussian functions and a \(3\times 3\) window to MRI with 0\(\%\) noise, the number of observation vectors labeled by the membership functions is equal to 423,882. Consequently, the generalization percentage was 44.06\(\%\). Conversely, when applying this W-operator to MRI with 9\(\%\) noise, the generalization percentage increases to 48.09\(\%\).
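The reported percentage for the noise-free case can be checked directly from Eq. (7) with the two counts given above. The function name `generalization_capacity` is a hypothetical helper.

```python
def generalization_capacity(labeled_by_mf: int, total: int) -> float:
    """Eq. (7): percentage of test observation vectors labeled by the
    membership functions instead of the decision table."""
    return 100.0 * labeled_by_mf / total


# Counts reported in the text for the 3x3 window and 0% noise.
gc_0 = generalization_capacity(423_882, 962_125)
print(f"GC at 0% noise: {gc_0:.2f}%")  # prints "GC at 0% noise: 44.06%"
```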

It is evident from the generalization percentages in each experiment that the W-operator designed with membership functions demonstrates a higher level of generalization in MRI scenarios characterized by elevated noise levels. This is due to the greater number of new observation vectors that do not exist in the domain of the W-operator under such conditions.

6 Comparison of the proposed method with other MRI segmentation methods

The proposed method was compared with other techniques focused on brain MRI segmentation. Dubey and Mushrif (2015) propose calculating a set of thresholds to segment each tissue in brain MRI using the intuitionistic fuzzy roughness measure (IFRM), obtained using the histogram as a lower approximation and the intuitionistic fuzzy histon as an upper approximation. Dubey et al. (2016) propose an intuitionistic FCM clustering algorithm for MRI segmentation (RIFCM), in which the initial centroids are obtained by means of the aforementioned intuitionistic fuzzy roughness measure. In both of these segmentation methods, performance is evaluated with common metrics, including the Jaccard coefficient and the Dice coefficient. These metrics are calculated from the confusion matrix, composed of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The Jaccard similarity coefficient measures the similarity between two sets by comparing their shared and dissimilar members. Its values range from 0 to 1, with higher values indicating greater similarity between the two sets. The Jaccard coefficient is defined as:

$$\begin{aligned} {\text {Jaccard coefficient}} = \frac{{\text {TP}}}{{\text {TP}}+{\text {FP}}+{\text {FN}}}. \end{aligned}$$
(8)

The Dice coefficient, a measure of similarity between two sets, is defined as:

$$\begin{aligned} {\text {Dice coefficient}} = \frac{2 \; {\text {TP}}}{2 \; {\text {TP}}+{\text {FP}}+{\text {FN}}}. \end{aligned}$$
(9)

This similarity metric falls within the range of 0 to 1, and the higher its value, the greater the similarity between the two sets, in this instance, the ideal image and the image segmented by a particular method.
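Equations (8) and (9) translate directly into code. The short sketch below also checks the well-known identity \(\text {Dice} = 2J/(1+J)\), where J is the Jaccard coefficient, which follows from the two definitions. The counts used in the example are made up for illustration and are not results from the paper.

```python
def jaccard(tp: int, fp: int, fn: int) -> float:
    """Jaccard coefficient, Eq. (8)."""
    return tp / (tp + fp + fn)


def dice(tp: int, fp: int, fn: int) -> float:
    """Dice coefficient, Eq. (9)."""
    return 2 * tp / (2 * tp + fp + fn)


# Illustrative counts only (not taken from the experiments).
tp, fp, fn = 8, 1, 2
j, d = jaccard(tp, fp, fn), dice(tp, fp, fn)
# Both coefficients rank segmentations identically, since Dice = 2J / (1 + J).
assert abs(d - 2 * j / (1 + j)) < 1e-12
```

Because one coefficient is a monotonic function of the other, reporting both mainly eases comparison with works that use only one of them.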

In Table 5, it can be observed that the similarity coefficients obtained with our methodology differ slightly from those obtained by other approaches. Nevertheless, our proposal to utilize membership functions generated from the training data addresses one of the most prominent challenges in the automatic design of W-operators. This challenge lies in generalizing the W-operators to process new images that were not used during the training phase. As a result, the ability to perform multi-class segmentations is significantly expanded, as illustrated by the case of MRI segmentation.

Table 5 Comparison of the proposed method with other methods focused on brain MRI segmentation

7 Conclusions

A new approach was introduced for the automatic design of W-operators using membership functions to solve the generalization problem in the case of multi-class segmentation. W-operators designed with the proposed approach were applied to segment magnetic resonance images. The experiments were carried out with different numbers of training and test images, different window sizes and different noise levels. The error of the designed W-operators increases as the window size increases, with the smallest error obtained for a \(3\times 3\) window. The classification error also increases with the noise level.

The proposed method, using membership functions, performed well in all experiments when working with windows of size \(3\times 3\), addressing the generalization problem and achieving the segmentation of each tissue in gray-level images whose class ranges overlap and whose boundaries are fuzzy. As future work, the proposed method will be applied to real magnetic resonance images.