1 Introduction

Texture is a coherent image feature [1] that helps partition an image into regions. Some of these regions may be of particular interest to an observer, and texture analysis offers an objective way to describe such subjective interest. Texture takes many forms and still has no unified definition [2]. Structurally, texture can be regarded as a set of pixels arranged in a repeating pattern. Perceptually, texture is an image component whose statistical characteristics are stable at a certain scale: within this scale, the characteristics are nearly constant or change periodically. Texture segmentation is the process of finding the boundaries between different texture regions, where each region has one coherent feature. In principle, texture segmentation has two stages: feature analysis and extraction, followed by feature classification. Five families of methods are commonly used for feature analysis and extraction: statistics-based [2,3,4], stochastic process-based [2, 3], structure-based [4,5,6], fractal-based [2, 4,5,6], and time–frequency analysis-based [6,7,8,9,10,11]. For feature classification, Fuzzy C-Means cluster analysis is currently the most popular method in unsupervised texture segmentation studies [12].

In this paper, the time–frequency analysis-based method and the fractal-based method are used for feature analysis and extraction. Conceptually, a texture image is a two-dimensional signal, so the time–frequency analysis-based method is in fact a space-frequency analysis-based method. In particular, the analytic phase theory is introduced for the space-frequency analysis. Another key step is the introduction of the Bi-dimensional Empirical Mode Decomposition (BEMD) theory [13, 14], with which the Thomas method and the Bovik method are improved [15, 16]. The space-frequency analysis-based method has effective tools such as the Gabor filter and the quaternionic Gabor filter [16, 23], which have been used successfully in the Thomas method and the Bovik method [16, 25]. The fractal-based method has its own tools [26, 27]. Different tools are sensitive to different texture styles, just as different people's eyesight has different resolution and directivity. On this basis, a primary idea of parallel unsupervised texture segmentation is proposed: it is like inviting 10 different people to look at the same sophisticated texture image together and recording everyone's judgment to obtain the segmentation result. Guided by this idea, an effective texture segmentation approach is designed in this paper.

This paper is organized as follows: the related mathematical bases are introduced in Sect. 2; the details of the “side-by-side superposition unsupervised texture segmentation (SSUTS)” approach are explained in Sect. 3; experimental results are shown in Sect. 4; the necessity, reasonability and limitations of the SSUTS approach are summarized in Sect. 5.

2 Mathematics bases

According to the primary idea of this paper, the eyesight of these 10 different people is the cornerstone. One important task is to investigate their eyesight; another is to help them improve it. The algorithms and mathematical concepts introduced in this section serve mainly these two tasks.

2.1 BEMD and IMF1

Empirical mode decomposition (EMD) was first introduced by Huang et al. in 1998 [13]. EMD is an adaptive multi-scale analysis tool for non-stationary signals. It decomposes the original signal into a series of intrinsic mode functions (IMFs), which reflect the intrinsic time–frequency or space-frequency characteristics of the original signal. The extension of EMD to two-dimensional signals is called the Bi-dimensional Empirical Mode Decomposition (BEMD).

The decomposition result of BEMD is:

$$f(x,y) = \sum\limits_{i = 1}^{n} c_{i} (x,y) + r_{n} (x,y)$$
(1)

In formula (1), \(c_{i} (x,y)\) are the IMFs and \(r_{n} (x,y)\) is the residue. If the decomposition is carried out only once, (1) simplifies to (2):

$$f(x,y) = c_{1} (x,y) + r_{1} (x,y)$$
(2)

In this paper, \(c_{1} (x,y)\) is called the first IMF (IMF1). Unlike the original signal (the original image), IMF1 is a two-dimensional signal with a single instantaneous phase and a single instantaneous frequency. This property is important for extracting analytic features from two-dimensional signals. In addition, IMF1 inherits over 90% of the image content of the original image [14]. Thus, IMF1 should be taken into consideration in texture segmentation.

2.2 Analytic phase theory

Analytic phase is the most important analytic feature. The concept is specific to two-dimensional signals and extends the one-dimensional instantaneous phase, a primary concept in time–frequency analysis theory [15, 16]. Since an image is a classic two-dimensional signal in space, the image analytic phase can be defined by analogy with the one-dimensional instantaneous phase. Obtaining the analytic phase depends on constructing the two-dimensional analytic signal, which in principle should extend the construction of the one-dimensional analytic signal. The properties of the one-dimensional analytic signal therefore serve as guidelines for constructing the two-dimensional analytic signal [15]. To date, the most successful construction is the quaternionic analytic signal [16,17,18,19,20].
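For reference, the one-dimensional construction that the two-dimensional theory extends can be sketched in a few lines. The code below is only an illustration under stated assumptions: it uses the common frequency-domain recipe for a discrete analytic signal (keep the DC and Nyquist bins, double the positive frequencies, zero the negative ones), assumes an even-length real sequence, and uses a plain O(n²) DFT so that it needs nothing beyond the standard library. The function names are hypothetical, and this is not the quaternionic construction of [16].

```python
import cmath
import math

def analytic_signal(samples):
    """Discrete analytic signal of a real sequence (even length assumed)."""
    n = len(samples)
    # Forward DFT, written out directly; fine for a short illustration.
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC and Nyquist, double positive frequencies, zero negative ones.
    weights = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    spectrum = [w * s for w, s in zip(weights, spectrum)]
    # Inverse DFT gives the complex-valued analytic signal.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def instantaneous_phase(samples):
    """Instantaneous phase, wrapped to [-pi, pi], of a real sequence."""
    return [cmath.phase(z) for z in analytic_signal(samples)]

# A pure cosine with 3 cycles over 32 samples is a mono-component signal,
# so its instantaneous phase grows linearly (before wrapping).
n = 32
signal = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
phase = instantaneous_phase(signal)
```

For this mono-component test signal the analytic signal has unit magnitude and a phase that advances by a constant step per sample, which is the "clear physical meaning" that motivates working with mono-component signals.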

In spatial domain, the quaternionic analytic signal [16] is

$$\hat{f}_{q} (x,y) = f(x,y) + if_{H}^{x} (x,y) + jf_{H}^{y} (x,y) + kf_{H} (x,y)$$
(3)

where \(f_{H}^{x} (x,y)\) and \(f_{H}^{y} (x,y)\) are the partial Hilbert transforms and \(f_{H} (x,y)\) is the total Hilbert transform. In polar form, with modulus and phases [16], the quaternionic analytic signal can be expressed as (4):

$$q = \left| q \right|e^{i\phi } e^{k\psi } e^{j\theta } \;\left( {\phi ,\theta ,\psi } \right) \in \left[ { - \pi ,\pi } \right) \times \left[ { - \pi /2,\pi /2} \right) \times \left[ { - \pi /4,\pi /4} \right]$$
(4)

\(\phi\), \(\theta\) and \(\psi\) are the two-dimensional analytic phases in space. So far, two methods have been proposed to calculate the two-dimensional analytic phase, by Thomas and by Cui Feng [16, 21].

As a classic time–frequency analysis process, calculating the analytic phase is an important feature extraction technique in texture segmentation. A difficult issue in texture segmentation is how to detect a deviation within the same texture. The parts on both sides of such a deviation have the same structure, the same statistics, the same roughness and the same self-similarity, so other methods cannot describe the deviation boundary. The time–frequency analysis method can, because a deviation within the same texture can be defined as a phase jump of a two-dimensional narrow-band signal. The weakness of the time–frequency analysis method lies in describing subtle changes in structure.

One common kind of subtle structural change is called production structure. The time–frequency analysis method became much better at analyzing production structure after the two-dimensional mono-component signal theory appeared [22]. If a real signal is itself a mono-component signal, its analytic phase has a clear physical meaning. For this reason, IMF1 is used to calculate the two-dimensional analytic phase \(\phi\), \(\theta\), \(\psi\) in this paper, because IMF1 is a classic two-dimensional mono-component signal with a single instantaneous phase and a single instantaneous frequency.

2.3 Gabor filter and quaternionic Gabor filter

The Gabor filter is a directional filter. The Gabor function is a Gaussian function modulated by a complex sinusoidal function [23]:

$$G(x,y) = g(x^{\prime},y^{\prime})\exp (2\pi i(Ux + Vy))$$
(5)

where \((x^{\prime},y^{\prime}) = (x\cos \theta + y\sin \theta , - x\sin \theta + y\cos \theta )\), \(\theta\) is the rotation angle of the coordinates in the spatial domain, and \((U,V)\) is the central frequency. Owing to the mathematical properties of the Gabor function, the Gabor filter readily distinguishes textures with different directions [24].
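Formula (5) can be sampled on a pixel grid as in the following minimal sketch. The Gaussian envelope \(g\) is not specified in the text, so the code assumes a normalized anisotropic Gaussian with standard deviations `sigma_x` and `sigma_y`; these parameters, the grid half-width `half_size` and the function name are illustrative assumptions rather than the settings used in the experiments.

```python
import cmath
import math

def gabor_kernel(half_size, U, V, theta, sigma_x, sigma_y):
    """Sample G(x, y) = g(x', y') * exp(2*pi*i*(U*x + V*y)) on an integer grid.

    Returns a (2*half_size + 1) square list of complex values.
    The Gaussian form of g is an assumption made for this sketch.
    """
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            # Rotated coordinates (x', y') as defined after formula (5).
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 / (2 * sigma_x ** 2)
                                  + yr ** 2 / (2 * sigma_y ** 2)))
            envelope /= 2 * math.pi * sigma_x * sigma_y
            # Complex sinusoidal carrier with central frequency (U, V).
            row.append(envelope * cmath.exp(2j * math.pi * (U * x + V * y)))
        kernel.append(row)
    return kernel

# Example: a filter oriented at 45 degrees with radial frequency 0.1 cycles/pixel.
f0 = 0.1
kernel_45 = gabor_kernel(half_size=8, U=f0 / math.sqrt(2), V=f0 / math.sqrt(2),
                         theta=math.pi / 4, sigma_x=4.0, sigma_y=4.0)
```

Convolving an image with the real and imaginary parts of such a kernel gives the two quadrature responses from which a magnitude and a phase can be computed.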

Imitating the traditional Gabor filter, Bulow Thomas designed the quaternionic Gabor filter [16], which can be expressed as:

$$G^{q} (x,y) = g(x^{\prime},y^{\prime})\exp (2\pi iUx)\exp (2\pi jVy)$$
(6)

The parameters in formula (6) are exactly the same as those in formula (5). If the quaternionic Gabor filter is convolved with a two-dimensional real signal, the result closely resembles a quaternionic analytic signal, in which the \(k\) term arises from the product of the \(i\) term and the \(j\) term. The quaternionic Gabor filter is not very sensitive to texture direction, but it is very sensitive to texture production structure.
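To make the origin of the \(k\) term concrete, the product in formula (6) can be expanded with \(ij = k\): writing \(a = 2\pi Ux\) and \(b = 2\pi Vy\), \(\exp(ia)\exp(jb) = \cos a\cos b + i\sin a\cos b + j\cos a\sin b + k\sin a\sin b\). The sketch below evaluates these four components at one pixel. As before, the normalized Gaussian envelope and all function and parameter names are assumptions made for illustration.

```python
import math

def gaussian_envelope(x, y, theta, sigma_x, sigma_y):
    """Assumed Gaussian envelope g(x', y') in rotated coordinates."""
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    value = math.exp(-(xr ** 2 / (2 * sigma_x ** 2) + yr ** 2 / (2 * sigma_y ** 2)))
    return value / (2 * math.pi * sigma_x * sigma_y)

def quaternionic_gabor(x, y, U, V, theta, sigma_x, sigma_y):
    """Return the (real, i, j, k) components of formula (6) at pixel (x, y)."""
    g = gaussian_envelope(x, y, theta, sigma_x, sigma_y)
    a = 2 * math.pi * U * x
    b = 2 * math.pi * V * y
    return (
        g * math.cos(a) * math.cos(b),  # real part
        g * math.sin(a) * math.cos(b),  # i part
        g * math.cos(a) * math.sin(b),  # j part
        g * math.sin(a) * math.sin(b),  # k part, from the product i * j
    )

components = quaternionic_gabor(3, -2, U=0.1, V=0.05, theta=0.3,
                                sigma_x=4.0, sigma_y=4.0)
```

The four components have a combined modulus equal to the envelope value, so the filter carries the same localization as the ordinary Gabor filter while keeping the two axis-wise phases separate.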

Overall, the Gabor family of filters is a good imitation of human vision, balancing the resolution between the spatial domain and the frequency domain well [23].

2.4 Bovik method & Thomas method

In the Bovik method, the analytic phase is introduced when a Gabor filter is used for texture segmentation [25]. The Bovik method thus solves the problem of detecting phase shifts in texture segmentation. Although it detects texture phase jumps clearly, the Bovik method cannot detect production texture.

The Thomas method combines the quaternionic Gabor filter with the analytic phase in texture segmentation, and can be said to improve and extend the Bovik method. Its main idea is as follows: the quaternionic Gabor filter replaces the traditional Gabor filter and is convolved with the original image (a two-dimensional real signal); the filtered result closely resembles a quaternionic analytic signal; and the two-dimensional analytic phase can be calculated from this signal. Thomas called these phases the “local phase”. The local phase is directly related to the intrinsic structure of the original two-dimensional real signal. Therefore, the Thomas method can detect production texture and classify it reasonably.

2.5 Fractal-based method & fuzzy C-means cluster analysis

Among the many texture segmentation methods, the fractal-based method is important for its good noise immunity. It can detect abundant image details [26] and is very sensitive to texture roughness. The Differential Box-counting (DBC) method, a classic fractal-based method, is used in this paper; its details are discussed in reference [27].
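As a rough illustration of how a box-counting estimate of roughness works, the sketch below follows a widely used form of differential box counting: the image is tiled with \(s \times s\) blocks, each block is covered by boxes of height \(h = sG/M\) in gray level, the number of boxes between the block minimum and maximum is summed over all blocks, and the fractal dimension is taken as the slope of \(\log N_r\) against \(\log(1/r)\). This is an assumption about the general scheme, not a reproduction of the variant in [27]; the function names, the square-image restriction and the default of 256 gray levels are all illustrative.

```python
import math

def box_count(image, s, gray_levels=256):
    """Number of boxes N_r needed to cover the gray-level surface at grid size s."""
    m = len(image)               # the image is assumed square, m x m
    h = s * gray_levels / m      # box height in gray levels
    total = 0
    for top in range(0, m - s + 1, s):
        for left in range(0, m - s + 1, s):
            block = [image[y][x]
                     for y in range(top, top + s)
                     for x in range(left, left + s)]
            # Boxes spanned between the minimum and maximum gray level.
            total += int(max(block) // h) - int(min(block) // h) + 1
    return total

def dbc_dimension(image, sizes, gray_levels=256):
    """Least-squares slope of log N_r versus log(1/r), with r = s / m."""
    xs = [math.log(len(image) / s) for s in sizes]
    ys = [math.log(box_count(image, s, gray_levels)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

# A perfectly flat 16 x 16 image behaves like a smooth surface.
flat = [[100] * 16 for _ in range(16)]
flat_dimension = dbc_dimension(flat, sizes=[2, 4, 8])
```

A flat surface yields a dimension of 2, and rougher textures push the estimate toward 3, which is why the measure responds to texture roughness.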

Fuzzy C-Means (FCM) cluster analysis, presented by J. C. Bezdek in reference [12], is the most popular partition-based method.
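The core of FCM is an alternation between a membership update and a center update, and the number of clusters has to be supplied in advance. The following is a minimal sketch of the standard iteration with fuzzifier \(m\), written for small lists of feature tuples. The function names, the fixed iteration count and the explicit initial centers are simplifying assumptions; a practical implementation would add a convergence test and a better initialization.

```python
import math

def update_memberships(points, centers, m=2.0, eps=1e-12):
    """u[k][i] = 1 / sum_j (d_ki / d_kj)^(2/(m-1)) for point k and center i."""
    memberships = []
    for p in points:
        # Clamp distances so a point sitting on a center does not divide by zero.
        dists = [max(math.dist(p, c), eps) for c in centers]
        memberships.append([
            1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in dists)
            for di in dists
        ])
    return memberships

def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Run a fixed number of FCM iterations; the cluster count is len(initial_centers)."""
    centers = [tuple(c) for c in initial_centers]
    dims = len(points[0])
    for _ in range(iterations):
        memberships = update_memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            # Each center is the membership-weighted mean of all points.
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ))
        centers = new_centers
    # Recompute memberships so they match the returned centers.
    return centers, update_memberships(points, centers, m)

# Toy one-dimensional features forming two obvious groups.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
centers, memberships = fuzzy_c_means(points, [(0.0,), (1.0,)])
labels = [row.index(max(row)) for row in memberships]
```

Each pixel's feature vector would play the role of a point here, and the hard label of a pixel is the cluster with the largest membership.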

3 The primary idea of parallel unsupervised texture segmentation

Texture segmentation approaches are either supervised or unsupervised. In most cases, the exact number of classes in a texture image is not known in advance, so unsupervised segmentation is preferred. Unsupervised texture segmentation must address two issues: calculating the number of texture classes and determining the boundary of each texture region. The FCM algorithm can cluster in feature space, but it requires the number of clusters as a known input. This is a key difficulty in unsupervised texture segmentation, because whether a sophisticated image can be segmented properly depends on whether the number of texture classes can be calculated exactly.

In current studies, many researchers have proposed different Cluster Evaluation Functions (CEFs) [28,29,30]. A CEF gives a quantitative judgment of the reliability of a clustering result. The common idea of different unsupervised texture segmentation algorithms is to take the CEF as the clustering stopping criterion: when the CEF reaches its maximum value, the clustering process stops and the current number of texture classes is accepted as the actual number. Segmentation based on a CEF is like forcing one person to observe a sophisticated texture image for 24 h. This person's resolving ability is naturally limited, yet this person must deliver a judgment in the end. This process has several potential problems. First, it is an indirect approach to determining the number of texture classes. Second, it is an unbalanced approach, since the clustering feature vectors always come from a single feature analysis and extraction tool. Third, it is not a robust approach: if the number of clusters is wrongly determined, the segmentation result can differ greatly from the actual structure, and the clustering fails completely.

On the other hand, our studies of different texture segmentation tools show that different tools are sensitive to different texture styles: the Bovik method is good at distinguishing textures with different directions and frequencies; the Thomas method does well in detecting production texture; the fractal-based method is very sensitive to texture roughness and has good noise immunity.

For all the reasons discussed above, a novel unsupervised texture segmentation method is proposed in this paper, called the “side-by-side superposition unsupervised texture segmentation (SSUTS)” approach, as shown in Fig. 1.

Fig. 1 The scheme of the SSUTS approach

This approach can be described as follows. A sophisticated texture image is segmented with different tools, each of which divides it into a fixed number of clusters. As a result, each texture segmentation tool naturally divides the original image into the areas whose texture features it detects most easily. To segment more exactly, the number of clusters for a single segmentation tool is limited to no more than 5 (normally 2 or 3). The first post-processing is necessary at this step, because small erroneous patches produced by a single segmentation tool may disturb the superposition in the next step. After that, the areas detected by one segmentation tool are assigned different gray values, which gives the segmentation result image of this tool. Finally, the segmentation result images produced by the different tools are superimposed, and the superimposed image is the primary segmentation result of SSUTS. A second post-processing is then needed to obtain a better segmentation result.
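The superposition step can be illustrated with a small sketch. It assumes that each tool's result is available as a two-dimensional map of integer cluster labels and that superimposing means giving every distinct combination of per-tool labels its own region label; this is one plausible reading of the scheme in Fig. 1, and the function name is hypothetical.

```python
def superimpose(label_maps):
    """Combine per-tool label maps of equal size into one region map.

    Every distinct tuple of per-tool labels becomes one region, numbered
    in order of first appearance. Returns (combined_map, region_count).
    """
    rows, cols = len(label_maps[0]), len(label_maps[0][0])
    codes = {}
    combined = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            key = tuple(label_map[y][x] for label_map in label_maps)
            combined[y][x] = codes.setdefault(key, len(codes))
    return combined, len(codes)

# Two tools that each see only two clusters, split along different axes.
tool_a = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
tool_b = [[0, 0, 0, 0],
          [1, 1, 1, 1]]
combined, region_count = superimpose([tool_a, tool_b])
```

In this toy case two fixed-number clusterings with 2 clusters each yield 4 regions after superposition, without any tool having to estimate the total number of classes.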

The SSUTS approach is like inviting 10 different people to look at the same sophisticated texture image for just 1 min and then recording everyone's judgment. Each of them only needs to make a simple judgment, namely a clustering with a fixed number of clusters. The SSUTS approach thus realizes unsupervised clustering through several simple and stable fixed-number clustering processes. This is the basic reason why the SSUTS approach can easily avoid the serious risk caused by class judgment failure.

Like FCM, BEMD is a fully data-driven method that requires no supervision, so we can first apply BEMD to the texture image and then process IMF1 with SSUTS. Compared with other segmentation fusion methods, such as the STAPLE method [34] and the methods of references [35, 36], the main difference is that SSUTS first extracts a two-dimensional mono-component signal.

During the first and the second post-processing, we used the morphological method as well as the special algorithm described as follows [31]. A small local grid is designed, normally covering less than one percent of the original image. If a pixel in this small grid is found to belong to a different cluster, it is marked. Meanwhile, a large local window containing this pixel is designed, covering less than twenty-five percent of the original image. The marked pixels are assigned to the cluster with the largest number of pixels in the large window. Throughout this process, the small grid moves pixel by pixel until it has covered the whole window. The diagram of this algorithm is shown in Fig. 2, where the window is one sixteenth of the original image. After the first window is processed, the same process is repeated in the other fifteen windows. This post-processing algorithm also distinguishes SSUTS from other segmentation fusion methods.
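One possible reading of this post-processing step is sketched below. It assumes that a pixel is "in a different cluster" when its label disagrees with the majority label of the small grid centered on it, and that all marked pixels in a window then take the most frequent label of that window. The details of [31] may differ, and the function name, the grid size of 3 and the handling of image borders are assumptions made only for illustration.

```python
from collections import Counter

def clean_window(labels, top, left, height, width, grid=3):
    """Relabel isolated pixels inside one large window of a 2-D label map."""
    rows, cols = len(labels), len(labels[0])
    window = [labels[y][x]
              for y in range(top, top + height)
              for x in range(left, left + width)]
    window_majority = Counter(window).most_common(1)[0][0]
    half = grid // 2
    marked = []
    # Slide the small grid pixel by pixel over the window.
    for y in range(top, top + height):
        for x in range(left, left + width):
            neighbours = [labels[yy][xx]
                          for yy in range(max(0, y - half), min(rows, y + half + 1))
                          for xx in range(max(0, x - half), min(cols, x + half + 1))]
            grid_majority = Counter(neighbours).most_common(1)[0][0]
            if labels[y][x] != grid_majority:
                marked.append((y, x))
    # Marked pixels join the largest cluster of the large window.
    result = [row[:] for row in labels]
    for y, x in marked:
        result[y][x] = window_majority
    return result

# A single stray pixel inside a uniform region is absorbed.
noisy = [[0] * 6 for _ in range(6)]
noisy[2][2] = 1
cleaned = clean_window(noisy, top=0, left=0, height=6, width=6)
```

To process a whole image in the manner of Fig. 2, the function would be called once per window, for example sixteen times for a 4 by 4 tiling.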

Fig. 2 The diagram of the post-processing algorithm

4 The experiment results

Following the scheme of SSUTS, we segmented several texture images of different styles and compared the experimental results with those of some classic image segmentation methods.

In the first experiment, we segmented a classic texture image composed of 5 synthetic textures, as shown in Fig. 3. After applying BEMD to the original image, we segmented the first IMF image with the Thomas method. The local phase calculated for the first IMF is shown in Fig. 4: Fig. 4a is the three-dimensional local phase, Fig. 4b is the grayscale local phase, and Fig. 4c is the median filtering result of Fig. 4b.

Fig. 3 The five-synthetic texture image

Fig. 4 The \(\psi\) value of the local phase of the five-synthetic texture image

Figure 4 shows that the Thomas method clearly distinguishes the top-left region from the other four, because the texture structure in the top-left region is non-productive while that of the other four regions is productive. However, the Thomas method cannot clearly differentiate the bottom-left texture from the round texture in the middle, because the productive structures of these two regions are quite similar. As shown in Fig. 5, these two texture regions are not clearly separated after the FCM process. The result is still unsatisfactory even after post-processing with the morphological method, as shown in Fig. 6.

Fig. 5 The FCM result of the five-synthetic texture image

Fig. 6 The morphology results of the five-synthetic texture image

With the SSUTS approach, the result is much better. The procedure is briefly described as follows:

(1) With the Thomas method, the original image of Fig. 3 is segmented with a fixed number of clusters (the clustering number is 3). The result is shown in Fig. 7.

(2) With the Bovik method, the original image of Fig. 3 is convolved with a Gabor filter bank whose direction is 45°. The result after median filtering is shown in Fig. 8.

(3) In Fig. 9, Fig. 7 is superimposed on Fig. 8, where Fig. 9a is the superposition result and Fig. 9b is the open-close operation result of Fig. 9a. In the scheme of the SSUTS approach, Fig. 9b is the second post-processing result.

Fig. 7 The three-class FCM result with the Thomas method

Fig. 8 The two-class FCM result with the Bovik method

Fig. 9 The segmentation after superposition

Comparing Fig. 9b with Figs. 5 and 6 shows that the SSUTS approach has an obvious advantage over the Thomas method alone when dealing with texture images whose productive structures are similar. Table 1 shows the error pixel rate (EPR) of the segmentation results of Fig. 3 with different methods.

Table 1 Error pixel rate (EPR) of Fig. 3 segmentation result with different methods

The second experiment uses a texture image synthesized from D16, D22, D55 and D106 in the Brodatz Texture Database (BTD), as shown in Fig. 10a.

Fig. 10 The segmentation of the four-texture image of D16, D22, D55, D106

Figures 10b and c are the segmentation result and the post-processing result of the Bovik method; Figs. 10d and e are the three-dimensional local phase and the post-processing result of the Thomas method; Fig. 10f is the superposition result of Figs. 10c and e.

The third experiment is about the classic texture image nat-5 in the Brodatz Texture Database (BTD), as shown in Fig. 11a. In this experiment, nat-5 was cut to the size of 100×100. The original image’s

Fig. 11 The segmentation of the nat-5 texture image

Image nat-5 is a challenging case in texture segmentation, and the result in Fig. 11f is satisfactory. Table 2 shows the error pixel rate (EPR) of the nat-5 segmentation results with different methods.

Table 2 Error pixel rate (EPR) of nat-5 segmentation result with different methods

Because Fig. 12a is a classic directional texture image, the Bovik method segments it quite well, with an error pixel rate of only 4.3%. However, the SSUTS approach achieves an even better result, with an error pixel rate of 1.9%.

Fig. 12 The segmentation of the underwater sand hill image

The fifth experiment is about an underwater plane image, as shown in Fig. 13a.

Fig. 13 The segmentation of the underwater plane image

To suppress the influence of noise and improve the post-processing, boundary zones 5 to 10 pixels wide were set around the image. In this experiment, only the error pixels of the plane were counted to calculate the EPR.

Besides its lower EPR compared with the classic texture segmentation methods, as shown in Table 3, the SSUTS approach also has an obvious advantage over existing methods that share the idea of merging segmentation results [32, 33]: it easily avoids the serious risk caused by class judgment failure. Judgment failure can simply be regarded as clustering false alarms and underreporting in texture segmentation. In this paper, we define the Judgment Failure Rate (JFR) as a percentage calculated by formula (7).

$$JFR = \frac{\text{clustering number of false alarms and underreporting}}{\text{total region number of the original texture image}} \times 100\%$$
(7)

Table 3 Error pixel rate (EPR) of underwater plane segmentation result with different methods

To test the JFR of SSUTS, an experiment was designed with the Brodatz Texture Database (BTD). Texture images were taken at random from the BTD and spliced into new pictures; for example, a new picture that combines 4 texture images from the BTD is a 4-region picture. We built seven groups of new pictures. All pictures in one group have the same region number (RN), and the RN of the seven groups ranges from 4 to 10. Each group contains 50 new spliced pictures. Different segmentation approaches were tested on every group: the DBC approach, the approach of reference [32], the approach of reference [33] and the SSUTS approach. In every test, the clustering number of false alarms and underreporting was counted to calculate the JFR. Table 4 shows the JFR of the different segmentation approaches in every group.
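Formula (7) is simple enough to state as code. The sketch below computes the JFR of one picture from its counts of false-alarm and underreported clusters, and then averages over a group. How the per-picture values are aggregated within a group is not stated in the text, so the plain average is an assumption; the function names are hypothetical, and the numbers in the example are invented for illustration, not taken from Table 4.

```python
def judgment_failure_rate(false_alarms, underreports, region_number):
    """JFR of one picture, as a percentage, following formula (7)."""
    return 100.0 * (false_alarms + underreports) / region_number

def group_jfr(counts, region_number):
    """Average JFR over a group of pictures with the same region number.

    counts is a list of (false_alarms, underreports) pairs, one per picture.
    Averaging over the group is an assumption of this sketch.
    """
    rates = [judgment_failure_rate(fa, ur, region_number) for fa, ur in counts]
    return sum(rates) / len(rates)

# Illustrative values only: three 4-region pictures, one with an extra
# cluster and one with a missed cluster.
example_counts = [(0, 0), (1, 0), (0, 1)]
example_rate = group_jfr(example_counts, region_number=4)
```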

Table 4 Judgment Failure Rate (JFR) of BTD with different approaches

From Table 4, we can conclude that the SSUTS approach more easily avoids the serious risk caused by class judgment failure, especially when the RN is less than 10. The approach of reference [33] is more stable. However, if the roughness of adjacent textures is similar, the boundaries between them are less clear, and the probability of clustering underreporting increases markedly with the approach of reference [33]. This is the main reason why the JFR of [33] is higher than that of the SSUTS approach when the RN is less than 10.

5 Conclusion

Unsupervised clustering is very important in texture segmentation. However, whether it should depend only on mathematical optimization is worth discussing. Logically, mathematical optimization has no necessary connection with the image segmentation process. Human image segmentation, by contrast, is a direct and parallel process. Therefore, from the viewpoint of bionics, the SSUTS approach is proposed in this paper.

Image segmentation imitates the judgment behavior of the human vision system. On the one hand, traditional segmentation based on mathematical optimization is like forcing one person to observe a sophisticated texture image for 24 h: this person's resolving ability is naturally limited, yet this person must deliver a final judgment. On the other hand, the SSUTS approach is like inviting 10 different people to look at the same sophisticated texture image for just 1 min and then recording everyone's judgment.

Just as one person's eyesight has its own directivity, a single texture segmentation tool has its own advantages and shortcomings. For this reason, combining different segmentation methods is a necessary choice, or at least one worth trying. In this paper, the fractal method was introduced because its theory is independent of the time–frequency analysis theory. With its high experimental quality, the fractal method serves as a mirror for the time–frequency analysis-based methods in this paper. This paper also provides evidence that texture segmentation improves when BEMD and the two-dimensional analytic phase theory are applied to classic texture segmentation methods, including the Thomas method, the Bovik method and the fractal-based method. This evidence is also important in supporting the reasonability of the SSUTS approach.

Compared with classic texture segmentation methods such as the Bovik method, the Thomas method and the DBC method, the SSUTS approach has a lower error pixel rate. Compared with existing methods that merge segmentation results, the SSUTS approach more easily avoids the serious risk caused by class judgment failure.

In the SSUTS approach, the final segmentation result is obtained by superposition. During superposition, identical judgments from different methods are assigned to the same cluster, while differing judgments from different methods produce a new cluster. Some “boundary deviation” accidents also occur. These “small deviation parts” cannot be taken as new clusters; instead, they give rise to some error pixels. Ordinarily, the morphological open-close operation and the method in reference [31] can solve this problem, although this is not a strict feature identification process. The superposition also means that the total number of clusters is limited. In our experience, the SSUTS approach is suited to segmentation tasks with fewer than 10 texture regions.

To improve the SSUTS approach, the key point is the choice of new texture segmentation tools. The primary principle for the tools in the SSUTS approach is diversity, so a newly added texture segmentation tool must have a directivity clearly different from those of the existing tools.

In the SSUTS approach, each texture segmentation tool performs clustering with a fixed number of clusters. In this paper this number is an empirical value, usually 3–5. In future work, a program could be designed to calculate a “clustering number list”, in which each tool's sensitivity to different texture styles is quantified as its “clustering number”. When a new sophisticated texture image arrives, the program would first calculate the “clustering number list”; by reading this list, the SSUTS program could complete the unsupervised segmentation more effectively.