GPU-accelerated image segmentation based on level sets and multiple texture features

In this paper, we present a fast multi-stage image segmentation method that incorporates texture analysis into a level set-based active contour framework. This approach allows integrating multiple feature extraction methods and is not tied to any specific texture descriptors. Prior knowledge of the image patterns is also not required. The method starts with an initial feature extraction and selection, then performs a fast level set-based evolution process and ends with a final refinement stage that integrates a region-based model. The presented implementation employs a set of features based on Grey Level Co-occurrence Matrices, Gabor filters and structure tensors. The high performance of feature extraction and contour evolution stages is achieved with GPU acceleration. The method is validated on synthetic and natural images and confronted with results of the most similar among the accessible algorithms.


Introduction
Image segmentation is one of the most fundamental problems in computer vision. Deformable models [34] are a successful class of segmentation algorithms based on the idea of a deforming shape that adapts to the desired image region. The fundamental form of the deformable model-based segmentation method was proposed by Kass et al. [25] as an active contour model (ACM), also known as a "snake". The snake model is a parametric curve with an evolution process controlled by a set of external and internal energies. External energies attract the shape to the desired image area and move it towards the boundaries of the segmented region, while the internal forces control the contour smoothness. This method cards. Even if the building blocks of the proposed algorithm are known, their join use is undoubtedly original and open to further extensions.
The main objective of this work is a new fast segmentation method that is able to use a relatively large bank of different features and perform the costly level set evolution while maintaining computation time suitable for interactive usage. The method combines both classical and region-based level set schemes and is open to extension with additional descriptors.
The concept of a multi-stage approach was initially outlined in [42] in the context of classical discrete parametric contour (snake). Apart from the similarities in the feature selection stage, the snake-based model employed a vastly different evolution scheme that considered only the closest neighbourhood of the contour points. In contrast, the proposed approach uses both local and global level set-based ACMs that are more robust in handling of multiple features and offer little topological restrictions.
The rest of the paper is organised as follows. Section 2 gives a brief review of the up-todate related work on texture segmentation. It also recalls the utilised feature extraction and level set methods. Section 3 describes the proposed method and discusses implementation concerns. Section 4 presents an experimental evaluation of the method on both synthetic and natural images, as well as a comparison with other segmentation methods. Conclusions and possible future works are given in the last section.

Related work on texture segmentation
Active contour-based methods were widely adapted to the task of texture segmentation. While the parametric contours are still used [33,52], the level set-based models are gaining more and more attention. Generally, texture-based methods can be divided into supervised and unsupervised. The supervised methods often take advantage of information from the initial texture analysis stage. Notably, Paragios and Deriche [38] proposed an extension of the GAC model that performs an off-line feature analysis. The method requires a set of texture patterns present in the image. The analysis results are then integrated in the region and boundary image forces. Pujol and Radeva [39] employed the GLCM features in their ACM. Linear Discriminant Analysis was used to reduce a feature space, which was then applied to create a likelihood map that guided the contour deformation.
In the case of the unsupervised approaches, no prior analysis or classification is performed. Shen et al. [48] proposed a method for creating a likelihood deformation map from the intensity statistics and edge information of the initial region. This solution was integrated with 2D and 3D models. Huang et al. [23] presented a similar approach, where the intensity characteristics were used for the textures with small texons, and a bank of Gabor filters was utilised in the case of textures with larger patterns. Awate et al. [3] proposed a fast multi-phase level-set method based on threshold dynamics [15] that introduces a general nonparametric statistical image neighbourhood model instead of specific texture features. Rousson et al. [45] presented a method based on a variational framework, where the texture features are extracted with a structure tensor and nonlinear diffusion scheme. Savig et al. [46] created a texture edge detector from the Gabor feature space based on the Beltrami framework [50] and integrated it with the GAC and ACWE models. Houhou et al. [22] also used the Beltrami framework to propose a descriptor that captures both edge and textural properties. In this case, the ACM was based on Kullback-Leibler distance.
More recently, Wu et al. [57] presented a method based on ACWE, where GLCM and Gabor features are fused together to create the final feature space. Min et al. [32] integrated the image intensity and a novel adaptive texture descriptor into the ACWE-based model. Mewada et al. [31] combined the ACWE approach with a modified version of the linear structure tensor. The ACWE model was also applied for texture segmentation by Wang et al. [55], where the contour was driven by a local histogram. Tatu and Bansal [53] proposed a GAC-based model that uses intensity covariance matrices for the texture features. Gao et al. introduced a factorisation-based ACM that utilises the local spectral histogram as the texture feature [16], and, more recently, a model that performs a fusion of intensity and Gabor-based features along with a factorisation scheme [17]. Dong et al. [14] also employed a factorisation-based ACM, combined with neutrosophic sets, in the task of color texture segmentation. Dahl and Dahl [11] created a method based on ACWE and probabilistic image patch dictionaries. Ahmad et al. [1] proposed a fuzzy version of a variational level set model that uses the coefficient of variation as a region statistics. The model was proven to be effective on textured and inhomogeneous images [59]. Wang et al. [56] also tackled this problem by introducing an inhomogeneity entropy descriptor that was driving the energy of a level set-based ACM.

Texture feature extraction
Grey Level Co-occurrence Matrix The GLCM is a square matrix of order equal to the number of pixel grey levels. For a spatial window w on an image I , the GLCM contains the probabilities of pair-wise occurrence of pixels with two grey levels. Each of the (i, j ) matrix elements contains a number of occurrences, where a pixel with intensity i is adjacent to a pixel with intensity j . The adjacency of the pixels is defined by the distance between them in a given orientation. This matrix, after normalisation, is then used to calculate a set of texture feature descriptors, like Entropy, Correlation, Homogeneity, Contrast or Energy [20]. The feature generation process usually considers a selected set of features and generates a feature space with a combination of parameters: window size, pixel distance and orientation. To capture the properties of various patterns in different scales and orientations, the highlydimensional space can contain a large number of features, therefore a feature selection [39] or fusion [57] is often performed.
Gabor features A two-dimensional Gabor filter can be defined as a Gaussian kernel function modulated by sinusoidal wave plane. In this form, the response of the filter can be obtained by its convolution with the input image. Gabor filters were designed to model the behaviour of mammalian visual cortex cells [12] and have the ability to localise the information in both spatial and frequency domain. These properties make them useful in the image feature extraction process, particularly in the texture analysis field [24,40]. In practice, the feature extraction is often performed by convolving the image with a set (bank) of Gabor filters with different orientations, scales and wavelengths.
Structure tensor For a scalar two-dimensional image I with a pixel coordinate p = (x, y), the structure tensor [4,5] is defined as a matrix derived from the gradient of the image: where g(p) is a Gaussian kernel with a σ standard deviation and I x and I y are partial derivatives of I in a window w centred at p. The size of the w window is crucial for capturing the image features of a desired scale. As shown in (1), the tensor gives three features for one pixel. Moreover, the resulting S(p) matrix contains information about the direction and strength of the edges in the neighbourhood of the p point. The two eigenvectors of the matrix are aligned orthogonally and parallel to the local edge (intensity change) in the windows w, while the corresponding eigenvalues indicate the strength of the edge in both directions.

Level sets and the ACWE model
An active contour, according to the level set idea [37], can be defined as a curve C in a form of a set of zero-height points p = (x, y) on the surface φ(p, t). This formulation gives C = {p : φ(p, t) = 0}, where φ(p, t) : 2 → and t is the evolution time step. The curve evolves in its normal direction according to the differential equation: where φ 0 is the initial contour and F is a speed function F (p, t), which allows the contour to expand or contract in order to encompass the segmented region. While the originally proposed level-set based methods mostly relied on edge information to drive the contour [6,7], Chan and Vese proposed a global region based ACWE model [9] inspired by the Mumford-Shah formulation of the segmentation problem [35]. Given an image I , defined in the domain, a piecewise constant segmentation of I into foreground and background can be obtained with the curve C, evolved by minimising the ACWE energy functional defined as: where c + and c − represent the average pixel intensities of I inside and outside of the curve C, respectively, I (p) is the image intensity in the point p, Length(C) is the total length of the curve, and μ ≥ 0 and λ 1 , λ 2 > 0 are positive constant weights. This functional aims to minimise the intensity variance inside and outside the contour. This minimisation problem can then be redefined using the level set approach. The curve represented as a zero level set function C = {p ∈ | φ(p) = 0}, where φ(p) > 0 is the inside of the curve and φ(p) < 0 is outside. With this curve formulation, the energy functional takes the form: where H (z) and δ(z) are regularised versions of 1D Heaviside and Dirac delta functions, defined as: To finally solve the problem, the Euler-Lagrange equation can be obtained by minimising the energy with respect to φ and parameterising the descent direction by the evolution time t while keeping c + and c − constant: where div denotes the divergence operator. After each iteration, c + and c − are updated according to: The initial ACWE model formulation was extended to handle vector-valued images [8]. Instead of two average intensity values (inside and outside the curve), the method uses vectors of the average intensities for each image channel. In this form, the image defined on consists of multiple channels where are two vectors with the average values of the image channels inside (c + ) and outside (c − ) of the curve, and λ + i and λ − i are now separate weights for each channel.
The advantage of this model lies in its ability to separate image regions even without a sharp edge between them. This can be visible in the case of some texture features (see Fig. 1) when the separation can be possible, but the edges are blurred. The vector-valued version can also detect region dissimilarities that are present only in some of the channels.

Method overview
The proposed segmentation method is based on a two-dimensional level set representation of an active contour. After a manual initialisation of the contour, the algorithm performs three main stages: initial texture feature extraction and selection, followed by two curve evolution stages (see Fig. 2). The first stage generates a set of texture feature maps that will be used to guide the further evolution of the contour. The next stage is based on a traditional level set propagating front formulation. The propagation speed function uses a strict textural stopping term that tracks sudden changes in the selected feature maps. Finally, the last stage switches to a variational level set framework based on vector-valued ACWE method. The second stage goal is to give a general outline of the segmented region, while the third stage is designed to provide a refined final form of the contour. The method also makes heavy use of GPU acceleration in most of the performance-critical computations.

Texture feature extraction and selection
The method starts with a texture feature extraction and selection stage, similar to the method described in [42]. The stage results in a set of texture feature maps M best , where a single feature map m is a scalar image that contains the feature values for each pixel of the original image. Any feature extraction method that is able to generate such a map can potentially be applied. The extraction starts with a list of all possible feature maps that can be generated with the available algorithms. Instead of calculating whole maps for the entire image, the process generates the M init set of maps only in the bounding box of the initial contour. For best results, this initialisation should capture a representative sample of the pattern. The features are then pre-selected to the M best set defined as: where r is the user-provided threshold (equal to 65% by default) and %RSD(m) is the Relative Standard Deviation of feature map m inside the initial contour. In this way, the process step picks the maps with the most uniform feature values inside the contour, according to a specified threshold. This criterion assumes that the feature uniformity identifies it as a suitable descriptor of the initial pattern. Furthermore, its important change during the contour evolution can signal a change in the texture. Next, the feature maps in M best are re-calculated for the entire image and are ready for the following stages. The current implementation uses GLCM, Gabor and structure tensor features. As for GLCM, five features are calculated: Entropy, Correlation, Homogeneity, Contrast and Energy. A feature space is generated using a combination of specific parameters: window size, displacement and orientation. The extraction process starts with a fixed set of GLCM properties and, if necessary, performs additional computations. Automatic increase of the window size (from 3×3, 5×5 and 7×7 upward) can be performed in the case of larger patterns and redundant channels for different angles can be omitted or merged in case of isotropic texture patterns. Please refer to [42] for more details on these operations. Moreover, a bank of 24 Gabor filters with four orientations (0 • , 45 • , 90 • and 135 • ), three wavelengths and two Gaussian envelope scales is considered. The generation process also employs an angle detection scheme, similar as in the case of GLCM features. Finally, the method constructs a set of structure tensor features. This set consists of maps of both eigenvalues of tensors generated from σ and w combinations for a total of 18 feature maps.

Fast level set evolution
In the second stage, the initialised contour is deformed using the level set method. This stage is based on the traditional level set formulation presented in (2), where the speed function F in the point p combines image and curvature terms [27] as: where D(p) is the image data term that drives the deformation, C(p) = div(∇φ(p)/|∇φ(p)|) is the curvature and α ∈ [0, 1] is a user-defined balancing parameter. Instead of the original intensity-based image term, we apply a texture-based term D tex (p) [41] that takes into consideration the previously selected texture feature set. First, for a given image point p, a set of features S p is defined as: where m is a texture feature map in the selected set M best ,x(m) and σ (m) are the feature map mean and standard deviation inside the initial contour, m(p) is the value of the map m in the point p and θ is a user-defined parameter that denotes the term sensitivity. In this way, the S p set will contain the features in point p that are sufficiently dissimilar to their mean values inside the initial contour. Next, a function over(p) is defined to return a number of features that overstep the condition in (11) for each point in the image I : Finally, the texture data term is defined as: where v is a predefined constant. In this way, the curve is encouraged to expand into points where the texture features have values similar to the interior of the initial contour, but even one feature dissimilar enough will result in a strong penalty (see Fig. 3). The second stage of the algorithm is expected to give a rough outline of the segmented region. An exact segmentation can be hard to achieve due to characteristic of some feature extraction methods [5,20] that can provide distinctive, but blurry region boundaries. Figure 4c shows a result of this stage, where the contour covers most of the target region, but still contains some internal irregularities. Its edges are close to, but still not exactly on the desired boundaries of the region. This way of action is our deliberate choice: we assume that it is better to stop before passing the boundary than to cause an oversegmentation (see Fig. 4e). To encourage this behaviour, the default values of parameters were experimentally set to v = 20, α = 0.05 and θ = 4, where θ is the main user-adjustable method sensitivity.
The level set equation is solved with a GPU-accelerated implementation of the traditional numerical scheme [37]. The simulation is performed until the convergence of φ or a specified maximum number of iterations is reached.

Final contour refinement
The third stage is designed to provide a final form of the contour. At this point, the method switches from a classical level set algorithm to a region-based vector-valued ACWE model that will work on previously selected texture feature maps. Furthermore, the information about the texture dissimilarities from (11) influences the length term of the energy functional (8). Originally, the μ parameter, along with the length contour μ · Length(C), is used for scaling/regularising of the contour shape. Smaller values of μ permit the contour to detect smaller objects in the image, at the cost of the contour smoothness.
Conventionally, the μ length weight is set constant for the entire evolution process [9]. In the proposed approach the μ is scaled according to the previously obtained information.
After the second phase, the over(p) function can give some information about the features that caused the contour stoppage in each point. In the proposed approach, the contour length term is redefined to include a normalised sum of the texture features that did not meet the similarity condition (11) in a given point. The new term μ stop (p) is defined as: This term gives a different response in each of the image points. Its main idea is to generally discourage the contour from expanding to the areas with a large number of sufficiently different features. This penalty is also scaled according to the strength of the feature response function over(p). Larger values of over(p) will result in a greater penalty, while lower values will encourage the contour to grow slowly, but still in a noticeable way. An example of the benefit of this length control term is shown in Fig. 5. The μ stop (p) term is visualised in a form of a "stop map" (Fig. 5a) for each of the image points and shows a strong discrimination between the central texture patch and the target region. Without the influence of this term the method clearly fails (Fig. 5b). The provided φ heat maps and the plot along the marked line show that the improved length term helps to give a clear boundary between the regions.

Implementation concerns
The most computationally intensive parts of the method, like GLCM or Gabor feature generation and the second stage level set algorithm, were implemented using the GPU acceleration. In the optimal case, a GPU-running function (kernel) should be designed to work independently on its own part of the data and have the relatively small local memory requirements. In the case of the GLCM features, a local co-occurrence matrix has to be calculated in a window around each of the image pixels. To mitigate a potential memory problem, a quantisation is applied to reduce the matrix size. As for the Gabor features, the implementation uses a simple convolution of the original image with a given filter, which easily scales on a GPU. The second stage level set method is also well suited for the GPU, as each separate position on the level set function can be recalculated independently. The The whole algorithm is implemented using the MESA system [43] -a platform for designing and evaluation of the deformable model-based segmentation methods. While the platform uses the Java language, the GPU-accelerated algorithms were written in C using OpenCL.

Experimental evaluation
The proposed method was tested on synthetic images created using the Brodatz texture dataset, as well as on a set of natural images. The initial contours were manually placed inside the desired regions and scaled to the preferred size. Unless specified otherwise, only the sensitivity parameter θ was manually adjusted (between 4 and 12), while the other parameters were left on their default values. The first stage of the algorithm selected usually from 20 to 30 feature maps from 60 to 90 available.
The experiments were performed on a workstation with Intel Xeon E5-1620v2 CPU, 16 GB RAM, and Nvidia Titan Xp GPU. The total segmentation time was between 3 to 6 seconds for the analysed synthetic and natural image examples. As the implementation is heavily GPU-bound, other graphics cards were also tested.
The segmentation quality was assessed with five error measures: overlap error (OE), area difference (AD), average symmetric surface distance (ASSD), Hausdorff distance (HD), and Dice Coefficient (DC) [21]. Given two sets of image pixels S and G, where S is the tested result and G is the ground truth segmentation, the measures are defined as: DC and OE are popular overlap error measures. In contrast to these two, the AD measure does not factor the actual overlap of the sets but quantifies just their area difference. Together with OE and DC, however, it can indicate over-or undersegmentation of the results.
The ASSD takes into consideration the distances between the surfaces/borders of the sets (i.e. the voxels/pixel that have at least one background or edge point in their vicinity). For each point s G in the border set S(G) the function d(s G , S(R)) denotes the Euclidean distance from s G to the closest pixel in S(R). These distances are also symmetrically calculated from the border pixels of R to G. All distance values are then averaged, which defines the ASSD as: In a similar fashion, the HD is the longest distance between any two points from the borders of the segmentation and ground truth regions (i.e. the greatest of all the distances from a point on one border the closest point in the other). With the border sets S(G) and S(R), and d(g, r) as the distance between pixels g and r, the HD can be defined as:

Synthetic images
The first example (see Fig. 6) presents segmentation the results of synthetically composed images. The method was able to successfully segment textures of different scales and orientations, even when multiple patterns were present. It should be noted that a more careful placement and scaling of the initial contour was necessary only in the last example. The experiments were also performed with the feature pre-selection criterion from (9) disabled. Normally, about 50 to 70% of the initial feature maps were pre-selected. The omission of this operation did not have a real influence on the segmentation results but caused an about 30% increase in the segmentation time.
The second example in Fig. 7 illustrates the influence of the θ sensitivity parameter. The first case (Fig. 7a) shows the second stage result with θ set to the default value of 4, where  the contour exhibits some significant gaps, but the third stage provides an acceptable final result (Fig. 7b). The result of the second stage with θ = 7 is closer to the desired form, but in both of the cases, the final result is comparable. Generally, the θ parameter permits a wide range of values that can give an acceptable segmentation.
The third example contains a particularly difficult case (Fig. 8), where different types of features were necessary to segment the target region. On their own, the GLCM, Gabor and tensor features were not able to provide a strong boundary between all of four adjacent regions, but together they were able to perform a successful segmentation.
The next example compares the proposed method with a deep learning-based approach that uses fully convolutional networks for the task of image segmentation (FCNT), proposed by Andrearczyk and Whelan [2]. The experiments were performed on mosaics created from the Kylberg texture dataset [26], and the segmentation was compared with the results presented in [2]. Figure 9 contains selected outcomes of both algorithms, while complete quantitative quality comparison is presented in Table 1. The tested mosaics contain quite challenging patterns, but both methods are mostly able to provide reasonable segmentation, indicated by low VOE and RVD, as well as DC greater than 0.95 in many cases. The border-based metrics (ASSD and HD) show the biggest differences in favour of the proposed method. Some of the FCNT segmentations contain many disjointed patches outside of the target region, which are not present in the results of the proposed method. This is indicated by lower ASSD and significantly reduced HD (especially in Fig. 9d and Fig. 9e).
The last example (see Fig. 10) presents the influence of image noise on the performance of the proposed method. The ability to deal with noisy images is a desirable feature of robust segmentation methods, especially in medical applications [13], where input noise is common. The test image in the example was modified by the addition of Gaussian noise with increasing value of its standard deviation (σ from 45 to 85). Although the original   Figure 11 contains the segmentation results on various natural images. The results of the proposed algorithm were compared with two state-of-the-art texture segmentation methods: Dictionary-based Active Contour (DAC) [11] and Factorisation-based Active Contour (FAC) [16]. Both methods employ a region-based level set framework but take different approaches to texture feature extraction. DAC constructs a dictionary of texture patches present in the segmented image. Each image patch is then assigned to an element of the dictionary. Using the initial form of the contour, the pixels in the dictionary patches are labelled as being inside or outside of the contour. This labelling is then used to calculate the probability of being inside the target region for each image pixel. The level set evolution process expands the curve into regions where the probability is high and retracts it from low probability areas. The FAC method uses a local spectral histogram [28] as the texture descriptor. After the feature extraction, the image is modelled as a product of a matrix with the features values and a matrix of the features weights for the segmented regions. FAC assumes two regions: foreground and background. During the contour evolution, the level set framework uses a factorisation algorithm to minimise the differences between the model (features and weights matrices) and the original image (real values of the extracted features). The DAC and FAC results were obtained with MATLAB implementations shared by the authors. For the FAC, the parameters were set to 12 histogram bins and the integration scale of 20. The default circular initialisation was also used. As for the DAC method, the initialisation corresponded with the initial contours used by the presented method, with the suggested patch sizes from 9 to 15. The quantitative analysis of the segmentation quality is presented in Table 2.

Natural images
The first three natural images (Leopard, Zebra1, and Zebra2) are more bi-modal in nature. For the Leopard image, the proposed method performance is on par with the other two algorithms. All methods achieved the DC around 0.93 and the OE values were between In the next image (Eel) the scene is composed of multiple objects with different patterns. All methods were able to encompass most of the target region, with FAC also including a The last two images (Cheetah1 and Cheetah2) contain multiple texture patterns. Apart from the target object, there are at least two other background regions. The proposed method performed significantly better than FAC and DAC, especially in the case of the second image (Cheetah2). Since DAC and FAC target bi-modal images, the proposed method gave better results thanks to the employment of multiple features. As the previous examples show, the method can differentiate between distinct regions even without an explicit multi-phase formulation. It should be noted that there is also a multi-phase version of the DAC algorithm that could perform better in these conditions. However, due to the nature of the proposed method, comparison with other two-phase methods seemed more appropriate.

Time performance
Finally, the performance results of the method on GPUs of different classes are presented in Fig. 12. Four Nvidia cards were tested: low-end GT 1030, mid-range GT 1050 Ti, high-end Titan Xp, and server-grade Tesla P100 (please see Table 3 for specifications). The execution time is shown as a stacked chart of three stages of the algorithm. The tests were performed on three versions of the first synthetic image from Fig. 6. The image, originally of resolution 256×256, was also scaled to 128×128 and 512×512 pixels. The first stage of the algorithm was fixed to use GLCM features, since their extraction it the most computationally intensive. In the case of high-end cards, the provided example did not pose a significant challenge (less than 1, 2, and 6 s for the three test images). The CPU-based third stage took the most time for the larger image sizes. Two lower-end cards were the more realistic targets for a 2D segmentation task. In their case, the achieved times were still reasonable. Even on the low-end GT 1030, the execution took less than 10 s for the largest image and was below 1 s and 3 s for 128 × 128 and 256 × 256 images, respectively.
The proposed method benefits significantly from the GPU acceleration. For example, the level set evolution in the second stage executes in less than 0.5 s in most of the performed tests. In contrast, a CPU-based version can take from 4 to 8 s to finish on a 256 × 256 image and up to 25 s in the 512× 512 case. The GPU-based GLCM algorithm also exhibits similar improvements in execution time.

Discussion
The proposed method was compared against other active contour-based methods, as well as against a deep learning-based approach. Despite the fact that the presented algorithm essentially works in a two-phase mode, it can clearly perform well on multi-modal images with multiple textures. The relevant multiple texture features can complement each other in those situations, while the unused descriptors are not getting in the way of the final results. The GPU-based implementation provides a clear improvement in the runtime performance. Historically, algorithms like level sets and GLCM were notoriously computationally intensive, which limited their practical applicability. The experiments show that modern GPUs can compute relatively large banks of texture features and perform the "fast" level set evolution in sub-second time, which invalidates some previous restrictions. The GPUbased level set method in the second stage is driven by a speed function with a relatively small influence of the curvature, and the parameters of the evolution are tuned for possibly fast convergence. Furthermore, the computations are performed on the entire image domain. Typical optimisations aim to reduce the range of the level set function updates with sparse schemes [27], where only a part of the image domain is considered (e.g. only a narrow band around the level set zero-crossing is modified in each iteration). These modifications can fundamentally change the final results, while our approach is clearly efficient enough, at least in the 2D case.
The comparison with the FCNT, which uses fully convolutional networks, indicates that the proposed method can be competitive against current state-of-the-art approaches. Deep learning algorithms provides some advantages over active contours and level sets, like the lack of necessity for manual initialisation and interactive parameter tuning. On the other hand, the (often supervised) training stage is unavoidable and depends on the availability of prepared and annotated datasets.
The presented method employs a multi-stage approach that was firstly outlined in [42]. The original work used a snake-based ACM that employed a vastly different evolution scheme that considered only the closest neighbourhood of the curve discrete points. The proposed approach uses both local and global level set-based ACMs that are more robust in handling of multiple features. The topological restrictions of the parametric snake are also eliminated (see Fig. 7). Furthermore, the final refinement stage is not only different from the heavily heuristic approach of the snake version, but also uses a distinct, region-based level set model than the previous stage. This cooperation of those two models is one of the main features of the proposed method. The computational advantage of the discrete snake was also nullified with the GPU utilisation.

Conclusions and future works
In this paper, a multi-stage texture-based active contour has been presented. The proposed method combined both a classical and region-based level set contour formulations and smartly integrated multiple and varied texture features. The algorithm was carefully validated on synthetic and natural images and compared with the most similar among the available and accessible state-of-the-art approaches. The method was able to successfully segment various patterns in case of bi-and multi-modal images. It should be noticed that the presented approach is not tied to any particular texture descriptor and can be easily extended to integrate additional features. Moreover, it efficiently employs a large number of different kinds of descriptors thanks to the GPU acceleration.
We are aware that the current form of the method can be improved in many ways. Although the algorithm does not require any prior information about the texture classes in the image, it relies on manual contour initialisation. A more advanced feature selection scheme could also be employed in the first stage. As for the performance, the third ACWEbased stage would benefit greatly from GPU acceleration. Moreover, the general level set form of the algorithm could be extended into a multi-phase version that could segment multiple regions [3]. Now, we are working on an adaptation of the proposed segmentation method into 3D [18,27]. We are also considering assessing the method on biomedical images and more experimental comparisons with very popular, fully convolutional approaches [29,54]. Finally, we plan to tackle a more complex task of semantic segmentation.
Marek Kretowski received a joint Ph.D. degree in 2002 from the Faculty of Computer Science at the Bialystok University of Technology, Poland, and from the University of Rennes 1, France. He is a Professor in the Faculty of Computer Science at the Bialystok University of Technology, Poland. His research interests focus on the biomedical applications of computer science (modelling for image understanding, image analysis), bioinformatics, and data mining.