# Analysis of Vessel Connectivities in Retinal Images by Cortically Inspired Spectral Clustering


## Abstract

Retinal images provide early signs of diabetic retinopathy, glaucoma, and hypertension. These signs can be investigated based on microaneurysms or smaller vessels. The diagnostic biomarkers are changes in vessel widths and angles, especially at junctions, which are investigated using vessel segmentation or tracking. Vessel paths may also be interrupted; crossings and bifurcations may be disconnected. This paper presents a novel contextual method based on the geometry of the primary visual cortex (V1) to study these difficulties. We have analyzed the specific problems at junctions with a connectivity kernel obtained as the fundamental solution of the Fokker–Planck equation, which is usually used to represent the geometrical structure of multi-orientation cortical connectivity. Using spectral clustering on a large local affinity matrix constructed from both the connectivity kernel and the intensity feature, the vessels are identified successfully in a hierarchical topology, each representing an individual perceptual unit.

### Keywords

Retinal image analysis, Fokker–Planck equation, Cortical connectivity, Spectral clustering, Gestalt

## 1 Introduction

### 1.1 Clinical Importance of Retinal Blood Vessels

Epidemic growth of systemic, cardiovascular, and ophthalmologic diseases such as diabetes, hypertension, glaucoma, and arteriosclerosis [38, 48, 67], their high impact on the quality of life, and the substantial need for increase in health care resources [43, 68] indicate the importance of conducting large screening programs for early diagnosis and treatment of such diseases. This is impossible without using automated computer-aided systems because of the large population involved.

### 1.2 Vessel Extraction and Its Difficulties

The vasculature can be extracted by means of either pixel classification or vessel tracking. Several segmentation and tracking methods have been proposed in the literature [9, 26, 31]. In pixel classification approaches image pixels are labeled either as vessel or non-vessel pixels. Therefore, a vessel likelihood (soft segmentation) or binary map (hard segmentation) is created for the retinal image. Although the vessel locations are estimated in these approaches, they do not provide any information about vessel connectivities. On the contrary, in tracking-based approaches, several seed points are selected and the best connecting paths between them are found [2, 10, 12, 16, 17, 18, 34, 57, 58, 78]. The main benefit of vessel tracking approaches is that they work at the level of a single vessel rather than a single pixel and they try to find the best path that matches the vessel profile. Therefore, the information extracted from each vessel segment (e.g., diameter and tortuosity) is more accurate and reliable.

There are several difficulties for both vessel segmentation and tracking approaches. Depending on the imaging technology and conditions, retinal images can be affected by noise to varying degrees. Moreover, non-uniform luminosity, drift in image intensity, low-contrast regions, and central vessel reflex complicate vessel detection and tracking. Several image enhancement, normalization, and denoising techniques have been developed to tackle these complications (e.g., [1, 29, 50]).

Tracking methods are often performed on the skeleton of the segmented images. Thus, imperfect segmentation or wrong skeleton extraction results in topological tracing errors, e.g., disconnections and incomplete subtrees, as discussed in several methods proposed in the literature [2, 16, 17, 34, 44]. Typical imperfections include missing small vessels, wrongly merged parallel vessels, disconnected or broken-up vessel segments, and the presence of spur branches after thinning. Moreover, the greatest difficulty arises at junctions and crossovers: small arteriovenous crossing angles, complex junctions when several junctions are close together, or the presence of a bifurcation next to a crossing make centerline extraction and tracing challenging. These difficulties are mentioned as tracking limitations in the literature. Some of these challenging cases are depicted in Fig. 1 with their corresponding artery/vein ground truth labels. Arteries and veins are annotated in red and blue, respectively. Green marks the crossings, and the types of the white vessels are not known.

### 1.3 Gestalt Theory and Cortically Inspired Spectral Clustering

Visual tasks like image segmentation and grouping can be explained with the theory of the Berlin school of Gestalt psychology, which proposed local and global laws to describe the properties of a visual stimulus [70, 73]. In particular, the laws of good continuation, closure, and proximity have a central role in the individuation of perceptual units in the visual space, see Fig. 2. In [54], perceptual grouping was considered to study the problem of finding curves in biomedical images. In order to study the property of good continuation, Field, Hayes, and Hess introduced in [27] the concept of an association field, which defines the properties, such as co-linearity and co-circularity, that the elements of the stimuli should have in order to be associated with the same perceptual unit. In [7], Bosking showed how the rules of association fields are implemented in the primary visual cortex (V1), where neurons with similar orientation are connected by long-range horizontal connectivity. A geometric model of the association fields based on the functional organization of V1 has been proposed in [13]. This geometric approach is part of the research line proposed by [37, 45, 49, 56, 63, 80], and applications to image processing can be found in [6, 23, 24].

In this work, a novel mathematical model based on this geometry has been applied to the analysis of retinal images to overcome the above-mentioned connectivity problems in vessel tracking. The proposed method represents an engineering application of segmenting and representing blood vessels inspired by the modeling of the visual cortex. This shows how these models can be applied to the analysis of medical images and how these two fields can be reciprocally used to better understand and reinforce each other.

This method, which is not dependent on centerline extraction, is based on the fact that in arteriovenous crossings there is a continuity in orientation and intensity of the artery and vein, respectively, i.e., the local variation of orientation and intensity of individual vessels is very low. The proposed method models the connectivity as the fundamental solution of the Fokker–Planck equation, which matches the statistical distribution of edge co-occurrence in natural images and is a good model of the cortical connectivity [60].

### 1.4 Paper Structure

## 2 Geometry of Primary Visual Cortex

### 2.1 Lifting of the Stimulus in the Cortical Space

In this section we recall the structure of the geometry of the primary visual cortex (V1). Hubel and Wiesel [42] first discovered that the visual cortex is organized in a hypercolumnar structure, where each point corresponds to a simple cell sensitive to a stimulus positioned in (*x*, *y*) and orientation \(\theta \). In other words, simple cells extract the orientation information at all locations and send a multi-orientation field to higher levels in the brain. Also, it is well known that objects with different orientations can be identified by the brain even when they are partly occluded, noisy, or interrupted [41].

Motivated by these findings, a new transformation was proposed [21, 22], to lift all elongated structures in 2D images to a new space of positions and orientations (\(R^2\times S^1\)) using elongated and oriented wavelets. By lifting the stimulus, multiple orientations per position could be detected. Thus, crossing and bifurcating lines are disentangled into separate layers corresponding to their orientations.

Constructing this higher-dimensional structure simplifies the higher-level analysis. However, such a representation is constrained, and the invertibility of the transformation needs to be guaranteed by using the right wavelet. Cake wavelets, as a class of proper wavelets [21, 22, 33], satisfy this constraint and avoid loss of information. Cake wavelets are directional wavelets similar to Gabor wavelets and have a high response on oriented and elongated structures. Moreover, similar to Gabor wavelets, they have the quadrature property: the real part contains information about the symmetric structures, e.g., ridges, and the imaginary part contains information about the antisymmetric structures, e.g., edges. Although blood vessels can have several scales, multi-scale analysis is not needed when using cake wavelets, because they capture the information at all scales. A visual comparison between Gabor and cake wavelets is presented in Fig. 3. As seen in this figure, the cake wavelets in all orientations together cover the entire frequency domain, while the Gabor wavelets, depending on their scale, cover only a limited portion of Fourier space and are therefore scale dependent. The reader is referred to [5] for more detail.

The image *f*(*x*, *y*) is correlated with the anisotropic cake wavelet \(\psi \) [21, 22, 30]:

*f*(*x*, *y*) represents the image intensity at position (*x*, *y*); the stimulus is lifted to the extended 4-dimensional feature space:

### 2.2 The Connectivity Kernels

Denote by *v* the probability density of finding a particle at the point (*x*, *y*), given that it started from a given location \((x', y')\) and that it is moving with some known velocity. This probability density satisfies a deterministic equation known in the literature as the Kolmogorov forward equation or Fokker–Planck equation:

### 2.3 Affinity Matrix

## 3 Spectral Analysis

The goal of clustering is to divide the data points into several groups such that points in the same group are similar and points in different groups are dissimilar to each other. The cognitive task of visual grouping can be considered as a form of clustering, in which points are separated into different groups according to their similarities. In order to perform visual grouping, we will use the spectral clustering algorithm. Traditional clustering algorithms, such as K-means, are not able to resolve this problem [51]. In recent years, different techniques have been presented to improve on the performance of the traditional algorithms, in particular spectral analysis techniques. It is widely known that these techniques can be used for data partitioning and image segmentation [47, 55, 64, 72] and that they outperform the traditional approaches. Moreover, they are simple to implement and can be solved efficiently by standard linear algebra methods [69]. In the next section we describe the spectral clustering algorithm used in the numerical simulations of this paper.

### 3.1 Spectral Clustering Technique

*A*. This matrix contains information about the correct segmentation and will identify perceptual units in the scene, where the salient objects will correspond to the eigenvectors with the highest eigenvalues. Even though it works successfully in many examples, in [72] it has been demonstrated that this algorithm can also lead to clustering errors. In [47, 69] the algorithm is improved by considering the normalized affinity matrix. In particular, we will use the normalization described in [47]. Defining the diagonal matrix *D* as formed by the sum of the edge weights (representing the degrees of the nodes, \(d_i = \sum _{j = 1}^{n} a_{ij}\)), the normalized affinity matrix is obtained as:

The important step is selecting the best value of K, which can be done by defining an a-priori significance threshold \(\epsilon \) for the decreasingly ordered eigenvalues \(\lambda _{i}\), so that \(\lambda _{i} > 1-\epsilon , \forall 1 \le i \le K\). However, selecting the best \(\epsilon \) value is not always trivial, and in many cases the clustering results are very sensitive to this parameter. Hence, considering the diffusion map approach of [15] and following the idea of [14], an auxiliary diffusion parameter (\(\tau \), a large positive integer) is used to obtain the exponentiated spectrum \(\{\lambda _{i}^\tau \}_{i=1,\dots n}\). The gap between the exponentiated eigenvalues increases, and the sensitivity to the threshold value decreases considerably. This new spectrum yields the stochastic matrix \(P^\tau \), which represents the transition matrix of a random walk of \(\tau \) steps. The difference between thresholding the eigenvalues directly and thresholding the exponentiated spectrum is shown in an example in Fig. 5. As seen in this figure, selecting the most discriminative threshold value for the eigenvalues (Fig. 5c) is not easy, while with the exponentiated spectrum (Fig. 5d) the threshold value can be selected from a wide range (e.g., \(0.05\le 1-\epsilon \le 0.9\)). The value of \(\tau \) needs to be a large positive integer (e.g., 150).
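The effect of the diffusion parameter on the choice of K can be illustrated with a minimal sketch. It assumes the row-stochastic normalization \(P = D^{-1}A\) and takes the eigenvalues as already computed by any standard eigensolver. The function names `row_normalize` and `select_k`, the toy affinity matrix, and the toy spectrum are hypothetical and are not taken from the experiments.

```python
def row_normalize(A):
    """Return P = D^{-1} A, where D holds the row sums (node degrees) of A.

    Assumption: this row-stochastic form is the normalization meant in the text.
    """
    return [[a / sum(row) for a in row] for row in A]


def select_k(eigenvalues, eps=0.1, tau=150):
    """Count eigenvalues whose tau-th power exceeds the threshold 1 - eps.

    Assumption: eigenvalues are real and non-negative; negative ones are ignored.
    """
    return sum(1 for lam in eigenvalues if lam > 0 and lam ** tau > 1.0 - eps)


# Hypothetical 3-node affinity matrix, only to show that rows of P sum to one.
A = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.1],
     [0.0, 0.1, 1.0]]
P = row_normalize(A)

# Hypothetical decreasingly ordered spectrum with two dominant eigenvalues.
spectrum = [1.0, 0.9999, 0.97, 0.93, 0.6]

k_direct = select_k(spectrum, eps=0.1, tau=1)      # thresholding lambda_i directly
k_diffused = select_k(spectrum, eps=0.1, tau=150)  # thresholding lambda_i ** tau
k_loose = select_k(spectrum, eps=0.95, tau=150)    # threshold 1 - eps = 0.05
```

With this toy spectrum, direct thresholding at 0.9 keeps four eigenvalues, whereas the exponentiated spectrum keeps two for thresholds of both 0.9 and 0.05, which mirrors the reduced sensitivity described above.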

Possible neural implementations of the algorithm are discussed in [14]. Particularly, in [8, 25] an implementation of the spectral analysis is described as a mean-field neural computation. Principal eigenvectors emerge as symmetry breaking of the stationary solutions of mean-field equations. In addition, in [62] it is shown that in the presence of a visual stimulus the emerging eigenvectors are linked to visual perceptual units, obtained from a spectral clustering on excited connectivity kernels. In the next section the application of this algorithm in obtaining the vessel clustering in retinal images will be presented.

## 4 Experiments and Results

In this section, the steps proposed for analyzing the connectivities of blood vessels in retinal images and validating the method are described. In addition, the parameter settings and the obtained results are discussed in detail.

### 4.1 Proposed Technique

In order to demonstrate the reliability of the method in retrieving the connectivity information in 2D retinal images, several challenging and problematic image patches around junctions were selected. The first step, before detecting the junctions and selecting the image patches around them, is to apply preconditioning to the green channel (*I*) of a color fundus retinal image. The green channel provides a higher contrast between vessels and background, and it is widely used in retinal image analysis. The preconditioning includes: (a) removing the non-uniform luminosity and contrast variability using the method proposed by [29]; (b) removing the high-frequency content; and (c) denoising using the non-linear enhancement in SE(2) as proposed by [1]. A sample color image and its preconditioned version (\(I_\mathrm{enh}\)) are shown in Fig. 6a, b, respectively.

In the next step, soft (\(I_\mathrm{soft}\)) and hard (\(I_\mathrm{hard}\)) segmentations are obtained by segmenting \(I_\mathrm{enh}\) with the BIMSO (biologically inspired multi-scale and multi-orientation) method proposed by [1]. These images are shown in Fig. 6c, d, respectively. The hard segmentation is used for detecting the junctions and selecting several patches of different sizes around them, while the soft segmentation is used later in the connectivity analysis.

*s*/2. For these junctions, a new patch including both nearby junctions (with a size equal to three times the distance between them) is considered, and its center is used for finding the distance between this new patch and the other ones. These steps are repeated until no more merging is possible or the patch size reaches the maximum possible size (we assumed 100 as the maximum possible value). Thus, all nearby junctions are grouped in order to decrease the number of patches that overlap to a great extent. This results in different patch sizes (\(0 \le s_{p_{i}}\le 100,1\le i\le m\)) all over the image, each of which can include more than one junction. Figure 6f shows the junction locations and the corresponding selected patches overlaid on the artery/vein ground truth.

In order to analyze the vessel connectivities for each image patch (\(I_{p_i}\)), we need to extract the location (*x*, *y*), orientation (\(\theta \)), and intensity (*f*(*x*, *y*)) of the vessel pixels in these patches. Hence, for each group of junctions (*i*) with the size \(s_i\), two patches from \(I_\mathrm{enh}\) and \(I_\mathrm{soft}\) are selected, called \(I_{\mathrm{enh},p_i}\) and \(I_{\mathrm{soft},p_i}\) respectively. Then \(I_{\mathrm{soft},p_i}\) is thresholded locally to obtain a new hard segmented image patch (called \(I_{\mathrm{hard},p_i}\)). This new segmented image patch differs from the corresponding patch of \(I_\mathrm{hard}\), because \(I_\mathrm{hard}\) was obtained by thresholding the entire \(I_\mathrm{soft}\) with one global threshold value, which is not appropriate in all regions. Regions with very small, low-contrast vessels (which often get a very low probability of being vessel pixels) are normally removed by the global thresholding approach. Wrong thresholding in turn leads to wrong tracking results; e.g., C1, C2, and C6 in Fig. 1 are examples of patches with missing small vessels. In this work, we selected one threshold value specifically for each patch using Otsu’s method [53], to keep more information and cover a wider range of vessel pixels. Consequently, thicker vessels will be created in \(I_{\mathrm{hard},p_i}\) and the results will be more accurate.
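The per-patch thresholding can be sketched as follows. This is a minimal pure-Python version of Otsu's criterion (maximizing the between-class variance) applied to one patch of vessel likelihoods; the function names and the toy patch are hypothetical, and a real pipeline would more likely rely on an image-processing library.

```python
def otsu_threshold(values):
    """Return the threshold that maximizes the between-class variance."""
    v = sorted(values)
    n, total = len(v), sum(v)
    best_t, best_var = v[0], -1.0
    cum = 0.0
    for i in range(1, n):
        cum += v[i - 1]
        if v[i] == v[i - 1]:
            continue  # a split is only possible between distinct values
        w0 = i / n                    # fraction of pixels in the lower class
        m0 = cum / i                  # mean of the lower class
        m1 = (total - cum) / (n - i)  # mean of the upper class
        var = w0 * (1.0 - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, (v[i - 1] + v[i]) / 2.0
    return best_t


def threshold_patch(soft_patch):
    """Threshold one soft-segmentation patch with its own Otsu threshold."""
    flat = [p for row in soft_patch for p in row]
    t = otsu_threshold(flat)
    return t, [[p > t for p in row] for row in soft_patch]


# Hypothetical patch: a faint diagonal vessel with likelihood around 0.3.
soft_patch = [
    [0.02, 0.03, 0.30, 0.02],
    [0.03, 0.28, 0.04, 0.02],
    [0.31, 0.03, 0.02, 0.03],
]
t, hard_patch = threshold_patch(soft_patch)
```

In this toy patch the local threshold falls between the background and the faint vessel, so the three diagonal pixels are kept, whereas a global threshold of, say, 0.5 would remove all of them.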

(*x*, *y*) other information could be extracted for these locations using \(I_{\mathrm{enh},p_i}\). So *f*(*x*, *y*) equals the intensity value in \(I_{\mathrm{enh},p_i}\) at location (*x*, *y*). Moreover, by lifting \(I_{\mathrm{enh},p_i}\) using cake wavelets (see Eq. 1), at each location the angle corresponding to the maximum of the negative orientation response (real part) in the lifted domain is considered as the dominant orientation (\(\theta _d\)) as in Eq. 11. The negative response is considered because the blood vessels in retinal images are darker than the background.
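The selection of the dominant orientation at one pixel can be sketched as below. The sketch assumes that the lifted responses at that pixel are available as a list of complex numbers, one per orientation, sampled at \(k\pi /n\) for \(k = 0, \dots , n-1\) (a \(\pi \)-periodic sampling starting at zero is an assumption). The function name and the toy responses are hypothetical.

```python
import math


def dominant_orientation(responses):
    """Return the angle whose negated real response is largest.

    responses: complex orientation responses at a single pixel, assumed to be
    sampled at k * pi / n_theta for k = 0 .. n_theta - 1.
    """
    n_theta = len(responses)
    k = max(range(n_theta), key=lambda i: -responses[i].real)
    return k * math.pi / n_theta


# Hypothetical pixel on a dark vessel oriented at 45 degrees:
# the real part is strongly negative at index 6 of 24 orientations.
n_theta = 24
responses = [complex(0.1, 0.0)] * n_theta
responses[6] = complex(-0.8, 0.05)
theta_d = dominant_orientation(responses)
```

The real part is negated before taking the maximum because, as stated above, vessels are darker than the background and therefore give a negative ridge response.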

*H* is the number of steps of the random path and \(\sigma \) is the diffusion constant (the propagation variance in the \(\theta \) direction). This finite difference equation is solved *n* (typically \(10^5\)) times, so *n* paths are created. Then the estimated kernel is obtained by averaging all the solutions [36, 62]. An overview of the different possible numerical methods to compute the kernel is given in [79], where comparisons are made with the exact solutions derived in [19, 20, 23]. From these comparisons it follows that the stochastic Monte-Carlo implementation is a fair and accurate method. The intensity-based kernel (\(\omega _2\)), the final connectivity kernel (\(\omega _f\)), and the affinity matrix (*A*) were calculated using Eqs. 7, 8, and 9, respectively. Finally, by applying the spectral clustering step proposed in Sect. 3, the final perceptual units (individual vessels) were obtained for each patch.
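A Monte-Carlo estimate of this kind can be sketched as follows. The sketch assumes a common discretization in which a path advances by one unit step along its current direction and its angle is perturbed by zero-mean Gaussian noise with standard deviation \(\sigma \); the exact finite difference scheme, the binning, and any symmetrization used in the experiments may differ. The function name, the unit step, and the bin layout are hypothetical.

```python
import math
import random
from collections import Counter


def monte_carlo_kernel(H=7, n=1000, sigma=0.05, n_theta=24, seed=0):
    """Estimate a forward connectivity kernel by averaging n random paths.

    Each path starts at (0, 0) with angle 0 and takes H unit steps. Visited
    points are binned on integer positions and pi-periodic orientation bins.
    Assumption: only the forward direction is simulated.
    """
    rng = random.Random(seed)
    d_theta = math.pi / n_theta
    counts = Counter()
    for _ in range(n):
        x = y = theta = 0.0
        for _ in range(H):
            x += math.cos(theta)            # advance along the current direction
            y += math.sin(theta)
            theta += rng.gauss(0.0, sigma)  # diffusion in the theta direction
            k = round(theta / d_theta) % n_theta
            counts[(round(x), round(y), k)] += 1
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}


kernel = monte_carlo_kernel(H=7, n=2000, sigma=0.05)
straight = monte_carlo_kernel(H=7, n=10, sigma=0.0)  # no diffusion: a straight line
```

Setting \(\sigma = 0\) collapses the kernel onto a straight line of length *H*, which is a convenient sanity check; increasing \(\sigma \) fans the mass out over neighbouring positions and orientations.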

The above-mentioned steps for a sample crossing in a \(21\times 21\) image patch are presented in Fig. 7. After enhancing the image (Fig. 7a), obtaining the soft segmentation (Fig. 7b), and thresholding it locally (Fig. 7c), the vessel locations, intensities, and orientations were extracted. As shown in Fig. 7d, arteries and veins have different intensities, and this difference helps in discriminating between them. However, orientation is the most discriminative feature. The image lifted to *SE*(2) using the \(\pi \)-periodic cake wavelets in 24 different orientations is shown in Fig. 7h. The disentanglement of the two crossing vessels at the junction point can be seen clearly in this figure. The dominant orientations (\(\theta _d\)) of the vessel pixels are also depicted in Fig. 7e, using line segments oriented according to the corresponding orientation at each pixel.

In the next step, this contextual information (intensity and orientation) is used for calculating the connectivity kernel (Fig. 7i) and the affinity matrix (Fig. 7j) as described in Sect. 2. For this numerical simulation, \(H, n, \sigma \), and \(\sigma _2\) were set to 7, 100000, 0.05, and 0.1, respectively. Next, by applying the spectral clustering to the normalized affinity matrix with \(\epsilon \) and \(\tau \) set to 0.1 and 150, only two eigenvalues remain above the threshold (Fig. 7k). This means that there are two main salient perceptual units in this image, as expected. These two units are color coded in Fig. 7f. The corresponding artery and vein labels are also depicted in Fig. 7g, which confirm the correctness of the obtained clustering results.

### 4.2 Validation

To validate the method, the proposed steps were applied to several image patches of the DRIVE [66] dataset. This public dataset contains 40 color images with a resolution of \(565\times 584\) (\(\sim \)25 \(\upmu \)m/px) and a \(45^\circ \) field of view. The selected patches from each image were manually categorized into the following groups: simple crossing (category A), simple bifurcation (category B), nearby parallel vessels with bifurcation (category C), bifurcation next to a crossing (category D), and multiple bifurcations (category E), and each category was narrowed down to 20 image patches. These patches differ in complexity, number of junctions, and size, and they can contain broken lines, missing small vessels, and vessels with high curvature. The parameters used in the numerical simulation of the affinity matrix and the spectral clustering step (including \(H, n, \sigma , \sigma _2, \epsilon \), and \(\tau \)) were chosen differently for each patch, with the aim of achieving the optimal results for each case. Automatic parameter selection remains a challenging task and will be investigated in future work.

Some sample figures of these cases are depicted in Fig. 8. For each example, the original gray scale enhanced image, hard segmentation (locally thresholded), orientation and intensity information, and finally the clustering result together with artery/vein labels are depicted (Fig. 8a–f, respectively). Although the complexity of these patches is quite different in all cases, the salient groups are detected successfully. All the vessel pixels grouped as one unit have similarity in their orientations and intensities, and they follow the law of good continuation. Therefore, at each bifurcation or crossover point, two groups have been detected.

Table 1 The parameters used in the numerical simulation of the image patches shown in Fig. 8 and their corresponding sizes

| Name | Size | *H* | \(\sigma \) | \(\sigma _2\) |
|---|---|---|---|---|
| G1 | \(21\times 21\) | 7 | 0.02 | 0.3 |
| G2 | \(21\times 21\) | 8 | 0.03 | 0.3 |
| G3 | \(41\times 41\) | 10 | 0.03 | 0.1 |
| G4 | \(39\times 39\) | 9 | 0.03 | 0.3 |
| G5 | \(33\times 33\) | 8 | 0.03 | 0.3 |
| G6 | \(51\times 51\) | 20 | 0.03 | 0.3 |
| G7 | \(71\times 71\) | 17 | 0.07 | 0.3 |
| G8 | \(73\times 73\) | 24 | 0.03 | 0.3 |
| G9 | \(89\times 89\) | 30 | 0.03 | 0.3 |
| G10 | \(97\times 97\) | 24 | 0.03 | 0.3 |

*H* and \(\sigma \) determine the shape of the kernel. Based on the experiments, the appropriate value for the number of steps of the random path generation is approximately 1/3 of the image width. Selecting this parameter correctly is very important for connecting interrupted lines. The parameters \(\sigma \) and \(\sigma _2\), which determine the propagation variance in the \(\theta \) direction and the effect of the intensity-based similarity term, respectively, are not very sensitive to variation. To quantify this, the mean and variance of these two parameters were calculated for each of the above-mentioned categories and are presented in Table 2. Since the selected patches have varying sizes and *H* depends on the patch size, this parameter is not presented in the table. Moreover, to evaluate the performance of the method, we introduced the correct detection rate (CDR) as the percentage of correctly grouped image patches in each category. These values are also presented in Table 2. With a higher number of image patches per category, the CDR values would be more realistic.

Table 2 The correct detection rate and the mean and variance of \(\sigma \) and \(\sigma _2\) used in the numerical simulation for each category

| Category | CDR (%) | \(\sigma \) mean | \(\sigma \) variance | \(\sigma _2\) mean | \(\sigma _2\) variance |
|---|---|---|---|---|---|
| A | 85 | 0.032 | 0.0001 | 0.28 | 0.0039 |
| B | 95 | 0.033 | \(\simeq 0\) | 0.3 | \(\simeq 0\) |
| C | 85 | 0.0269 | \(\simeq 0\) | 0.22 | 0.01 |
| D | 75 | 0.035 | 0.00013 | 0.248 | 0.0125 |
| E | 95 | 0.03 | \(\simeq 0\) | 0.3 | \(\simeq 0\) |
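The CDR definition and the rule of thumb for the number of steps reduce to simple arithmetic, sketched below. The function names are hypothetical, and the count of 17 correctly grouped patches is inferred from the reported 85% over 20 patches rather than stated in the text.

```python
def correct_detection_rate(n_correct, n_total):
    """Percentage of correctly grouped image patches in a category."""
    return 100.0 * n_correct / n_total


def suggested_steps(patch_width):
    """Rule of thumb from the experiments: about one third of the patch width."""
    return round(patch_width / 3)


cdr_a = correct_detection_rate(17, 20)  # category A: 17 of 20 patches (inferred)
h_g1 = suggested_steps(21)              # a 21 x 21 patch such as G1
```

The suggested step count is only a starting point: it matches the value used for the \(21\times 21\) patch G1, while the values chosen for larger patches were tuned individually and deviate from it.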

## 5 Conclusion and Future Work

In this work, we have presented a novel semi-automatic technique inspired by the geometry of the primary visual cortex to find and group different perceptual units in retinal images using spectral methods. Computing the eigenvectors of affinity matrices, which are formed using the connectivity kernel, leads to the final grouping. The connectivity kernel represents the connectivity between all points lifted to the 4-dimensional feature space of positions, orientations, and intensities, and it provides a good model for the Gestalt law of good continuation. Thus, the perceptual units in retinal images are the individual blood vessels having low variation in their orientations and intensities.

The proposed method allows finding accurate junction positions, i.e., the positions where two groups meet or cross each other. The main application of these connectivity analyses would be in modeling the retinal vasculature as a set of tree networks. The main graph constructed from these trees would be very informative for analyzing the topological behavior of the retinal vasculature, which is useful in the diagnosis and prognosis of several diseases, especially in automated applications in large-scale screening programs.

The detection of small vessels depends strongly on the quality of the soft segmentation, not the hard segmentation. These vessels can easily be differentiated from noise based on the size of the group: noisy pixels have random orientations and intensities, and they form smaller groups. Our method has some limitations for blood vessels with high curvature. One possible solution is to merge the two detected perceptual units into one unit if there are no junctions at these locations. Another, stronger extension is to use other kernels that take into account the curvature of structures in addition to positions and orientations. Moreover, it is also possible to enrich the affinity matrix with other terms, e.g., the principal curvature of the multi-scale Hessian (ridgeness or vesselness similarity). All these solutions will be investigated in the future.

With this model we have analyzed many challenging cases, such as bifurcations, crossovers, and small and disconnected vessels in retinal vessel segmentations. These cases have not only been reported to create tracing errors in state-of-the-art techniques, but are also very informative for clinical studies. Based on the results shown in the numerical simulations, the method is successful in detecting the salient groups in retinal images, and it is robust against noise, central vessel reflex, interruptions in vessel segments, the presence of multiple junctions in a small area, and the presence of nearby parallel vessels. For this reason, it can be considered an excellent quantitative model for the constitution of perceptual units in retinal images. To the best of our knowledge, this is the first time that the vessel connectivities in such complex situations are handled by one single method.

## Acknowledgments

This project has received funding from the European Union’s Seventh Framework Programme, Marie Curie Actions, Initial Training Network, under Grant Agreement No. 607643, “Metric Analysis For Emergent Technologies (MAnET)”. It was also supported by the Hé Programme of Innovation, which is partly financed by the Netherlands Organization for Scientific Research (NWO) under Grant No. 629.001.003.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.