1 Introduction

1.1 Clinical Importance of Retinal Blood Vessels

The epidemic growth of systemic, cardiovascular, and ophthalmologic diseases such as diabetes, hypertension, glaucoma, and arteriosclerosis [38, 48, 67], their high impact on quality of life, and the substantial increase in health care resources they require [43, 68] underline the importance of large screening programs for the early diagnosis and treatment of such diseases. Given the size of the population involved, this is impossible without automated computer-aided systems.

The retina is the only location where blood vessels can be directly monitored non-invasively in vivo [77]. Retinal vessels are connected and form a treelike structure. The local orientation and intensity of a vessel change gradually along its length; however, these local properties may vary greatly between different vessels. Studies show that quantitative measurements of morphological and geometrical attributes of the retinal vasculature, such as vessel diameter, tortuosity, arteriovenous ratio, and branching patterns and angles, are highly informative for the early diagnosis and prognosis of several diseases [11, 28, 35, 40, 46, 65, 71]. More specifically, since two blood vessels of the same type never cross each other, and arteries are more likely to cross over veins, studying the behavior of blood vessels at crossings, e.g., for detecting branch retinal vein occlusion (occlusion of a vein) and arteriovenous nicking, helps in the diagnosis of hypertension (increased arterial blood pressure), arteriosclerosis (hardening of the arteries), and stroke [32, 75, 76]. Other clinically promising but still underrated information comes from studying the smaller vessels, because signs of diseases such as diabetic retinopathy are expected to appear in smaller vessels earlier than in larger ones.

Fig. 1

A sample image selected from the DRIVE dataset [66] with centerlines extracted by applying the morphological skeletonization on the segmented image (top left), and its corresponding artery/vein ground truth from the RITE dataset [39] (top right). Several difficult cases are highlighted in this figure. C1 complex junction with the presence of a bifurcation and a crossing with a narrow crossing angle; C2 interrupted lines and missing small vessels; C3 high curvature vessel; C4 complex junction, a crossing and bifurcation, wrong thinning at crossing; C5 two nearby parallel vessels merged as one group; C6 missing small vessel, merged parallel vessels, and interrupted segment

1.2 Vessel Extraction and Its Difficulties

The vasculature can be extracted by means of either pixel classification or vessel tracking. Several segmentation and tracking methods have been proposed in the literature [9, 26, 31]. In pixel classification approaches, image pixels are labeled as either vessel or non-vessel pixels, producing a vessel likelihood map (soft segmentation) or a binary map (hard segmentation) for the retinal image. Although these approaches estimate the vessel locations, they do not provide any information about vessel connectivity. In contrast, tracking-based approaches select several seed points and find the best connecting paths between them [2, 10, 12, 16–18, 34, 57, 58, 78]. The main benefit of vessel tracking approaches is that they work at the level of a single vessel rather than a single pixel and try to find the path that best matches the vessel profile. Therefore, the information extracted from each vessel segment (e.g., diameter and tortuosity) is more accurate and reliable.

There are several difficulties for both vessel segmentation and tracking approaches. Depending on the imaging technology and conditions, retinal images may be affected by noise to varying degrees. Moreover, non-uniform luminosity, drift in image intensity, low-contrast regions, and the central vessel reflex complicate vessel detection and tracking. Several image enhancement, normalization, and denoising techniques have been developed to tackle these complications (e.g., [1, 29, 50]).

Tracking methods are often performed on the skeleton of the segmented image. Thus, imperfect segmentation or wrong skeleton extraction results in topological tracing errors, e.g., disconnections and incomplete subtrees, as discussed for several methods proposed in the literature [2, 16, 17, 34, 44]. Typical imperfections include missing small vessels, wrongly merged parallel vessels, disconnected or broken vessel segments, and spur branches introduced by thinning. Moreover, greater difficulty arises at junctions and crossovers: small arteriovenous crossing angles, complex junctions where several junctions lie close together, or the presence of a bifurcation next to a crossing make centerline extraction and tracing challenging. These difficulties are cited as tracking limitations in the literature. Some of these challenging cases are depicted in Fig. 1 with their corresponding artery/vein ground truth labels. Arteries and veins are annotated in red and blue, respectively; green represents crossings, and the types of the white vessels are unknown.

1.3 Gestalt Theory and Cortically Inspired Spectral Clustering

Visual tasks like image segmentation and grouping can be explained with the theory of the Berlin school of Gestalt psychology, which proposed local and global laws describing the properties of a visual stimulus [70, 73]. In particular, the laws of good continuation, closure, and proximity play a central role in the identification of perceptual units in the visual space; see Fig. 2. In [54], perceptual grouping was used to study the problem of finding curves in biomedical images. To formalize the property of good continuation, Field, Hayes, and Hess introduced in [27] the concept of an association field, which defines the properties, such as co-linearity and co-circularity, that elements of the stimulus should have in order to be associated with the same perceptual unit. In [7], Bosking et al. showed how the rules of association fields are implemented in the primary visual cortex (V1), where neurons with similar orientation preference are connected by long-range horizontal connections. A geometric model of the association fields based on the functional organization of V1 was proposed in [13]. This geometric approach is part of the research line of [37, 45, 49, 56, 63, 80], and applications to image processing can be found in [6, 23, 24].

In this work, a novel mathematical model based on this geometry is applied to the analysis of retinal images to overcome the above-mentioned connectivity problems in vessel tracking. The proposed method is an engineering application, inspired by models of the visual cortex, for segmenting and representing blood vessels. It shows how such models can be applied to the analysis of medical images, and how the two fields can reciprocally reinforce each other.

This method, which does not depend on centerline extraction, is based on the fact that at arteriovenous crossings both the artery and the vein are continuous in orientation and intensity, i.e., the local variation of orientation and intensity along an individual vessel is very low. The proposed method models the connectivity as the fundamental solution of the Fokker–Planck equation, which matches the statistical distribution of edge co-occurrence in natural images and is a good model of the cortical connectivity [60].

Starting from this connectivity kernel and considering the Euclidean distance between the intensities of blood vessels, we build the normalized affinity matrix. Since at crossings the vessels have different intensities (types), including this feature in the construction of the affinity matrix adds discriminative information. This spectral approach is inspired by earlier work on image processing [15, 55, 64, 72]. Moreover, recent results in [62] show how the spectral analysis could actually be implemented by the neural population dynamics of the primary visual cortex (V1). Finally, we use a spectral clustering algorithm to find and group the eigenvectors associated with the highest eigenvalues of the affinity matrix. We will describe how these groups represent different perceptual units (vessels) in retinal images. Originally, spectral clustering exploited good continuation, closure, and proximity (see Fig. 2) and excelled at finding connections, e.g., between broken vessel segments. In this paper, we go further by solving many challenges at vessel crossings and bifurcations.

Fig. 2

Examples of a good continuation, b closure, and c proximity Gestalt laws

1.4 Paper Structure

The remainder of the article is organized as follows. Sect. 2 describes the geometry of the functional architecture of the primary visual cortex and explains in detail how the stimulus is lifted into the cortical space using a specific wavelet transform. This lifting converts the image from the space of positions (\(R^2\)) to the joint space of positions and orientations (\(R^2\times S^1\)). Then the connectivity kernel and the construction of the affinity matrix based on this kernel and on intensity are described. The next step, the spectral clustering algorithm used to extract grouping information from the stimulus, is explained in Sect. 3. The experiments on retinal images are explained step by step and the results are presented in Sect. 4. Finally, Sect. 5 concludes the paper by briefly summarizing the proposed method, discussing its strengths in preserving the connectivity of the retinal vasculature network, and proposing potential improvements as future work.

Fig. 3

Visual comparison between cake and Gabor wavelets (top and bottom rows, respectively). a and b real and imaginary parts of the wavelets in spatial domain, c the wavelet in Fourier domain, and d the illustration of Fourier space coverage by wavelets in 36 orientations

2 Geometry of Primary Visual Cortex

2.1 Lifting of the Stimulus in the Cortical Space

In this section we recall the geometric structure of the primary visual cortex (V1). Hubel and Wiesel [42] first discovered that the visual cortex has a hypercolumnar organization, where each point corresponds to a simple cell sensitive to a stimulus at position (x, y) with orientation \(\theta \). In other words, simple cells extract orientation information at all locations and send a multi-orientation field to higher levels of the brain. It is also well known that the brain can identify objects with different orientations even when they are partly occluded, noisy, or interrupted [41].

Motivated by these findings, a new transformation was proposed [21, 22] to lift all elongated structures in 2D images to a new space of positions and orientations (\(R^2\times S^1\)) using elongated and oriented wavelets. By lifting the stimulus, multiple orientations can be detected per position; thus, crossing and bifurcating lines are disentangled into separate layers corresponding to their orientations.

Constructing this higher-dimensional structure simplifies the higher-level analysis. However, the invertibility of the transformation needs to be guaranteed by a proper choice of wavelet, so that no information is lost. Cake wavelets [21, 22, 33] satisfy this constraint. They are directional wavelets similar to Gabor wavelets and respond strongly to oriented and elongated structures. Moreover, like Gabor wavelets, they form quadrature pairs: the real part carries information about symmetric structures, e.g., ridges, and the imaginary part about antisymmetric structures, e.g., edges. Although blood vessels appear at several scales, no multi-scale analysis is needed with cake wavelets, because they capture information at all scales. A visual comparison between Gabor and cake wavelets is presented in Fig. 3. As seen in this figure, the cake wavelets over all orientations cover the entire frequency domain, while Gabor wavelets, depending on their scale, cover only a limited portion of the Fourier space and are therefore scale dependent. The reader is referred to [5] for more detail.

For lifting the stimulus and constructing the higher-dimensional representation \(U_f: SE(2) \rightarrow \mathbb {C}\), the input image f(x, y) is correlated with the anisotropic cake wavelet \(\psi \) [21, 22, 30]:

$$\begin{aligned} \begin{aligned} U_f(\mathbf {x},\theta )&=(\mathcal {W}_\psi [f])(\mathbf {x},\theta ) =(\overline{R_\theta (\psi )} \star f)(\mathbf {x}) \\&=\int _{\mathbb {R}^2} \overline{\psi (R^{-1}_\theta (\mathbf {y}-\mathbf {x}))}f(\mathbf {y})\mathrm {d}\mathbf {y}, \end{aligned} \end{aligned}$$

where \(R_\theta \) is the 2D counter-clockwise rotation matrix and the overline denotes the complex conjugate. Using this transformation and considering only one orientation per position (the orientation with the highest transformation response), the points of a curve \(\gamma = (x,y)\) are lifted to new cortical curves described in the space \((x,y,\theta )\):

$$\begin{aligned} (x,y) \rightarrow (x,y,\theta ). \end{aligned}$$
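The lifting and the selection of a dominant orientation can be sketched numerically. The snippet below is a minimal illustration that uses rotated Gabor wavelets as a stand-in for the cake wavelets (whose Fourier-domain construction is more involved); the kernel size, anisotropy, and frequency are illustrative choices, not parameters from this paper.

```python
import numpy as np
from scipy.ndimage import correlate, rotate

def gabor_kernel(size=15, sigma_long=4.0, sigma_short=1.5, freq=0.15):
    """Complex oriented wavelet at angle 0: elongated along x, oscillating
    along y, so its real (even) part responds to horizontal ridges."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-0.5 * ((x / sigma_long) ** 2 + (y / sigma_short) ** 2))
    return envelope * np.exp(2j * np.pi * freq * y)

def lift(image, n_theta=24):
    """Orientation lifting: correlate the image with the wavelet rotated
    over [0, pi), giving U_f(x, y, theta) on R^2 x S^1."""
    psi = gabor_kernel()
    U = np.empty(image.shape + (n_theta,), dtype=complex)
    for k, theta_deg in enumerate(np.linspace(0.0, 180.0, n_theta, endpoint=False)):
        psi_r = rotate(psi.real, theta_deg, reshape=False)
        psi_i = rotate(psi.imag, theta_deg, reshape=False)
        U[..., k] = correlate(image, psi_r) + 1j * correlate(image, psi_i)
    return U

def dominant_orientation(U):
    """Index of the strongest negative real response per pixel
    (vessels are darker than the background)."""
    return np.argmax(-U.real, axis=-1)
```

Since the response stack keeps all orientations, crossing lines remain disentangled in separate \(\theta \)-layers before any per-pixel maximum is taken.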

These curves have been modeled in [13] as integral curves of suitable vector fields:

$$\begin{aligned} {{\mathbf {X}}}_1 = (\cos \theta ,\sin \theta ,0), \quad {{\mathbf {X}}}_2 = (0,0,1). \end{aligned}$$

The points of the lifted curves are connected by integral curves of these two vector fields such that:

$$\begin{aligned} \gamma : \mathbb {R}\rightarrow SE(2), \qquad \gamma (s)=(x(s),y(s),\theta (s)) \end{aligned}$$
$$\begin{aligned} \gamma '(s)=(k_1(s){{\mathbf {X}}}_1 + k_2(s){{\mathbf {X}}}_2)(\gamma (s)),\quad \gamma (0)=0. \end{aligned}$$

These curves projected on the 2D cortical plane represent a good model of the association fields, as described in [13] (see Fig. 4).

Fig. 4

a The integral curves of the vector fields \(X_1\) and \(X_2\) in the \((x,y,\theta )\) space. In blue, the projections of the integral curves on the xy plane. b The distribution of the integral curves, modeling the connectivity between points

In order to include the intensity term, we use the Euclidean distance between the intensities of two corresponding points. If f(x, y) represents the image intensity at position (x, y), the stimulus is lifted to the extended 4-dimensional feature space:

$$\begin{aligned} (x,y,\theta ) \rightarrow (x,y,\theta ,f(x,y)). \end{aligned}$$

An admissible curve in this space is defined as the solution of the following differential equation:

$$\begin{aligned} \gamma '(s) = (k_1(s){{\mathbf {X}}}_1 + k_2(s){{\mathbf {X}}}_2 + k_4(s)\mathbf {X}_4)(\gamma (s)) \end{aligned}$$
$$\begin{aligned} \gamma (0) = (x_1,y_1,\theta _1,f_1) , \qquad \gamma (1) = (x_2,y_2,\theta _2,f_2), \end{aligned}$$

where the vector fields are:

$$\begin{aligned} {{\mathbf {X}}}_1 = (\cos \theta ,\sin \theta ,0,0), \quad {{\mathbf {X}}}_2 = (0,0,1,0), \end{aligned}$$
$$\begin{aligned} {{\mathbf {X}}}_4 = (0,0,0,1) \end{aligned}$$

and the coefficients \(k_1\) and \(k_2\) represent a distance in the \((x,y,\theta )\) domain and \(k_4\) is a Euclidean distance. Starting from these vector fields we can model the cortical connectivity.

2.2 The Connectivity Kernels

The cortical connectivity can be modeled as the probability of connecting two points in the cortex and is represented by the stochastic counterpart of the curves in Eq. 3:

$$\begin{aligned} (x',y',\theta ') = {{\mathbf {X}}}_1 + N\left( 0,\sigma ^2\right) {{\mathbf {X}}}_2, \end{aligned}$$

where \(N(0,\sigma ^2)\) is a normally distributed variable with zero mean and variance equal to \(\sigma ^2\).

This process, first described in [49], is discussed in [3, 4, 61, 74]. We denote by v the probability density of finding a particle at the point (x, y), given that it started from a location \((x', y')\) and is moving with some known velocity. This probability density satisfies a deterministic equation known in the literature as the Kolmogorov forward equation or Fokker–Planck equation:

$$\begin{aligned} \partial _tv=X_1v +\sigma ^2X_{22}v, \end{aligned}$$

where \(X_1\) is the directional derivative \(\cos (\theta )\partial _x + \sin (\theta )\partial _y\) and \(X_{22}= \partial _{\theta \theta }\) is the second-order derivative.

This equation has been widely used in different fields [3, 4, 23, 74]. In [61] its stationary counterpart was proposed to model the probability of co-occurrence of contours in natural images. The Fokker–Planck operator has a nonnegative fundamental solution \(\varGamma _1\) that satisfies:

$$\begin{aligned} \begin{aligned}&X_1\varGamma _1((x,y,\theta ),(x',y',\theta ')) + \sigma ^2X_{22}\varGamma _1((x,y,\theta ),(x',y',\theta '))\\&\quad =\delta (x,y,\theta ) \end{aligned} \end{aligned}$$

which is not symmetric. The connectivity kernel \(\omega _1\) obtained by symmetrization of the Fokker–Planck fundamental solution is:

$$\begin{aligned} \begin{aligned} \omega _1((x,y,\theta ),(x',y',\theta '))&= \frac{1}{2}(\varGamma _1((x,y,\theta ),(x',y',\theta '))\\&\quad +\varGamma _1((x',y',\theta '),(x,y,\theta ))). \end{aligned} \end{aligned}$$

In order to measure the distances between intensities we introduce the intensity kernel \(\omega _2(f_i,f_j)\), where \(f_i = f(x_i,y_i)\). It is defined as:

$$\begin{aligned} \omega _2(f_i,f_j) = e^{-\frac{1}{2}\left( \frac{f_i-f_j}{\sigma _2}\right) ^2} \end{aligned}$$

Since both components represent probabilities, the final connectivity kernel can be written as their product:

$$\begin{aligned} \begin{aligned}&\omega _f((x_i,y_i,\theta _i,f_i),(x_j,y_j,\theta _j,f_j)) \\&\quad = \omega _1((x_i,y_i,\theta _i),(x_j,y_j,\theta _j))\omega _2(f_i,f_j). \end{aligned} \end{aligned}$$

2.3 Affinity Matrix

Starting from the connectivity kernel defined previously, it is possible to extract perceptual units from images by means of spectral analysis of suitable affinity matrices. The eigenvectors with the highest eigenvalues are linked to the most salient objects in the scene [55]. The connectivity is represented by a real symmetric matrix \(A_{i,j}\):

$$\begin{aligned} A_{i,j} = \omega _f((x_i,y_i,\theta _i,f_i), (x_j,y_j,\theta _j,f_j)) \end{aligned}$$

that contains the connectivity information between all the lifted points. The eigenvectors of the affinity matrix are interpreted as perceptual units [62].
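As a concrete illustration, the affinity matrix can be assembled as follows. This is a sketch under the assumption that a symmetrized connectivity kernel \(\omega _1\) is supplied as a callable (in the paper it is the symmetrized Fokker–Planck fundamental solution; the toy \(\omega _1\) in the usage below is only a stand-in).

```python
import numpy as np

def intensity_kernel(f_i, f_j, sigma2=0.1):
    """omega_2: Gaussian similarity between two intensity values."""
    return np.exp(-0.5 * ((f_i - f_j) / sigma2) ** 2)

def affinity_matrix(points, omega1, sigma2=0.1):
    """Build A[i, j] = omega_1((x_i, y_i, th_i), (x_j, y_j, th_j)) * omega_2(f_i, f_j)
    from lifted points (x, y, theta, f)."""
    n = len(points)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            xi, yi, ti, fi = points[i]
            xj, yj, tj, fj = points[j]
            A[i, j] = omega1((xi, yi, ti), (xj, yj, tj)) * intensity_kernel(fi, fj, sigma2)
    return A
```

With a symmetric \(\omega _1\), the resulting A is symmetric, as required by the spectral analysis.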

3 Spectral Analysis

The goal of clustering is to divide the data points into several groups such that points in the same group are similar and points in different groups are dissimilar. The cognitive task of visual grouping can be considered a form of clustering, through which points can be separated into different groups according to their similarities. To perform visual grouping, we use a spectral clustering algorithm. Traditional clustering algorithms, such as K-means, are not able to solve this problem [51]. In recent years, different techniques have been introduced to surpass the performance of the traditional algorithms, in particular spectral analysis techniques. These techniques are widely used for data partitioning and image segmentation [47, 55, 64, 72], where they outperform the traditional approaches. Moreover, they are simple to implement and can be solved efficiently by standard linear algebra methods [69]. In the next section we describe the spectral clustering algorithm used in the numerical simulations of this paper.

3.1 Spectral Clustering Technique

Different algorithms based on graph theory have been proposed to perform clustering. In [55] it was shown how the edge weights \(\{a_{ij}\}_{i,j=1,...,n}\) of a weighted graph define an affinity matrix A. This matrix contains the information needed for the segmentation and identifies perceptual units in the scene, where the salient objects correspond to the eigenvectors with the highest eigenvalues. Even though this works successfully in many examples, it was demonstrated in [72] that the algorithm can also lead to clustering errors. In [47, 69] the algorithm was improved by considering the normalized affinity matrix; in particular, we use the normalization described in [47]. Defining the diagonal matrix D from the sums of the edge weights (representing the degrees of the nodes, \(d_i = \sum _{j = 1}^{n} a_{ij}\)), the normalized affinity matrix is obtained as:

$$\begin{aligned} P = D^{-1}A \end{aligned}$$

This stochastic matrix P represents the transition probability of a random walk in a graph. It has real eigenvalues \(\{\lambda _{j}\}_{j=1,...n}\) where \(0\le \lambda _{j}\le 1\), and its eigenvectors \(\{u_{i}\}_{i=1,...K}\), related to the K largest eigenvalues \(\lambda _{1}\ge \lambda _{2}\ge ...\ge \lambda _{K}\), represent a solution of the clustering problem [69]. The value of K determines the number of eigenvalues and eigenvectors considered informative.

The important step is selecting the best value of K, which can be done by defining an a priori significance threshold \(\epsilon \) for the decreasingly ordered eigenvalues \(\lambda _{i}\), so that \(\lambda _{i} > 1-\epsilon \) for all \(1 \le i \le K\). However, selecting the best \(\epsilon \) is not always trivial, and in many cases the clustering results are very sensitive to this parameter. Hence, following the diffusion map approach of [15] and the idea of [14], an auxiliary diffusion parameter \(\tau \) (a large positive integer) is used to obtain the exponentiated spectrum \(\{\lambda _{i}^\tau \}_{i=1,\dots n}\); this increases the gap between the exponentiated eigenvalues and greatly decreases the sensitivity to the threshold value. This new spectrum yields the stochastic matrix \(P^\tau \), which represents the transition matrix of a random walk of \(\tau \) steps. The difference between thresholding the eigenvalues directly and thresholding the exponentiated spectrum is shown in Fig. 5. As seen in this figure, selecting a discriminative threshold value for the eigenvalues (Fig. 5c) is not easy, while with the exponentiated spectrum (Fig. 5d) the threshold can be selected in a wide range (e.g., \(0.05\le 1-\epsilon \le 0.9\)). The value of \(\tau \) needs to be a large positive integer (e.g., 150).
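The normalization and the selection of the K significant eigenvectors can be sketched as follows. The clipping of negative eigenvalues before exponentiation is our own safeguard (with an even \(\tau \) a negative eigenvalue would otherwise become large and positive), not a step stated in the text.

```python
import numpy as np

def significant_eigenvectors(A, tau=150, eps=0.1):
    """Normalize A by the node degrees (P = D^{-1} A), exponentiate the
    spectrum, and keep the K eigenvectors with lambda_i^tau > 1 - eps."""
    d = A.sum(axis=1)
    P = A / d[:, None]                       # P = D^{-1} A
    lam, V = np.linalg.eig(P)
    lam, V = lam.real, V.real                # eigenvalues are real here
    order = np.argsort(-lam)                 # decreasing eigenvalues
    lam, V = lam[order], V[:, order]
    lam = np.clip(lam, 0.0, 1.0)             # guard before exponentiation
    keep = lam ** tau > 1.0 - eps
    return lam[keep], V[:, keep]
```

On a perfectly block-diagonal affinity with two units, exactly the two eigenvalues equal to 1 survive, so K = 2; the subsequent grouping into clusters follows Algorithm 1, which is not reproduced here.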

After selecting the value of K, the number of clusters is automatically determined using Algorithm 1.

Fig. 5

a A sample image patch at a crossing, b its affinity matrix (A) built upon a connectivity measure, c the eigenvalues (\(\lambda _i,{i=1,...n}\)) of the normalized affinity matrix (P) and the threshold value (\(1-\epsilon =0.7\)), represented in red. d The exponentiated spectrum (\(\lambda ^\tau _i,{i=1,...n}\)) with \(\tau =150\) and threshold value of 0.7 in red

Possible neural implementations of the algorithm are discussed in [14]. In particular, in [8, 25] an implementation of the spectral analysis is described as a mean-field neural computation, where the principal eigenvectors emerge as symmetry breaking of the stationary solutions of the mean-field equations. In addition, in [62] it is shown that in the presence of a visual stimulus the emerging eigenvectors are linked to visual perceptual units, obtained from a spectral clustering on excited connectivity kernels. In the next section, the application of this algorithm to vessel clustering in retinal images is presented.

4 Experiments and Results

In this section, the steps proposed for analyzing the connectivities of blood vessels in retinal images and validating the method are described. In addition, the parameter settings and the obtained results are discussed in detail.

4.1 Proposed Technique

In order to prove the reliability of the method in retrieving connectivity information in 2D retinal images, several challenging and problematic image patches around junctions were selected. The first step, before detecting the junctions and selecting the image patches around them, is to apply preconditioning to the green channel (I) of the color fundus retinal image. The green channel provides a higher contrast between vessels and background and is widely used in retinal image analysis. The preconditioning includes: (a) removing the non-uniform luminosity and contrast variability using the method proposed by [29]; (b) removing the high-frequency content; and (c) denoising using the non-linear enhancement in SE(2) proposed by [1]. A sample color image before and after preconditioning (\(I_\mathrm{enh}\)) is shown in Fig. 6a, b, respectively.

In the next step, soft (\(I_\mathrm{soft}\)) and hard (\(I_\mathrm{hard}\)) segmentations of \(I_\mathrm{enh}\) are obtained using the BIMSO (biologically inspired multi-scale and multi-orientation) segmentation method proposed by [1]. These images are shown in Fig. 6c, d, respectively. The hard segmentation is used for detecting the junctions and selecting patches of different sizes around them, while the soft segmentation is used later in the connectivity analysis.

In order to find the junction locations automatically, the skeleton of \(I_\mathrm{hard}\) is produced using morphological skeletonization. Then the method proposed by [52] is applied to this skeleton and the junction locations are determined, as shown in Fig. 6e. Using the determined locations, image patches of equal size (\(s=10\) pixels) are selected at the first stage. However, as seen in Fig. 6e, some junctions are very close to each other, with distances smaller than s/2. For these junctions, a new patch including both nearby junctions (with size equal to three times the distance between them) is considered, and its center is used for computing the distance of this new patch to the other ones. These steps are repeated until no more merging is possible or the patch size reaches the maximum allowed size (we assumed 100 pixels). Thus, all nearby junctions are grouped, in order to decrease the number of patches that overlap to a great extent. This results in patches of different sizes (\(0 \le s_{p_{i}}\le 100,1\le i\le m\)) across the image, possibly containing more than one junction. Figure 6f shows the junction locations and the corresponding selected patches overlaid on the artery/vein ground truth.
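The grouping of nearby junction patches can be sketched as a greedy merging loop. The stopping rule and the handling of repeatedly merged centers are paraphrased from the description above, so the details should be read as assumptions rather than the exact procedure.

```python
import numpy as np

def merge_nearby_patches(junctions, s=10, s_max=100):
    """Start with one s x s patch per junction; while two patch centers are
    closer than s/2, replace them with one patch of size three times their
    distance (capped at s_max) centered between them."""
    patches = [(np.asarray(c, dtype=float), float(s)) for c in junctions]
    changed = True
    while changed:
        changed = False
        for i in range(len(patches)):
            for j in range(i + 1, len(patches)):
                ci, si = patches[i]
                cj, sj = patches[j]
                dist = float(np.linalg.norm(ci - cj))
                if dist < s / 2 and max(si, sj) < s_max:
                    # assumed: merged patch sits midway between the two centers;
                    # max(dist, 1.0) avoids a degenerate zero-size patch
                    new_size = min(3.0 * max(dist, 1.0), float(s_max))
                    patches[j] = ((ci + cj) / 2.0, new_size)
                    del patches[i]
                    changed = True
                    break
            if changed:
                break
    return patches
```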

Fig. 6

The different steps applied for selecting several image patches around junctions, a original RGB image, b enhanced image (\(I_\mathrm{enh}\)), c soft segmentation (\(I_\mathrm{soft}\)), d hard segmentation (\(I_\mathrm{hard}\)), e detected junctions and the skeleton of the segmentation overlaid on color image, f selected patches overlaid on artery/vein ground truth

In order to analyze the vessel connectivities in each image patch (\(I_{p_i}\)), we need to extract the location (x, y), orientation (\(\theta \)), and intensity (f(x, y)) of the vessel pixels in these patches. Hence, for each group of junctions (i) with size \(s_i\), two patches from \(I_\mathrm{enh}\) and \(I_\mathrm{soft}\) are selected, called \(I_{\mathrm{enh},p_i}\) and \(I_{\mathrm{soft},p_i}\) respectively. Then \(I_{\mathrm{soft},p_i}\) is thresholded locally to obtain a new hard segmented image patch (called \(I_{\mathrm{hard},p_i}\)). This new segmented patch differs from the corresponding patch of \(I_\mathrm{hard}\), because \(I_\mathrm{hard}\) was obtained by thresholding the entire \(I_\mathrm{soft}\) with one global threshold value, which is not appropriate in all regions. Regions with very small, low-contrast vessels (which often receive a very low probability of being vessel pixels) are typically removed by the global thresholding approach. Accordingly, wrong thresholding leads to wrong tracking results; e.g., C1, C2, and C6 in Fig. 1 are instances of patches with missing small vessels. In this work, we selected a threshold value for each patch specifically, using Otsu's method [53], to keep more information and cover a wider range of vessel pixels. Consequently, thicker vessels are created in \(I_{\mathrm{hard},p_i}\) and the results are more accurate.
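The per-patch thresholding can be written as a direct NumPy transcription of Otsu's method [53] (scikit-image's `threshold_otsu` would serve equally well); this is a generic sketch, not the paper's implementation.

```python
import numpy as np

def otsu_threshold(patch, nbins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(patch.ravel(), bins=nbins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                  # probability of the lower class
    mu = np.cumsum(p * centers)        # cumulative first moment
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    sigma_b = np.nan_to_num(sigma_b)   # ignore degenerate class splits
    return centers[int(np.argmax(sigma_b))]

def local_hard_patch(soft_patch):
    """I_hard,p_i: threshold the soft segmentation patch with its own Otsu value."""
    return soft_patch > otsu_threshold(soft_patch)
```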

Knowing the vessel locations (x, y), other information can be extracted at these locations using \(I_{\mathrm{enh},p_i}\). Thus f(x, y) equals the intensity value of \(I_{\mathrm{enh},p_i}\) at location (x, y). Moreover, by lifting \(I_{\mathrm{enh},p_i}\) using cake wavelets (see Eq. 1), at each location the angle corresponding to the maximum negative orientation response (real part) in the lifted domain is taken as the dominant orientation (\(\theta _d\)), as in Eq. 11. The negative response is used because blood vessels in retinal images are darker than the background.

$$\begin{aligned} \theta _d = {{\mathrm{arg\,max}}}_{\theta \in [0,\pi ]} \mathrm{Re}(-U_f(x,y,\theta )) \end{aligned}$$
Fig. 7

A sample \(21\times 21\) image patch at a crossing: a \(I_{\mathrm{enh},p_i}\), b \(I_{\mathrm{soft},p_i}\), c \(I_{\mathrm{hard},p_i}\), d the differences in intensities are shown in color, e each oriented line represents the orientation at its position, f final perceptual units shown in different colors, g the ground truth artery and vein labels, h lifted image in SE(2), i connectivity kernel (\(\omega _2\)), j affinity matrix (A) obtained using both orientation and intensity information, k thresholding the eigenvalues of the normalized affinity matrix

Fig. 8

Sample image patches selected from the DRIVE dataset. Columns from left to right show the image patch from the green channel, the segmented image, the extracted orientation and intensity, the clustering result, and the artery/vein labels

The next step is approximating the connectivity kernels. The first kernel (\(\omega _1\)) was calculated numerically: the fundamental solution \(\varGamma _1\) was estimated with the Markov chain Monte Carlo method [59], by generating random paths based on the numerical solution of Eq. 6. This solution can be approximated by:

$$\begin{aligned} {\left\{ \begin{array}{ll} x_{s+\varDelta {s}}-x_s = \varDelta {s}\cos (\theta )\\ y_{s+\varDelta {s}}-y_s = \varDelta {s}\sin (\theta )\\ \theta _{s+\varDelta {s}} - \theta _s=\varDelta {s}N(0,\sigma ), \end{array}\right. } s\in 0, \dots , H, \end{aligned}$$

where H is the number of steps of the random path and \(\sigma \) is the diffusion constant (the propagation variance in the \(\theta \) direction). This finite difference equation is solved n times (typically \(n = 10^5\)), so n paths are created. The estimated kernel is then obtained by averaging all the solutions [36, 62]. An overview of different numerical methods for computing the kernel is given in [79], where comparisons are made with the exact solutions derived in [19, 20, 23]; these comparisons show that the stochastic Monte Carlo implementation is a fair and accurate method. The intensity-based kernel (\(\omega _2\)), the final connectivity kernel (\(\omega _f\)), and the affinity matrix (A) were calculated using Eqs. 7, 8, and 9, respectively. Finally, by applying the spectral clustering step of Sect. 3, the final perceptual units (individual vessels) were obtained for each patch.
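The Monte Carlo estimation of \(\varGamma _1\) can be sketched as follows: walkers start at the origin with \(\theta = 0\), follow the finite-difference scheme above, and their end points are histogrammed over a discretized \((x,y,\theta )\) volume. The grid extent and the endpoint-histogram estimator are simplifying assumptions.

```python
import numpy as np

def fokker_planck_kernel(H=7, ds=1.0, sigma=0.05, n_paths=100_000,
                         grid=21, n_theta=24, seed=0):
    """Monte Carlo estimate of the Fokker-Planck fundamental solution:
    propagate n_paths random walkers from (0, 0, 0) and histogram where
    they end up in (x, y, theta)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    y = np.zeros(n_paths)
    th = np.zeros(n_paths)
    for _ in range(H):
        x += ds * np.cos(th)
        y += ds * np.sin(th)
        th += ds * rng.normal(0.0, sigma, n_paths)   # diffusion in theta only
    half = grid / 2.0
    hist, _ = np.histogramdd(
        np.stack([x, y, th], axis=1),
        bins=(grid, grid, n_theta),
        range=((-half, half), (-half, half), (-np.pi, np.pi)),
    )
    return hist / n_paths   # empirical probability per (x, y, theta) cell
```

With small \(\sigma \) the mass concentrates ahead of the starting point along the x axis, reproducing the elongated profile of the connectivity shown in Fig. 4b.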

The above-mentioned steps are presented in Fig. 7 for a sample crossing in a \(21\times 21\) image patch. After enhancing the image (Fig. 7a), obtaining the soft segmentation (Fig. 7b), and thresholding it locally (Fig. 7c), the vessel locations, intensities, and orientations are extracted. As shown in Fig. 7d, arteries and veins have different intensities, and this difference helps discriminate between them; nevertheless, the orientation information is the most discriminative. The image lifted to SE(2) using \(\pi \)-periodic cake wavelets in 24 orientations is shown in Fig. 7h, where the disentanglement of the two crossing vessels at the junction point is clearly visible. The dominant orientations (\(\theta _d\)) of the vessel pixels are depicted in Fig. 7e, using line segments oriented according to the corresponding orientation at each pixel.

In the next step, this contextual information (intensity and orientation) is used for calculating the connectivity kernel (Fig. 7i) and the affinity matrix (Fig. 7j), as described in Sect. 2. For this numerical simulation, \(H, n, \sigma \), and \(\sigma _2\) were set to 7, \(10^5\), 0.05, and 0.1, respectively. Next, applying the spectral clustering to the normalized affinity matrix with \(\epsilon =0.1\) and \(\tau =150\) leaves only two eigenvalues above the threshold (Fig. 7k). This means that there are two main salient perceptual units in this image, as expected. These two units are color coded in Fig. 7f. The corresponding artery and vein labels are depicted in Fig. 7g and confirm the correctness of the obtained clustering results.
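The eigenvalue-thresholding step can be sketched as follows. This is a minimal sketch that assumes a symmetric normalization \(D^{-1/2} A D^{-1/2}\) of the affinity matrix; the precise normalization and thresholding procedure are the ones defined in Sect. 3.

```python
import numpy as np

def count_salient_units(A, eps=0.1):
    """Count perceptual units as the number of eigenvalues of the
    normalised affinity matrix that exceed the threshold eps."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))      # guard empty rows
    P = (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    evals = np.linalg.eigvalsh(P)                         # ascending order
    return int(np.sum(evals > eps)), evals[::-1]          # count, descending
```

For an affinity matrix with two well-separated blocks, exactly two eigenvalues survive the threshold, matching two salient perceptual units.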

4.2 Validation

To validate the method, the proposed steps were applied to several image patches of the DRIVE dataset [66]. This public dataset contains 40 color images with a resolution of \(565\times 584\) (\(\sim \)25 \(\upmu \)m/px) and a \(45^\circ \) field of view. The selected patches from each image were manually categorized into the following groups: simple crossing (category A), simple bifurcation (category B), nearby parallel vessels with bifurcation (category C), bifurcation next to a crossing (category D), and multiple bifurcations (category E); each category was narrowed down to 20 image patches. These patches differ in complexity, number of junctions, and size, and they may contain broken lines, missing small vessels, and vessels with high curvature. The parameters used in the numerical simulation of the affinity matrix and the spectral clustering step (\(H, n, \sigma , \sigma _2, \epsilon \), and \(\tau \)) were chosen separately for each patch, with the aim of achieving optimal results for each case. Automatic parameter selection remains a challenging task and will be investigated in future work.

Some sample figures of these cases are depicted in Fig. 8. For each example, the original gray scale enhanced image, the hard segmentation (locally thresholded), the orientation and intensity information, and finally the clustering result together with the artery/vein labels are depicted (Fig. 8a–f, respectively). Although the complexity of these patches differs considerably, the salient groups are detected successfully in all cases. All the vessel pixels grouped as one unit are similar in orientation and intensity, and they follow the law of good continuation. Therefore, at each bifurcation or crossover point, two groups have been detected.

In this figure, G1 is a good example of a crossing with a small angle. The method not only differentiates well between vessels crossing each other, even with a small crossing angle, but also determines the order of the vessels in crossover regions, i.e., which one lies at the bottom and which one passes over. The image patch in G2 is a good example of the strength of the method in detecting small vessels. The detected small vessel in this image is not even annotated in the artery/vein ground truth. However, this detection depends strongly on the soft-segmented image and on the threshold value used for obtaining the hard segmentation: if the small vessel is not detected in the soft segmentation, or if a high threshold value is selected, it will not appear in the final result either. Other cases in this figure demonstrate the robustness of the method against the presence of a central vessel reflex (as in G3), interrupted lines (as in G10), or even noise (as in G9). In G9, noisy pixels are detected as individual units that are not similar to the other groups. They can be differentiated from the others based on their sizes: if a group contains very few pixels, it can be considered as noise and removed. There are also several cases with complex junctions in this figure. The presence of multiple bifurcations in one image patch, or of several bifurcations close to the crossing points, does not lead to wrong grouping results (as seen in G5, G6, G7, G8, and G10).
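The size-based noise removal just described can be expressed as a simple post-processing step. This is a sketch; the minimum group size below is an illustrative threshold, not a value from the text.

```python
from collections import Counter

def remove_noise_groups(labels, min_size=5):
    """Relabel clusters with fewer than min_size pixels as noise (-1):
    noisy pixels form small isolated groups, real vessels form large ones."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_size else -1 for lab in labels]
```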

Table 1 The parameters used in numerical simulation of the image patches shown in Fig. 8 and their corresponding sizes

The parameters used during the numerical simulations of the image patches shown in Fig. 8, together with the corresponding patch sizes, are presented in Table 1. For all experiments the values of \(n, \epsilon \), and \(\tau \) were set to \(10^5\), 0.1, and 150, respectively, and remained constant. The key parameters that most strongly affect the final results are \(H, \sigma \), and \(\sigma _2\); H and \(\sigma \) determine the shape of the kernel. Based on the experiments, an appropriate value for the number of steps of the random path generation is approximately one third of the image width. Selecting this parameter correctly is very important for connecting interrupted lines. The parameters \(\sigma \) and \(\sigma _2\), which determine the propagation variance in the \(\theta \) direction and the effect of the intensity-based similarity term, are not very sensitive to variation. To quantify this, the mean and variance of these two parameters for each of the above-mentioned categories were calculated and are presented in Table 2. Since the selected patches have varying sizes and H depends on the size, this parameter is not presented in the table. Moreover, to evaluate the performance of the method, we introduce the correct detection rate (CDR) as the percentage of correctly grouped image patches for each category. These values are presented in Table 2. With a larger number of image patches per category, the CDR estimates would become more reliable.
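The one-third rule of thumb for H can be written down directly; the helper below is hypothetical and only encodes the heuristic stated above.

```python
def pick_H(patch_width):
    """Rule of thumb: the number of random-path steps H is roughly
    one third of the patch width (rounded to the nearest integer)."""
    return max(1, round(patch_width / 3))
```

For a \(21\times 21\) patch this gives \(H=7\), and for a \(49\times 49\) patch it gives \(H=16\), matching the values reported in the text for patches of those sizes.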

Table 2 The correct detection rate and the mean and variance of \(\sigma \) and \(\sigma _2\) used in numerical simulation for each category

If there are high curvature vessels, then, depending on their curvature, increasing \(\sigma \) might help in preserving the continuity of the vessel. As an example, G4 in Fig. 8 is relatively more curved compared to the other cases, but the clustering works perfectly there. However, in some cases this does not solve the problem completely, and other kernels need to be considered for preserving the continuity. An example \(49\times 49\) image patch with a highly curved vessel is shown in Fig. 9, where the method fails to cluster the vessels correctly. The parameters used for this case are \(H=16, \sigma =0.03\), and \(\sigma _2 = 0.3\).

Fig. 9
figure 9

Failure of clustering in the presence of highly curved vessels. Columns from left to right: a enhanced image; b its segmentation; c intensity; d orientation information; e clustering results; and f the artery/vein labels

Even though the intensities of arteries and veins in the gray scale enhanced image are very close to each other in some images, adding the intensity term in calculating the final affinity matrix is crucial. Decreasing the value of \(\sigma _2\) gives the distance between intensities a higher weight, which helps to differentiate better between the groups. Figure 10 shows a sample \(67\times 67\) image patch that includes two nearby parallel vessels with similar orientations. Figure 10e, f show the correct and wrong clustering results obtained by changing \(\sigma _2\) from 0.3 to 1, with all other parameters kept fixed (\(H=24, \sigma =0.02\), and \(n=10^5\)). The other important difference between these two results is that the noisy pixels close to the thicker vessel have been completely removed in the correct result. Although they seem to be aligned with the thick vessel, their intensities are totally different. Therefore, by increasing the effect of intensity, they are clustered into several small groups and removed in the final step of the spectral clustering algorithm.
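A plausible form of the intensity term is a Gaussian on pairwise intensity differences; the exact expression is the one given in Eq. 7, and this sketch (with illustrative intensity values) only shows how \(\sigma _2\) controls the penalty.

```python
import numpy as np

def intensity_affinity(I, sigma2=0.3):
    """omega_2-style similarity: Gaussian of pairwise intensity
    differences. Smaller sigma2 penalises differences more strongly,
    sharpening the separation between groups."""
    d = I[:, None] - I[None, :]            # pairwise intensity differences
    return np.exp(-d ** 2 / (2.0 * sigma2 ** 2))
```

For two vessels at illustrative intensities 0.2 and 0.8, lowering \(\sigma _2\) from 1 to 0.3 drives their mutual affinity towards zero while leaving within-vessel affinities near one, which is the separation effect observed in Fig. 10.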

Fig. 10
figure 10

The effect of including the intensity term in calculating the connectivity kernel. Columns from left to right: a enhanced image; b its segmentation; c intensity; d orientation information; e correct and f wrong clustering results; and g the artery/vein labels

5 Conclusion and Future Work

In this work, we have presented a novel semi-automatic technique, inspired by the geometry of the primary visual cortex, to find and group different perceptual units in retinal images using spectral methods. Computing the eigenvectors of affinity matrices, which are formed using the connectivity kernel, leads to the final grouping. The connectivity kernel represents the connectivity between all points lifted to the 4-dimensional feature space of positions, orientations, and intensities, and it provides a good model of the Gestalt law of good continuation. Thus, the perceptual units in retinal images are the individual blood vessels, which have low variation in their orientations and intensities.

The proposed method allows finding accurate junction positions, i.e., the positions where two groups meet or cross each other. The main application of this connectivity analysis would be in modeling the retinal vasculature as a set of tree networks. The graph constructed from these trees would be very informative for analyzing the topological behavior of the retinal vasculature, which is useful in the diagnosis and prognosis of several diseases, especially in automated applications in large-scale screening programs.

The detection of small vessels depends strongly on the quality of the soft segmentation rather than on the hard segmentation. These vessels can easily be differentiated from noise based on the size of the group: noisy pixels have random orientations and intensities, and they form smaller groups. Our method shows some limitations at blood vessels with high curvature. One possible solution is to merge the two detected perceptual units into one unit when there is no junction at these locations. A stronger extension is to use other kernels that take into account the curvature of structures in addition to positions and orientations. Moreover, it is also possible to enrich the affinity matrix with other terms, e.g., the principal curvature of the multi-scale Hessian (ridgeness or vesselness similarity). All these solutions will be investigated in the future.

With this model we have analyzed many challenging cases in retinal vessel segmentations, such as bifurcations, crossovers, and small and disconnected vessels. These cases have not only been reported to cause tracing errors in state-of-the-art techniques, but are also very informative for clinical studies. Based on the results of the numerical simulations, the method is successful in detecting the salient groups in retinal images, and it is robust against noise, central vessel reflex, interruptions in vessel segments, the presence of multiple junctions in a small area, and the presence of nearby parallel vessels. For this reason, it can be considered an excellent quantitative model for the constitution of perceptual units in retinal images. To the best of our knowledge, this is the first time that the vessel connectivities in such complex situations are resolved by one single solution.