1 Introduction

Anomaly detection is the task of spotting items that do not conform to the expected pattern of the data. In the case of images, it usually refers to the problem of spotting pixels showing a peculiar spectral signature when compared to all other pixels in an image. Image anomaly detection is considered one of the most interesting and crucial tasks for many high-level image- and video-based applications, e.g., surveillance, environmental monitoring, and medical analysis [16].

One of the most used and widely validated techniques for anomaly detection is the Reed–Xiaoli detector, often called RX detector for short [56], which is the best-known example of covariance-based anomaly detectors. This class of detectors has found wide adoption in many domains, from hyperspectral [49] to medical images [65]; however, methods of this type suffer from crucial drawbacks, most noticeably the need for covariance estimation and inversion. Many situations exist where the drawbacks of these state-of-the-art anomaly detectors lead to poor and unreliable results [67]. Moreover, the operations required by those techniques are computationally expensive [12]. For all these reasons, the search for a fast and reliable image anomaly detection strategy able to overcome the limitations of covariance-based anomaly detectors deserves further effort.

In this paper, we use graphs to tackle image anomaly detection. Graphs have proved to be natural tools to represent data in many domains, e.g., recommendation systems, social networks, or protein interaction systems [18]. Recently, they have also found wide adoption in the computer vision and image processing communities, thanks to their ability to intuitively model relations between pixels. Graph-based approaches have been proposed to date to solve a wide variety of image processing tasks, e.g., edge detection [6], gradient estimation [55], and segmentation [9, 59]. In particular, spectral graph theory has recently been bridged with signal processing, where the graph is used to model local relations between signal samples [57, 60]. As an example, graph-based signal processing is emerging as a novel approach in the design of energy compacting image transformations [27, 28, 39, 64, 70].

To date, graph-based approaches have not been proposed for image anomaly detection, although many techniques for anomaly detection on generic graphs have been explored in the literature [2]. Those techniques cannot be straightforwardly extended to images since they usually exploit anomalies in the topology of the graph to extract knowledge about the data [18]. On the other hand, in the image case the graph topology is constrained to the pixel grid, whereas different weights are assigned to edges connecting pixels depending on their similarity or correlation.

Our proposed approach uses an undirected weighted graph to model the expected behavior of the data and then computes the distance of each pixel in the image from the model. We propose to use a graph to model spectral or both spectral and spatial correlation. The main contribution of this paper is a novel anomaly detection approach which exploits spectral graph theory to overcome one of the well-known limitations of RX detector and other covariance-based anomaly detectors, i.e., the need to estimate and invert a covariance matrix. Estimation of the covariance may be very critical in the presence of a small sample size; moreover, inverting such a matrix is also a complex, badly conditioned and unstable operation [40]. Our novel anomaly detector estimates the statistics of the background using a graph Laplacian matrix. Also, the graph model used by our approach is abstract and flexible enough to incorporate any available prior knowledge of the data. The effectiveness of our methodological contributions is shown in two use cases: a typical hyperspectral anomaly detection experiment and a novel application for tumor detection in 3D biomedical images.

The paper is organized as follows: We will first give a brief overview of RX detector and the graph Fourier transform in Sect. 2 and go over some related work in Sect. 3, and then we will present our technique in Sect. 4; we will then evaluate the performance of our technique and compare our results with those yielded by algorithms in the state of the art both visually and objectively in Sect. 5, and we will discuss these results in Sect. 6; finally, conclusions will be drawn in Sect. 7.

2 Background

Anomaly detection refers to a particular class of target detection problems, namely the ones where no prior information about the target is available. In this scenario, supervised approaches that try to find pixels which match reference spectral characteristics (e.g., [24, 42]) cannot usually be employed. This extends also to supervised deep learning or other data-driven approaches, which attempt to learn a parametric model from a set of labeled data. Although deep learning methods have found increasingly wide adoption for many other tasks in image processing and computer vision [15, 35, 71], their application to anomaly detection—especially on hyperspectral and medical imaging—is stifled by multiple factors: First, pixels have to be considered anomalous according to intra-image metrics which are difficult to capture in a dataset; second, the amount of data required to train the models is often not available in these contexts [11, 44]. For these reasons, classical unsupervised approaches are preferable. These algorithms detect anomalous or peculiar pixels showing high spectral distance from their surroundings [20]. To this end, the typical strategy is to extract knowledge of the background statistics from the data and then measure the deviation of each examined pixel from the learned knowledge according to some affinity function.

2.1 Reed–Xiaoli detector

The best known and most widely employed algorithm for anomaly detection is \(\hbox {Reed--Xiaoli detector (RXD)}\) by Reed and Yu [56]. To date, it is still used as a benchmark algorithm for many anomaly detection applications [5, 20, 48, 51]. \(\hbox {RXD}\) assumes the background to be characterized by a non-stationary multivariate Gaussian model, estimated by the image mean and covariance. Then, it measures the squared Mahalanobis distance [47] of each pixel from the estimated background model. Pixels showing distance values over a set threshold are deemed anomalous.

Formally, \(\hbox {RXD}\) works as follows. Consider an image \(\mathbf{I} = [\mathbf{x}_{1}\mathbf{x}_{2}\ldots \mathbf{x}_{N}]\) consisting of N pixels, where the column vector \(\mathbf{x}_{i} = [x_{i1} x_{i2} \ldots x_{im}]^T\) represents the value of the ith pixel over the m channels (or spectral bands) of \(\mathbf{I}\). The expected behavior of background pixels can be captured by the mean vector \(\hat{\varvec{\mu }}\) and covariance matrix \(\widehat{\mathbf{C}}\) which are estimated as follows:

$$\begin{aligned} \hat{\varvec{\mu }}= \frac{1}{N} \sum _{i=1}^N \mathbf{x}_{i} \text{, } \text{ and } \widehat{\mathbf{C}}= \frac{1}{N} \sum _{i=1}^N \overline{\mathbf{x}}_{i}\overline{\mathbf{x}}_{i}^T, \end{aligned}$$

where \(\overline{\mathbf{x}}_{i} = (\mathbf{x}_{i} - \hat{\varvec{\mu }})\).

Mean vector and covariance matrix are computed under the assumption that vectors \(\mathbf{x}_{i}\) are observations of the same random process; it is usually possible to make this assumption as the anomaly is small enough to have a negligible impact on the estimate [12].

Then, the generalized likelihood of a pixel \(\mathbf{x}_{}\) to be anomalous with respect to the model \(\widehat{\mathbf{C}}\) is expressed in terms of the square of the Mahalanobis distance [47], as follows:

$$\begin{aligned} \delta _{{RXD}}(\mathbf{x}_{}) \; = \; \overline{\mathbf{x}}_{}^T \, \widehat{\mathbf{Q}}\, \overline{\mathbf{x}}_{} , \end{aligned}$$

where \(\widehat{\mathbf{Q}}= \widehat{\mathbf{C}}^{-1}\), i.e., the inverse of the covariance matrix, also known in the literature as the precision matrix.

Finally, a decision threshold \(\eta \) is usually employed to confirm or refuse the anomaly hypothesis. A common approach is to set \(\eta \) adaptively as a percentage of \(\delta _{{RXD}}\) dynamic range as follows:

$$\begin{aligned} \eta = t \cdot \max _{i = 1,\dots , N}(\delta _{{RXD}}(\mathbf{x}_{i})) , \end{aligned}$$

with \(t \in [0, 1]\). Then, if \(\delta _{{RXD}}(\mathbf{x}_{}) \ge \eta \), the pixel \(\mathbf{x}_{}\) is considered anomalous.
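The whole procedure can be summarized in a few lines of NumPy. The sketch below is our own minimal illustration, not a reference implementation: the function names are ours, and the image is assumed to be stored as an \(m \times N\) array with one column per pixel.

```python
import numpy as np

def rxd_scores(I):
    """RXD scores for an image I of shape (m, N): m bands, one column per pixel."""
    mu = I.mean(axis=1, keepdims=True)      # band-wise mean vector
    Xc = I - mu                             # centered pixels
    C = (Xc @ Xc.T) / I.shape[1]            # m x m covariance estimate
    Q = np.linalg.inv(C)                    # precision matrix (costly, possibly ill-conditioned)
    # squared Mahalanobis distance of every pixel from the background model
    return np.einsum('ij,ik,kj->j', Xc, Q, Xc)

def rxd_mask(I, t=0.5):
    """Flag pixels whose score reaches a fraction t of the maximum score."""
    d = rxd_scores(I)
    return d >= t * d.max()
```

Note that the explicit `np.linalg.inv` call is exactly the step the paper later seeks to avoid: it costs \(O(m^3)\) and becomes unstable when the sample covariance is badly conditioned.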

An interesting property of \(\hbox {RXD}\) has been observed by Chang and Heinz in [14]. In that work, the authors demonstrated how \(\hbox {RXD}\) can be considered an inverse operation of the \(\hbox {principal component analysis (PCA)}\).

More precisely, let us assume that \(\kappa _1 \ge \kappa _2 \ge \cdots \ge \kappa _m\) are the eigenvalues of the \(m \times m\) covariance matrix \(\widehat{\mathbf{C}}\) and \(\{\mathbf{v}_1, \mathbf{v}_2, \ldots , \mathbf{v}_m\}\) is its set of unit eigenvectors with \(\mathbf{v}_j\) corresponding to \(\kappa _j\). We can then form the matrix \(\mathbf{V} = [\mathbf{v}_1 \mathbf{v}_2 \ldots \mathbf{v}_m]\) with the jth column specified by \(\mathbf{v}_j\). \(\mathbf{V}\) can be used to decorrelate the signal by diagonalizing \(\widehat{\mathbf{C}}\) into the diagonal matrix \(\mathbf{K}\) whose jth diagonal element is \(\kappa _j\), such that \(\mathbf{V}^T \widehat{\mathbf{C}}\mathbf{V} = \mathbf{K}\) and \(\mathbf{V}^T \widehat{\mathbf{Q}}\mathbf{V} = \mathbf{K}^{-1}\). Then, we can compute \(\mathbf{y}_{} = \mathbf{V}^T \overline{\mathbf{x}}_{}\), which is known as the Karhunen–Loève transform (KLT). Data dimensionality reduction via \(\hbox {PCA}\) usually involves the computation of \(\mathbf{y}_{}\) using just the first \(p \ll m\) columns of \(\mathbf{V}\). As shown in [14], (2) can be expressed as a function of \(\mathbf{y}_{}\) as follows:

$$\begin{aligned} \delta _{{RXD}}(\mathbf{x}_{})= & {} \overline{\mathbf{x}}_{}^T \, \widehat{\mathbf{Q}}\, \overline{\mathbf{x}}_{} \nonumber \\= & {} (\mathbf{V}\mathbf{y}_{})^T \, \widehat{\mathbf{Q}}\, (\mathbf{V}\mathbf{y}_{})\nonumber \\= & {} \mathbf{y}_{}^T \, (\mathbf{V}^T \widehat{\mathbf{Q}}\mathbf{V}) \, \mathbf{y}_{}\nonumber \\= & {} \mathbf{y}_{}^T \mathbf{K}^{-1} \mathbf{y}_{}\nonumber \\= & {} \sum _{j=1}^{m} \kappa _j^{-1} y_{j}^2 , \end{aligned}$$

where \(y_j\) represents the jth element of the KLT vector \(\mathbf{y}_{}\).

From this formulation, one can notice that \(\hbox {RXD}\) detects targets with small energies that are represented by small eigenvalues. This is because, according to (4), the smaller the eigenvalue, the greater its contribution to the value of \(\delta _{{RXD}}\). This is reasonable: If a small anomalous target is present in the image, it will not be visible in the principal components, but will rather appear in the minor components [12]. However, when viewing \(\hbox {RXD}\) in this form, it is quite evident that the last components, which are those containing mostly noise, are actually weighted the most. To improve the result of \(\hbox {RXD}\), a value \(p \ll m\) can be determined [38]. The eigenvalues beyond the first (largest) p are then considered to represent components containing only noise and are discarded. We thus obtain a de-noised version of \(\hbox {RXD}\) that can be expressed as follows:

$$\begin{aligned} \delta ^p_{{RXD}}(\mathbf{x}_{}) = \sum _{j=1}^{p} \kappa _j^{-1} y_{j}^2 . \end{aligned}$$

Obviously, \(\delta ^m_{{RXD}} = \delta _{{RXD}}\).

The issue of determining p was addressed in [13, 38] and is closely related to the problem of determining the \(\hbox {intrinsic dimensionality (ID)}\) of the image signal. Empirically, p is usually set such that a desired percentage \(\psi \in [0,1]\) of the original image cumulative energy content is retained. The cumulative energy content of the first p principal components of an image \(\mathbf{I} = [\mathbf{x}_{1}\mathbf{x}_{2}\ldots \mathbf{x}_{N}]\) can be expressed in terms of the image’s \(\hbox {KLT}\) transform \(\mathbf{Y} = \mathbf{V}^T\overline{\mathbf{I}} = [\mathbf{y}_{1}\mathbf{y}_{2}\ldots \mathbf{y}_{N}]\) where \(\overline{\mathbf{I}} = [\overline{\mathbf{x}}_{1}\overline{\mathbf{x}}_{2}\ldots \overline{\mathbf{x}}_{N}]\) as

$$\begin{aligned} e(\mathbf{I},p) = \sum _{i=1}^{N}\sum _{j=1}^{p} y^2_{ij} , \end{aligned}$$

where \(y_{ij}\) is the jth element of the vector \(\mathbf{y}_{i}\). We then choose the smallest \(p \in [1,m]\) such that \(e(\mathbf{I},p)/e(\mathbf{I},m) \ge \psi \). For dimensionality reduction applications, \(\psi = 0.9\) is common, but for anomaly detection purposes that value might be too low, since we do not want to risk losing the anomaly. In this case, \(\psi = 0.99\) is usually more appropriate.
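The energy-based selection of p and the resulting de-noised detector can be sketched as follows. This is again an illustrative NumPy fragment under our own naming; `np.linalg.eigh` returns eigenvalues in ascending order, so they are reversed to match the convention \(\kappa _1 \ge \cdots \ge \kappa _m\).

```python
import numpy as np

def denoised_rxd(I, psi=0.99):
    """De-noised RXD: keep the first p principal components, where p is the
    smallest value retaining a fraction psi of the cumulative KLT energy."""
    mu = I.mean(axis=1, keepdims=True)
    Xc = I - mu
    C = (Xc @ Xc.T) / I.shape[1]
    kappa, V = np.linalg.eigh(C)                 # ascending eigenvalues
    kappa, V = kappa[::-1], V[:, ::-1]           # reorder: kappa_1 >= ... >= kappa_m
    Y = V.T @ Xc                                 # KLT of every pixel
    energy = np.cumsum((Y ** 2).sum(axis=1))     # e(I, p) for p = 1..m
    p = int(np.searchsorted(energy / energy[-1], psi) + 1)
    # truncated score: sum over j <= p of kappa_j^{-1} * y_j^2
    return (Y[:p] ** 2 / kappa[:p, None]).sum(axis=0), p
```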

2.2 Graph Fourier transform

In recent years, the growing interest in graph-based signal processing [58] has stimulated the study of graph-based transform approaches. These methodologies map the image content onto a topological graph where nodes represent pixel intensities and edges model relations between nodes, e.g., according to a criterion based on correlation or other similarity measures. The Fourier transform can be generalized to graphs obtaining the so-called \(\hbox {graph Fourier transform (GFT)}\) [57].

Consider an undirected, weighted graph \({\mathcal {G}} = ({\mathcal {V}}, {\mathcal {E}})\) composed of a vertex set \({\mathcal {V}}\) of order n and an edge set \({\mathcal {E}}\) specified by \((a, b, w_{ab})\), where \(a, b \in {\mathcal {V}}\), and \(w_{ab} \in {\mathbb {R}}^+\) is the edge weight between vertices a and b. Thus, a weighted graph can be described by its adjacency matrix \(\mathbf{W}\) where \(\mathbf{W}(a,b) = w_{ab}\). A graph signal is a mapping that assigns a value to each vertex, denoted as \(\mathbf{s}_{} = [s_1 s_2\ldots s_n]^T\).

Typically, when computing the \(\hbox {GFT}\) a graph is constructed to capture the inter-pixel correlation and is used to compute the optimal decorrelating transform leveraging on spectral graph theory [60]. From the adjacency (also called weight) matrix \(\mathbf{W}\), the combinatorial graph Laplacian matrix \(\mathbf{L} = \mathbf{D}-\mathbf{W}\) can be computed, where \(\mathbf{D}\) is the degree matrix: a diagonal matrix whose ath diagonal element is equal to the sum of the weights of all edges incident to the node a. Formally,

$$\begin{aligned} \mathbf{D}(a,b) = {\left\{ \begin{array}{ll} \mathop \sum \nolimits ^n_{k=1} w_{ak} &{} \text{ if } a = b,\\ 0 &{} \text{ otherwise. } \end{array}\right. } \end{aligned}$$

In some scenarios, it is useful to normalize weights in the Laplacian matrix; in those cases, the use of the symmetric normalized Laplacian matrix \(\mathbf{L}^{sym}\) is preferred. It is defined as

$$\begin{aligned} \mathbf{L}^{sym} = \mathbf{D}^{-\frac{1}{2}} \mathbf{L} \mathbf{D}^{-\frac{1}{2}} . \end{aligned}$$

\(\mathbf{L}^{sym}\) has important properties, i.e., its eigenvalues are always real, nonnegative, and bounded into the range [0, 2]; for these reasons, the spectrum of a symmetric normalized Laplacian relates well to other graph invariants for general graphs in a way that other definitions fail to do [18].
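The two Laplacians can be built directly from the weight matrix. The following NumPy sketch (function name ours) guards against isolated nodes, for which \(\mathbf{D}^{-\frac{1}{2}}\) is taken as zero by convention:

```python
import numpy as np

def laplacians(W):
    """Combinatorial and symmetric normalized Laplacians of a weight matrix W."""
    d = W.sum(axis=1)                    # node degrees (diagonal of D)
    L = np.diag(d) - W                   # combinatorial Laplacian L = D - W
    with np.errstate(divide='ignore'):
        d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    # L_sym = D^{-1/2} L D^{-1/2}, applied row- and column-wise
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    return L, L_sym
```

A quick check on a small graph confirms the stated properties: the rows of \(\mathbf{L}\) sum to zero, and the spectrum of \(\mathbf{L}^{sym}\) lies in [0, 2].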

Any Laplacian matrix \(\mathbf{L}\) is a symmetric positive semidefinite matrix with eigendecomposition:

$$\begin{aligned} \mathbf{L}=\mathbf{U} \varvec{\Lambda } \mathbf{U}^T , \end{aligned}$$

where \(\mathbf{U}\) is the matrix whose columns are the eigenvectors of \(\mathbf{L}\) and \(\varvec{\Lambda }\) is the diagonal matrix whose diagonal elements are the corresponding eigenvalues. The matrix \(\mathbf{U}\) is used to compute the \(\hbox {GFT}\) of a signal \(\mathbf{s}_{}\) as:

$$\begin{aligned} \widetilde{\mathbf{s}}_{} =\mathbf{U}^T \mathbf{s}_{} . \end{aligned}$$

The inverse \(\hbox {GFT}\) is then given by

$$\begin{aligned} \mathbf{s}_{}=\mathbf{U} \widetilde{\mathbf{s}}_{} . \end{aligned}$$

When computing the \(\hbox {GFT}\), the eigenvalues in \(\varvec{\Lambda }\) are usually sorted in increasing order, the first eigenvalue being equal to zero [57], i.e., \(0 = \lambda _1 \le \lambda _2 \le \cdots \le \lambda _n\). The eigenvectors in \(\mathbf{U}\) are sorted accordingly.
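The forward and inverse transforms above amount to one symmetric eigendecomposition and two matrix–vector products. A minimal NumPy sketch (our naming; `np.linalg.eigh` already returns eigenvalues in ascending order, as required):

```python
import numpy as np

def gft_basis(L):
    """Eigendecomposition of a Laplacian; eigenvalues sorted ascending."""
    lam, U = np.linalg.eigh(L)   # symmetric input: real eigenvalues, orthonormal U
    return lam, U

def gft(U, s):
    """Forward graph Fourier transform: s_tilde = U^T s."""
    return U.T @ s

def igft(U, s_tilde):
    """Inverse graph Fourier transform: s = U s_tilde."""
    return U @ s_tilde
```

Since \(\mathbf{U}\) is orthonormal, the round trip `igft(U, gft(U, s))` recovers `s` exactly, and the first eigenvalue is zero for any connected-or-not graph.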

3 Related work

Despite its popularity, \(\hbox {RXD}\) has recognized drawbacks that undermine its performance in some applications. For a full discussion of the limitations of \(\hbox {RXD}\), we suggest [12, 67]; however, they can be summarized as follows:

  1. \(\hbox {RXD}\) involves a high-dimensional covariance matrix that needs to be estimated and inverted, often under a small sample size [5, 40]. These are unstable, highly complex, and badly conditioned operations;

  2. \(\hbox {RXD}\) often suffers from a high \(\hbox {false positive rate (FPR)}\) [5, 34, 51];

  3. \(\hbox {RXD}\) assumes that the background follows a multivariate Gaussian model, but there are cases in which this assumption might not be adequate, e.g., in the case of multiple materials and textures [5, 12, 21, 34];

  4. \(\hbox {RXD}\) lacks spatial awareness: Every pixel is evaluated individually, extrapolated from its context [31].

To address these issues, recent works have iterated over \(\hbox {RXD}\)’s idea, e.g., by considering subspace features [22, 62], by using kernels to go beyond the Gaussian assumption [21, 41], by applying dimensionality reduction [33], by improving how the background statistics are estimated [20, 50], or by exploiting sparsity and compressed sensing theory [23, 26, 72]. In this work, we generalize \(\hbox {RXD}\)’s idea by looking at it from the point of view of spectral graph theory. This not only allows us to avoid costly covariance matrix inversions, but also lets us incorporate spatial information and any prior knowledge about the background model into the detector. Previous work trying to include spatial awareness in the detector is available in the literature; a noteworthy example is \(\hbox {whitening spatial correlation filtering (WSCF)}\) [31], where the authors propose to apply a whitening transformation based on the eigendecomposition of the image covariance matrix. In the whitened space, \(\hbox {RXD}\) is represented by the Euclidean norm. Then, by using an approach based on constrained energy minimization, \(\hbox {WSCF}\) spots anomalous pixels by estimating their consistency with their neighborhood in the whitened space. We compare our proposed approach to \(\hbox {WSCF}\) in the experimental section.

Although prior research targeting anomaly detection in graphs exists, it mostly focuses on anomalies in a graph structure, and not on graph signals [2, 18]. For example, in the context of behavioral monitoring and intelligence, the structure of social graphs can be analyzed to spot subgraphs expressing patterns deviating from the rest of the network [52]. However, in images, the structure of the graph is fixed to a grid, and the application of graph-based anomaly detection algorithms coming from other domains is not straightforward; even in works where peculiarities in the graph signal are under observation, structure is included as part of the signal, as for example in [25], where a signal function of the physical distance between wireless sensors is proposed. The effectiveness of these approaches on images has not yet been reported.

Our proposed graph-based approach is founded on two recent findings: First, Zhang and Florêncio [70] have shown that a Laplacian model can be used as an estimate of the precision matrix \(\mathbf{Q}\) of an image, under the assumption that the image follows a \(\hbox {Gaussian Markov random field (GMRF)}\) model. This amounts to using a function of the partial correlation between nodes as graph weights. Second, it has been demonstrated how the \(\hbox {GFT}\) can be considered an approximation of the \(\hbox {KLT}\) for graph signals [39]. Recent literature in spectral graph theory has exploited this relationship to provide novel graph-based solutions to classical signal processing problems, in particular for image compression, where the use of the \(\hbox {GFT}\) has been proposed as an alternative to the \(\hbox {discrete cosine transform (DCT)}\) [17, 27, 28, 39]. This relationship has, however, never been explored in the context of image anomaly detection, which motivated us to study it in this work.

4 Method

In this work, we exploit the analogy between \(\hbox {KLT}\) and \(\hbox {GFT}\) in the framework of anomaly detection. In the \(\hbox {GFT}\) definition, the role of the covariance matrix in the \(\hbox {KLT}\) is taken by the graph Laplacian. It turns out that \(\mathbf{L}\) can be exploited also in the inverse problem of anomaly detection according to (4). We here propose a novel algorithm for image anomaly detection, which we will refer to as \(\hbox {Laplacian anomaly detector (LAD)}\). \(\hbox {LAD}\) overcomes some of the known limitations of \(\hbox {RXD}\) exposed in Sect. 2.1: It can be used to avoid problematic covariance matrix estimation and inversion, and it is able to include spatial information as well as a priori knowledge, when available.

4.1 Construction of the graph model

Given an image \(\mathbf{I}\) composed of N pixels and having m spectral bands or channels, we first build an undirected graph \({\mathcal {G}} = ({\mathcal {V}}, {\mathcal {E}})\) to serve as the model for the background pixels in the image. The graph is used to model local relations between pixel values and can be constructed to capture spectral and spatial characteristics. Topology and weights of the graph have to be chosen according to the domain. We will discuss some general construction strategies in Sects. 4.3 and 4.4. The chosen graph will be described by a weight matrix \(\mathbf{W}\), from which a Laplacian matrix \(\mathbf{L}\) will be computed according to the procedure detailed in Sect. 2.2. The symmetric normalized Laplacian, constructed as in (8), is to be preferred over the unnormalized combinatorial one for the reasons expressed in Sect. 2.2. Also, \(\mathbf{L}^{sym}\) has proved preferable in similar domains, e.g., segmentation and classification [7, 29].

4.2 Graph-based anomaly detection

Given a pixel \(\mathbf{x}_{}\), we define a corresponding graph signal \(\mathbf{s}_{}\), e.g., describing the spectral bands of \(\mathbf{x}_{}\) or its spatial neighborhood, and compute the distance of \(\mathbf{x}_{}\) from the model as

$$\begin{aligned} \delta _{{LAD}}(\mathbf{x}_{})= & {} \mathbf{s}_{}^T \, \mathbf{L} \, \mathbf{s}_{}\nonumber \\= & {} (\mathbf{U}\widetilde{\mathbf{s}}_{})^T \, \mathbf{L} \, (\mathbf{U}\widetilde{\mathbf{s}}_{})\nonumber \\= & {} \widetilde{\mathbf{s}}_{}^T \, (\mathbf{U}^T \mathbf{L} \mathbf{U}) \, \widetilde{\mathbf{s}}_{}\nonumber \\= & {} \widetilde{\mathbf{s}}_{}^T \, \varvec{\Lambda } \, \widetilde{\mathbf{s}}_{}\nonumber \\= & {} \sum _{j=1}^{m} \lambda _j \, {\widetilde{s}}_{j}^2 , \end{aligned}$$

where \({\widetilde{s}}_j\) represents the jth element of the \(\hbox {GFT}\) vector \(\widetilde{\mathbf{s}}_{}\), and \(\mathbf{U}\) and \(\varvec{\Lambda }\) refer to the eigenvector and eigenvalue matrices used for the eigendecomposition of \(\mathbf{L}\) in (9). Although this formulation might look similar to the one of \(\hbox {RXD}\) given in (4), some important differences have to be noted. First, the model used is not the inverse of the covariance matrix \(\widehat{\mathbf{C}}^{-1}\), but an arbitrary Laplacian model; this is a generalization over \(\hbox {RXD}\), because if the image follows a \(\hbox {GMRF}\) model, then a Laplacian can be constructed to estimate the precision matrix [70], but if this is not the case a Laplacian model can be computed according to any knowledge of the domain. Second, the Laplacian matrix can be used to capture both spatial and spectral characteristics as we will detail in Sect. 4.4. Another thing to notice is that in (12) each contribution \({\widetilde{s}}_j\) is multiplied by \(\lambda _j\), whereas in \(\hbox {RXD}\) each \(y_j\) was instead divided by the corresponding eigenvalue \(\kappa _j\).

As already discussed for \(\hbox {RXD}\), we can also use a de-noised version of the \(\hbox {GFT}\) where only the first \(p \ll m\) eigenvectors (those associated with the smallest eigenvalues) are kept, removing the higher and noisier frequencies and obtaining the following:

$$\begin{aligned} \delta ^p_{{LAD}}(\mathbf{x}_{}) = \sum _{j=1}^{p} \lambda _j {\widetilde{s}}_{j}^2 . \end{aligned}$$

The parameter p is determined according to the percentage of retained cumulative energy, following the approach presented in Sect. 2.1.

Finally, a decision threshold over \(\delta _{{LAD}}\) is needed to determine if a pixel is anomalous or not. An approach similar to the one described in Sect. 2.1 can be employed.
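The core of the detector can be sketched in NumPy as below (our own naming; the signals are assumed stacked as columns of an array). For the full rank case, the spectral-domain sum is algebraically identical to the quadratic form \(\mathbf{s}^T \mathbf{L} \mathbf{s}\), which the test exploits as a sanity check; note that no matrix inversion appears anywhere.

```python
import numpy as np

def lad_scores(S, L, p=None):
    """LAD scores for graph signals S of shape (n, N), one column per pixel,
    under a Laplacian model L; optionally keep only the first p frequencies."""
    lam, U = np.linalg.eigh(L)            # graph frequencies, ascending
    St = U.T @ S                          # GFT of every signal
    p = lam.size if p is None else p
    # delta_LAD = sum over j <= p of lambda_j * s_tilde_j^2
    return (lam[:p, None] * St[:p] ** 2).sum(axis=0)
```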

Fig. 1: Example of 3-band graph connectivity: the spectral components are fully connected, while spatially pixels are 4-connected

4.3 Spectral graph model

As already mentioned, the graph model is used to characterize the typical behavior around the pixel being tested for anomaly. As in the case of standard \(\hbox {RXD}\), the graph can be employed to model only the spectral relations: In this case, the vertex set \({\mathcal {V}}\) consists of m nodes, each representing one of the spectral bands of \(\mathbf{I}\); then, we connect each pair of nodes (bands) with an edge, obtaining a fully connected graph. An example of this topology for a 3-band image is shown in Fig. 1a. A weight is then assigned to each edge: If some a priori knowledge about inter-band correlation is available, it can be used to set weights accordingly; if this is not the case, a possibility is to use the image data to estimate the weights. Also, for each pixel \(\mathbf{x}_{}\), the graph signal \(\mathbf{s}_{}\) will contain exactly the value of that pixel over the m bands, after removing the mean; thus, \(\mathbf{s}_{} = \overline{\mathbf{x}}_{}\).

Under the assumption that the image follows a \(\hbox {GMRF}\) model, we might use partial correlation as weight, as proposed by Zhang and Florêncio [70]. To this end, given the precision matrix \(\widehat{\mathbf{Q}}= \widehat{\mathbf{C}}^{-1}\), estimated according to (1), we can set the weight of the edge connecting nodes a and b as:

$$\begin{aligned} w_{ab} = - \frac{\widehat{\mathbf{Q}}(a,b)}{\sqrt{\widehat{\mathbf{Q}}(a,a) \, \widehat{\mathbf{Q}}(b,b)}} . \end{aligned}$$

Note that \(w_{aa} = 0\) as we do not include self-loops. However, this approach still relies on the estimation and inversion of the covariance matrix which, as we already discussed, might be unreliable (especially in the presence of a small data sample) as well as expensive to compute: Matrix inversion requires \(O(m^3)\) time [46]. Also, if the image does not follow a \(\hbox {GMRF}\) model, this distance function might produce unreliable results, as for all other covariance-based methods. An option to safeguard against this could be to use the constructed graph to evaluate the \(\hbox {GMRF}\) hypothesis, with an approach similar to the one proposed in [3].
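The partial-correlation weights can be computed as in the NumPy sketch below (function name ours). As noted, this variant still pays for the inversion of \(\widehat{\mathbf{C}}\); it simply repackages the precision matrix as graph weights with a zeroed diagonal.

```python
import numpy as np

def partial_correlation_weights(C):
    """Graph weights from the precision matrix Q = C^{-1}:
    w_ab = -Q(a,b) / sqrt(Q(a,a) * Q(b,b)), with w_aa = 0 (no self-loops)."""
    Q = np.linalg.inv(C)                 # the costly, possibly unstable step
    d = np.sqrt(np.diag(Q))
    W = -Q / np.outer(d, d)              # negated, normalized off-diagonals
    np.fill_diagonal(W, 0.0)
    return W
```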

Another possibility is to use a different weight function, e.g., the Cauchy function [32], which has been proved to be able to capture graph distances effectively for image signals and is commonly used as graph weight in other applications like image segmentation and compression [8, 28]. We propose to set the weight of the edge connecting bands a and b, according to the band mean vector \(\hat{\varvec{\mu }}= [\mu _1 \mu _2 \ldots \mu _m]^T\) estimated as in (1), as

$$\begin{aligned} w_{ab} = \frac{1}{1 + \left( \frac{\mu _a - \mu _b}{\alpha }\right) ^2} , \end{aligned}$$

where \(\alpha \) is a scaling parameter. In this study, we decided to set \(\alpha = \frac{1}{m} \sum _{i=1}^m \mu _i\), to normalize all values according to the mean range of the bands. The advantages of this approach are twofold: It avoids using unreliable correlation estimates and does not require matrix inversion, thus reducing the computational cost significantly.
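The Cauchy-weighted band graph is cheap to build: it needs only the m band means, with no covariance estimation or inversion. A NumPy sketch under our own naming:

```python
import numpy as np

def cauchy_weights(mu):
    """Fully connected band graph with Cauchy weights
    w_ab = 1 / (1 + ((mu_a - mu_b)/alpha)^2), alpha = mean of the band means."""
    alpha = mu.mean()                          # scaling parameter as in the text
    diff = (mu[:, None] - mu[None, :]) / alpha
    W = 1.0 / (1.0 + diff ** 2)
    np.fill_diagonal(W, 0.0)                   # no self-loops
    return W
```

Bands with identical means get the maximum weight of 1, and the weight decays smoothly as the band means diverge.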

Although other approaches to estimate graph weights might be devised, in this study we will limit the analysis to these ones.

4.4 Integration of spatial information in the graph

One of the advantages of using a graph-based approach is the flexibility of the model. For example, by augmenting the graph topology to include edges connecting each node to nodes describing the same band for the neighboring pixels, as shown in Fig. 1b, one is able to include spatial information in the model. We will refer to this spatially aware version of \(\hbox {LAD}\) as \(\hbox {LAD}\)-S.

When considering the case of 4-connected nodes, the resulting graph will be composed of 5m nodes; therefore, the weight matrix \(\mathbf{W}\), as well as the corresponding Laplacian matrix \(\mathbf{L}\), will be a \(5m \times 5m\) matrix. We can construct the weight matrix as follows:

$$\begin{aligned} \mathbf{W}(a,b) = {\left\{ \begin{array}{ll} w'_{ab} &{} \text{ if } \text{ nodes } a, b \text{ represent } \text{ different }\\ &{} \text{ bands } \text{ of } \text{ the } \text{ same } \text{ pixel, }\\ w''_{ab} &{} \text{ if } \text{ nodes } a, b \text{ belong } \text{ to } \text{ the } \text{ same }\\ &{} \text{ band } \text{ of } \text{4-connected } \text{ pixels, }\\ 0 &{} \text{ otherwise, } \end{array}\right. } \end{aligned}$$

where \(w'_{ab}\) and \(w''_{ab}\) are some spectral and spatial correlation measures, respectively.

Then, to compute the distance of a pixel \(\mathbf{x}_{}\) from the model, a graph signal \(\mathbf{s}_{}\) is constructed concatenating the vector corresponding to \(\mathbf{x}_{}\) and its 4-connected neighbors; also in this case, the mean value \(\hat{\varvec{\mu }}\) is subtracted. It follows that the vector \(\mathbf{s}_{}\) will have length 5m.

The spectral weights \(w'_{ab}\) can be estimated as proposed in the previous section. The weights \(w''_{ab}\) can be used to enforce a spatial prior: As an example in the following experimental analysis, we will set uniform spatial weights \(w''_{ab}=1\).
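The construction of the \(5m \times 5m\) weight matrix can be sketched as follows. This is our own reading of the topology described above: node (q, a) is band a of pixel q, with q = 0 the center pixel and q = 1..4 its 4-connected neighbors, and spatial edges link each band of the center to the same band of every neighbor.

```python
import numpy as np

def spatial_spectral_W(Wspec, w2=1.0):
    """5m x 5m weight matrix for the LAD-S graph: Wspec is the m x m spectral
    weight matrix w', and w2 is the uniform spatial weight w''."""
    m = Wspec.shape[0]
    W = np.zeros((5 * m, 5 * m))
    for q in range(5):                          # spectral block for each pixel
        W[q*m:(q+1)*m, q*m:(q+1)*m] = Wspec
    for q in range(1, 5):                       # center <-> neighbor, same band
        for a in range(m):
            W[a, q*m + a] = W[q*m + a, a] = w2
    return W
```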

5 Experiments

To objectively evaluate \(\hbox {LAD}\)’s performance, we selected two scenarios in which the use of \(\hbox {RXD}\) has been proposed. The first one is hyperspectral remote sensing, which is one of the most common use cases for anomaly detection where the use of \(\hbox {RXD}\) is widely validated [49]; the second one is the domain of 3D volumetric segmentation of tumoral masses on \(\hbox {positron emission tomography (PET)}\) images, where we successfully explored the use of \(\hbox {RXD}\) in the past [10, 63, 65]. In these scenarios, we compare the performance of the proposed technique with those produced by \(\hbox {RXD}\) and, in the hyperspectral domain, also with \(\hbox {random-selection-based anomaly detector (RSAD)}\) [20] and \(\hbox {WSCF}\) [31]. \(\hbox {RSAD}\) employs multiple random selections of pixels to estimate the background statistics and then marks a pixel as anomalous by merging the output of the different runs by a majority voting approach. \(\hbox {WSCF}\) applies a whitening transformation to the input based on the image covariance matrix and then incorporates spatial information in the anomaly measure. This latter algorithm is of particular interest for our evaluation, to compare its performance against our own spatially aware methodology.

5.1 Hyperspectral remote sensing

Hyperspectral images find wide adoption in remote sensing applications, where hyperspectral sensors are typically deployed on either aircraft or satellites. The data produced by these sensors are a three-dimensional array or “cube” of data with the width and length of the array corresponding to spatial dimensions and the spectrum of each point as the third dimension.

Fig. 2: The full \(512\times 217\) Salinas scene

Fig. 3: Hyperspectral test scenarios and algorithm outputs. \(\hbox {LAD}\) results have been obtained using \(\mathbf{L}_{C}\)

5.1.1 Dataset

The dataset used in this study is composed of three hyperspectral scenes collected by the 224-band AVIRIS sensor. As is common practice [12], we discarded the 20 water absorption bands, i.e., bands 108–112, 154–167, and 224. The first scene was collected over Salinas Valley, California, and is characterized by high spatial resolution (3.7-meter pixels). The area covered by this scene comprises 512 lines by 217 samples, and it includes vegetables, bare soils, and vineyard fields. A classification ground truth containing 16 classes is provided with this scene. A sample band of the image together with the classification ground truth is shown in Fig. 2. The other two scenes image urban environments and come with anomaly detection ground truth; both comprise 100 lines by 100 samples. We will refer to them as Urban-A and Urban-B. A sample band of these two scenes, together with their corresponding ground truth, is shown in Fig. 3.
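As a quick sanity check of the band selection just described (a sketch assuming 1-based band numbering), the retained bands can be enumerated as follows:

```python
# Retained AVIRIS bands after discarding the 20 water-absorption
# bands (1-based numbering, as in the text): 108-112, 154-167, 224.
discarded = set(range(108, 113)) | set(range(154, 168)) | {224}
kept = [b for b in range(1, 225) if b not in discarded]

assert len(discarded) == 20   # the 20 water-absorption bands
assert len(kept) == 204       # spectral bands actually used
```

The 204 retained bands are consistent with the spectral dimension of the hyperspectral scenes used in the rest of the evaluation.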

To evaluate \(\hbox {LAD}\), we tested it on both real and synthetic anomalies. For the Salinas scene, we cropped a \(200 \times 150\) portion of the scene and manually segmented a construction which was visible in the cropped area: As the scene mostly contains fields of various kinds, this human-made construction was a good anomalous candidate. This setup, which we will call Field, is shown in Fig. 3m together with its ground truth in Fig. 3n.

To obtain a synthetic anomaly, we used the target implant method [61] on a different portion of the Salinas scene. The \(150 \times 126\) binary mask image \(\mathbf{M}\) shown in Fig. 3t has been constructed by generating six squares having sides measuring from 1 to 6 pixels arranged in a line. The six squares have then been copied in reverse order and arranged in another line at close distance. The two lines have finally been rotated by an angle of approximately \(\pi /6\). The pixels inside the squares have a value of 1, while the rest of the pixels in \(\mathbf{M}\) have value 0. Then, we cropped a region \(\mathbf{I}\) from the Salinas scene, having the same dimension as the mask. We used it to build the modified image \(\mathbf{I}'\) containing the implanted target as follows:

$$\begin{aligned} \mathbf{I}'(i, j) = \mathbf{M}(i,j) \cdot \varPhi (k) + (1 - \mathbf{M}(i,j)) \cdot \mathbf{I}(i,j) , \end{aligned}$$

where \(\varPhi \) is a function that, given a parameter \(k \in [1,16]\), returns a random pixel from the region of the Salinas scene having class k according to the classification ground truth shown in Fig. 2b. In the following discussion, for conciseness, we will limit the analysis to two synthetic setups with \(k = 14\) and \(k = 4\), respectively. The two representative values have been chosen since \(\hbox {RXD}\) achieves the best performance on the former and the worst one on the latter. We will refer to them as Impl-14 and Impl-4, respectively. A sample band from the Impl-14 setup is shown in Fig. 3s.
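The target implant rule above can be sketched as follows; `implant_target` and its argument names are ours, and \(\varPhi \) is modeled as drawing, for each masked pixel, a random pixel of class k from the labeled scene:

```python
import numpy as np

def implant_target(I, M, scene, gt, k, rng=None):
    """Implant a synthetic target into region I following
    I'(i,j) = M(i,j) * Phi(k) + (1 - M(i,j)) * I(i,j),
    where Phi(k) draws a random pixel (full spectrum) of class k.

    I     : (h, w, m) cropped region
    M     : (h, w) binary mask
    scene : (H, W, m) full hyperspectral scene
    gt    : (H, W) classification ground truth
    """
    if rng is None:
        rng = np.random.default_rng()
    rows, cols = np.nonzero(gt == k)              # pixels of class k
    out = I.copy()
    for i, j in zip(*np.nonzero(M)):              # only masked pixels change
        idx = rng.integers(len(rows))
        out[i, j] = scene[rows[idx], cols[idx]]   # Phi(k)
    return out
```

Since \(\mathbf{M}\) is binary, each output pixel is either a class-k sample (inside the mask) or the original pixel of \(\mathbf{I}\) (outside it), exactly as in the convex-combination formula above.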

Figure 4 shows the mean and standard deviation of the intensity of each band for the background and for the anomaly regions in Impl-4 and Impl-14. As can be noticed, the spectral characteristics of the anomaly in Impl-4 are similar in shape to those of the background, although with reduced intensities. The anomaly in Impl-14, instead, presents a curve that differs more markedly from the others.

Fig. 4: Spectral characteristic curves for different regions of the image. The line represents the mean intensity computed over all pixels in a region, while the shaded area represents the standard deviation

Fig. 5: \(\hbox {Receiver operating characteristic (ROC)}\) curves for the hyperspectral testing scenarios

5.1.2 Experimental results

We are interested in evaluating the detection accuracy of \(\hbox {LAD}\) using the Laplacian model built over the partial correlation weights (\(\mathbf{L}_{Q}\)) and the one built using the Cauchy distance (\(\mathbf{L}_{C}\)). Also, we want to test both the spectral version of \(\hbox {LAD}\) and its spatially aware variant \(\hbox {LAD}\)-S. The results will be compared with those yielded by classic \(\hbox {RXD}\), given its well-known status as benchmark algorithm for anomaly detection, as well as by \(\hbox {RSAD}\) and \(\hbox {WSCF}\). We also want to confirm with our experiments one of the known limitations of \(\hbox {RXD}\) enunciated in Sect. 2.1, namely that the inclusion of spatial information in \(\hbox {RXD}\) is detrimental to its performance, and to demonstrate how our approach overcomes this limitation. Another well-known algorithm which aims at addressing this limitation is \(\hbox {WSCF}\), and for this reason we selected it for evaluation as well. \(\hbox {WSCF}\) requires a parameter \(\alpha \) to determine the amount of spatial information included in the metric; in this study, we set \(\alpha = 0.2\), as suggested in the original work [31]. \(\hbox {RSAD}\) requires the selection of: the initial number of randomly selected blocks N, which should be as small as possible but still large enough that \(4N>b\), where b is the number of image bands; the number of random selections L; and the percentile \(\alpha \). For these parameters, we chose the following values in our experiments: \(N=80\), \(L=40\), and \(\alpha =0.001\). We implemented our method as well as all three benchmark methods in MATLAB 2014b. All experiments were run on a laptop equipped with an Intel® Core™ i7@2.20GHz CPU, an NVIDIA GeForce GT435M GPU, and 8 GB of RAM.

Figure 3 shows the visual results of the \(\hbox {LAD}\) (\(\mathbf{L}_{C}\)) approach compared to the ones yielded by \(\hbox {RXD}\), \(\hbox {RSAD}\), and \(\hbox {WSCF}\) on all hyperspectral scenarios. It can be clearly noticed how \(\hbox {LAD}\) achieves a lower number of false positives than all other algorithms.

Figure 5 shows the \(\hbox {ROC}\) curves for the hyperspectral test cases, for all algorithms except \(\hbox {RSAD}\): The approach by virtue of which \(\hbox {RSAD}\) selects anomalous pixels does not lend itself to being plotted as a \(\hbox {ROC}\) curve. The scale of the \(\hbox {FPR}\) axis has been enhanced, as is common in anomaly detection studies [4, 43, 68], given the great difference in scale between the number of negative and positive pixels. It can be noticed how our approach outperforms both \(\hbox {RXD}\) and \(\hbox {WSCF}\) in all scenarios except Urban-A, on which all algorithms perform very similarly. Also worth noticing is that the inclusion of spatial information yields limited improvements on the hyperspectral scenarios. When comparing results obtained by \(\hbox {LAD}\) using \(\mathbf{L}_{Q}\) or \(\mathbf{L}_{C}\), it can be noticed how performance is often very similar. This is a remarkable result, also considering that \(\mathbf{L}_{C}\) creates a model of the background without the need for matrix inversions, so it proves to be both quicker and equally precise.

Table 1 Experimental results in hyperspectral setup (SOI)
Table 2 Experimental results after dimensionality reduction in hyperspectral setup (SOI)
Fig. 6: Energy and eigenvalue curves for the Impl-14 scenario

To further compare performance yielded by the different approaches, we also use the standard \(\hbox {spatial overlap index (SOI)}\) [73], also known as Dice similarity coefficient (DSC) [19], which can be computed as follows:

$$\begin{aligned} SOI = \frac{2(A \cap B)}{A + B} , \end{aligned}$$

where A and B are two binary masks (i.e., the ground truth or \(\hbox {region of interest (ROI)}\) and the output of an automatic algorithm); the intersection operator is used to indicate the number of pixels/voxels having value 1 in both masks, while the sum operator indicates the total number of pixels/voxels having value 1 in the two masks. \(\hbox {SOI}\) is also equivalent to the statistical \(F_1\)-score, which is the harmonic mean of precision and sensitivity, and is usually defined in terms of Type I and Type II errors as follows:

$$\begin{aligned} F_1 = \frac{2 \cdot \text{true positive}}{2 \cdot \text{true positive} + \text{false positive} + \text{false negative}}. \end{aligned}$$

The equality between (18) and (19) can be easily demonstrated by considering that \(A \cap B\) contains the true positive pixels/voxels and that, if \(A = (\text{true positive} + \text{false positive})\) and \(B = (\text{true positive} + \text{false negative})\), then the denominator in (18) also equals the one in (19). Clearly, to compute the \(\hbox {SOI}\) metric one needs to select a threshold t to identify the detected anomaly subset. Many approaches [1, 53, 69] have been proposed in the literature to deal with the problem of choosing the optimal threshold. In this work, we select the value of t yielding the highest \(\hbox {SOI}\), i.e., the one striking the best balance between \(\hbox {TPR}\) and \(\hbox {FPR}\) on the \(\hbox {ROC}\) curve in terms of \(\hbox {SOI}\). This choice allows us to compute a single-objective metric to compare the analyzed methods. Alternatively, we could also use the \(\hbox {area under the curve (AUC)}\), which measures the area under each \(\hbox {ROC}\) curve; we decided to avoid such a metric since it has recently been criticized for being sensitive to noise [36] and for other significant problems it shows in model comparison [37, 45].
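A minimal sketch of the \(\hbox {SOI}\) computation and of the threshold selection just described (function names are ours, not from the paper):

```python
import numpy as np

def soi(A, B):
    """Spatial overlap index (Dice coefficient) between two binary masks,
    as in (18): 2|A & B| / (|A| + |B|)."""
    A, B = A.astype(bool), B.astype(bool)
    return 2.0 * np.sum(A & B) / (np.sum(A) + np.sum(B))

def best_threshold(scores, gt, candidates):
    """Select the threshold t whose detection mask (scores >= t)
    yields the highest SOI against the ground truth mask gt."""
    return max(candidates, key=lambda t: soi(gt, scores >= t))
```

Since \(|A| = \mathrm{tp} + \mathrm{fp}\) and \(|B| = \mathrm{tp} + \mathrm{fn}\), `soi` returns exactly the \(F_1\)-score of (19); the equivalence can be checked numerically on random masks.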

Table 1 shows all \(\hbox {SOI}\) results of our tests. It can be noticed how all variants of our approach are able to outperform \(\hbox {RXD}\), \(\hbox {RSAD}\), and \(\hbox {WSCF}\). These results are consistent with those presented by the \(\hbox {ROC}\) curves.

Finally, in Table 2 we show results of the de-noised version of both \(\hbox {LAD}\) and \(\hbox {RXD}\), which we call \(\hbox {LAD}\)p and \(\hbox {RXD}\)p, respectively. In this case, the value of p has been chosen according to the cumulative energy as described in Sect. 2.1, setting \(\psi = 0.99\). It can be noticed how \(\hbox {RXD}\) gains the most from dimensionality reduction. These results can be explained by considering the distribution of energy in the eigenspace decomposition. For the Impl-14 scenario, Fig. 6 shows the cumulative energy distribution in the different eigenspaces together with the corresponding eigenvalues \(\kappa _j^{-1}\) and \(\lambda _j\) (which are used to weigh the different contributions in (5) and (13), respectively). It can be noticed that in the \(\hbox {RXD}\) case (Fig. 6a) energy is better compacted into few eigenspaces with respect to \(\hbox {LAD}\) (Fig. 6b, c). At the same time, it can be observed that the distribution of \(\kappa _j^{-1}\) in \(\hbox {RXD}\) dramatically amplifies the last eigenspaces, i.e., the noise components, according to (5). On the contrary, this phenomenon does not affect \(\hbox {LAD}\), since the distribution of eigenvalues \(\lambda _j\) is not peaked on the last eigenspaces. It follows that the effect of noise in (13) is mitigated by construction and the benefit of dimensionality reduction is limited. Indeed, it can be noted that the results obtained by \(\hbox {RXD}\) after dimensionality reduction are in line with those obtained by \(\hbox {LAD}\) in its simple form. Since eigendecomposition is a costly operation, on a par with matrix inversion, the use of \(\hbox {LAD}\) (\(\mathbf{L}_{C}\)), which does not require any matrix inversion or eigendecomposition, might be preferable.
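The selection of p from the cumulative energy with \(\psi = 0.99\) can be sketched as follows, assuming the per-eigenspace energies are given (the helper name is ours):

```python
import numpy as np

def choose_p(energies, psi=0.99):
    """Return the smallest number of eigenspaces p whose cumulative
    (normalized) energy reaches the fraction psi."""
    e = np.asarray(energies, dtype=float)
    cum = np.cumsum(e) / e.sum()
    # first index where the cumulative energy reaches psi (0-based)
    return int(np.searchsorted(cum, psi) + 1)
```

For instance, with energies concentrated in the first eigenspace, p stays small, and the last (noise-dominated) components are dropped.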

Fig. 7: The three FDG-PET images of one of the sample patients; (1) is the early scan (ES, \(144 \times 144 \times 213\) px), while (2) and (3) are constructed integrating the delayed scan in 3-min time windows (DS1 and DS2, \(144 \times 144 \times 45\) px). Only the area containing the tumor is acquired in the delayed scan. These images, originally in grayscale, are here displayed using a Fire lookup table

5.2 Application to 3D volumes: tumor segmentation in PET sequences

\(\hbox {PET}\) data are volumetric medical images that are usually employed to locate the tumoral area for proper oncological treatment, e.g., by means of radiotherapy. From a PET scan, one or more 3D images can be produced where the intensity of a voxel represents the local concentration of the tracer during the time window of the scan. In particular, fluorodeoxyglucose positron emission tomography (FDG-PET) is used to detect tissue metabolic activity by virtue of the glucose uptake.

During normal cell replication, mutations in the DNA can occur and lead to the birth of cancer cells. By their nature, these cells lack the ability to stop their multiplication, raising cell density in their region and causing insufficient blood supply. The resulting deficiency in oxygen (hypoxia) forces these cells to rely mostly on their anaerobic metabolism, i.e., glycolysis [54]. For this reason, glycolysis is an excellent marker for detecting cancer cells; FDG-PET, in which the tracer's concentration indicates the glucose uptake in the imaged area, turns out to be a suitable tool for recognizing tumors, metastases, and lymph nodes all at once [30]. It follows that proper segmentation of tumors in medical images is crucial, as oncological treatment plans rely on precise information on the tumoral region to be effective [54]. Manual segmentation by medical staff has been proven to be subjective, inaccurate, and time-consuming [66]; for this reason, the need for automatic methods for tumor region segmentation is on the rise. \(\hbox {PET}\) images carry information about cell metabolism and are therefore suitable for this task; however, \(\hbox {PET}\) segmentation is still an open problem, mainly because of limited image resolution and the strong presence of acquisition noise [69].

In [10, 63, 65], we successfully explored the use of \(\hbox {RXD}\) to identify the anomalous behavior of cancer cells over time in sequences of three \(\hbox {FDG-PET}\) images acquired over a time span of one hour. A quick visual overview of this setup is shown in Fig. 7. The idea behind the use of \(\hbox {RXD}\) in this scenario arises from the fact that cancer cells tend to acquire glucose differently than normal cells, given their peculiar reliance on anaerobic metabolism. For this reason, when considering the values a voxel assumes over time, cancer’s anomalous glucose uptake can be successfully spotted using anomaly detection techniques, where the usual role of spectral bands is taken by three PET images acquired over time.

To do this, we build a 4D matrix \(\mathbf{I}\), having the three spatial dimensions as the first three dimensions and time as the fourth dimension. Since the images are acquired at different times, with the subject assuming slightly different positions, it is worth recalling that they need to be aligned using registration algorithms, as detailed in [65]. The resulting matrix \(\mathbf{I}\) will then have size \(144 \times 144 \times 45 \times 3\). Then, for a generic voxel, identified by its spatial coordinates, we define the vector \(\mathbf{x} = [x_1\; x_2\; x_3]^T\) containing that voxel's intensities over time. In other words, \(\hbox {RXD}\) can be employed in this case if time takes the role of the spectral dimension.
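As a rough sketch of this setup (a generic Mahalanobis-based RX formulation, not the exact implementation used in the paper), the per-voxel scores over the time dimension could be computed as:

```python
import numpy as np

def rx_scores(vol4d):
    """Generic RX sketch on a 4D volume (X, Y, Z, T): each voxel's
    intensities over time play the role of the spectral bands.
    Returns the squared Mahalanobis distance of every voxel from
    the global mean."""
    X, Y, Z, T = vol4d.shape
    data = vol4d.reshape(-1, T).astype(float)        # one row per voxel
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    cov_inv = np.linalg.pinv(cov)                    # pinv guards against singularity
    d = data - mu
    scores = np.einsum('ij,jk,ik->i', d, cov_inv, d)  # d_i^T * Cov^-1 * d_i
    return scores.reshape(X, Y, Z)
```

A voxel with an anomalous glucose-uptake trajectory over the three acquisitions receives a large score, which is the behavior exploited in [10, 63, 65].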

Fig. 8: \(\hbox {ROC}\) curves for all patients in the Tumor testing scenario

5.2.1 Experimental results

In this study, we used a dataset comprising eight patients, which has been made available by the Candiolo Cancer Institute (IRCCS-FPO) for research purposes. All the acquisitions have been made using a Philips Gemini TF PET/CT scanner. We acknowledge the precious aid of the nuclear medicine physicians who manually segmented the \(\hbox {ROIs}\) on the \(\hbox {PET}\) images, setting up the ground truth for evaluating the performance yielded by the proposed tools. We will refer to this setup as Tumor.

Also in this scenario, we are interested in evaluating the detection accuracy of \(\hbox {LAD}\) using both Laplacian models, \(\mathbf{L}_{Q}\) and \(\mathbf{L}_{C}\), and in comparing our results with those yielded by classic \(\hbox {RXD}\). We cannot compare with \(\hbox {WSCF}\) in this domain, as its extension to 3D has not been proposed and therefore the choice of the parameter \(\alpha \) is non-trivial. It should be noted that in this setup we are dealing with voxels and 3D volumes. For this reason, in \(\hbox {LAD}\)-S we will use 6-connectivity, which is the extension of 2D 4-connectivity to 3D space.
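For illustration, the 6-connected neighborhood can be enumerated with the following small sketch (helper names are ours):

```python
# 6-connectivity in 3D: the two axis-aligned neighbors along each of
# the three spatial axes (the 3D extension of 2D 4-connectivity).
OFFSETS_6 = [(-1, 0, 0), (1, 0, 0),
             (0, -1, 0), (0, 1, 0),
             (0, 0, -1), (0, 0, 1)]

def neighbors6(v, shape):
    """In-bounds 6-connected neighbors of voxel v in a volume of given shape."""
    out = []
    for off in OFFSETS_6:
        n = tuple(c + o for c, o in zip(v, off))
        if all(0 <= c < s for c, s in zip(n, shape)):
            out.append(n)
    return out
```

An interior voxel thus has six neighbors, while voxels on the volume boundary have fewer, exactly as pixels at image borders do under 4-connectivity.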

To compare performance yielded by the different approaches, we use \(\hbox {SOI}\) as presented in (18). Once again, in this study we selected the value of t yielding the highest \(\hbox {SOI}\).

Figure 8 shows the \(\hbox {ROC}\) curves for all the eight patients in the Tumor dataset, while Table 3 shows the average \(\hbox {SOI}\) results of our tests over the patient dataset. The inclusion of spatial information in the graph improves the \(\hbox {SOI}\) metric. In this scenario, we do not present results after dimensionality reduction because the spectral dimensions are already very few. Also, in this scenario \(\hbox {LAD}\), in all its variants, is able to obtain performance similar to, when not better than, \(\hbox {RXD}\).

Table 3 Experimental results in Tumor setup (SOI)

6 Discussion

In the previous section, we conducted experiments in the hyperspectral and medical domains. \(\hbox {RXD}\)’s limitations detailed in Sect. 2 can be noticed in many of the presented experiments. In particular, the high number of false positives can be easily noticed in Fig. 3, while the poor performance of \(\hbox {RXD}\), \(\hbox {RSAD}\), and \(\hbox {WSCF}\) in the Impl-4 scenario can be attributed to the fact that in that case the anomaly has a covariance matrix very similar to that of the background, as shown in Fig. 4; this makes it very difficult for covariance-based methods to find an acceptable solution.

The results obtained by \(\hbox {RSAD}\) have been particularly surprising. The algorithm has been able to achieve results in line with or even better than the other two covariance-based approaches in a couple of scenarios, while obtaining very poor performance in the others due to a very high \(\hbox {FPR}\). We believe this behavior is caused by the assumption, made by \(\hbox {RSAD}\) when marking pixels as anomalous, that the Mahalanobis distance follows a \(\chi \) distribution. In the scenarios used in this study, we observed that this was rarely the case. When this assumption does not hold, the decision criterion used by \(\hbox {RSAD}\) is probably not sufficient.

The proposed technique was able to outperform state-of-the-art techniques in all scenarios, proving how the flexibility of a graph model can actually enable better and more robust background estimation as well as successful inclusion of spatial information.

Spatially aware variants of the proposed technique were able to achieve better performance in the Tumor scenario, while failing to improve over the spectral-only variants in the hyperspectral ones. The benefit of including spatial information is more noticeable in the medical scenario because in that case the spectral dimension is reduced to only three bands, representing three different acquisitions in time, as opposed to the 204 spectral bands of the hyperspectral images. Also, we used a uniform correlation model for the spatial weights; a more refined model might be better suited to capture the spatial dynamics of remote sensing scenes, while the one used might simply be more fitting for medical imaging.

When comparing results obtained by \(\hbox {LAD}\) using \(\mathbf{L}_{Q}\) or \(\mathbf{L}_{C}\), it can be noticed how performance is often very similar on hyperspectral images, while in Tumor \(\mathbf{L}_{C}\) is able to obtain consistently better results. This behavior is clearly due to the fact that \(\mathbf{L}_{Q}\) depends on pairwise correlation estimates, which are particularly critical in the Tumor case, where the 3D volumes are characterized by poor spatiotemporal resolution. In this case, the use of a graph prior based on \(\mathbf{L}_{C}\) turns out to be more robust. An analysis of the \(\hbox {ROCs}\) validated this observation even further: In the hyperspectral case, the ROC curves for \(\hbox {LAD}\) using \(\mathbf{L}_Q\) or \(\mathbf{L}_C\) behave very similarly, indicating that the two weight functions are able to capture the same aspects of the data, while in the Tumor case, the two ROC curves show a more varied behavior.

All these tests confirm that the use of our approach is preferable to \(\hbox {RXD}\), \(\hbox {RSAD}\), and \(\hbox {WSCF}\), and that the Laplacian estimated using the Cauchy distance is able to perform as well as the one estimated using partial correlation. Once again, this is remarkable, as the former does not require any matrix inversion, while the latter does.

7 Conclusions

We presented the Laplacian anomaly detector, a graph-based algorithm aiming at detecting targets by virtue of a Laplacian model of the image background. Two different approaches to the graph construction have been proposed. When compared to \(\hbox {RXD}\), \(\hbox {RSAD}\), and \(\hbox {WSCF}\), one of the main advantages of our technique is its ability to model the image content without the need for matrix inversions. Both visual inspection and objective results show how the proposed approach is able to outperform the other benchmark methods. Future efforts might be devoted to evaluating \(\hbox {LAD}\)’s ability to detect anomalies on generic non-image graphs.