Graph Laplacian for image anomaly detection

Reed–Xiaoli detector (RXD) is recognized as the benchmark algorithm for image anomaly detection; however, it presents known limitations, namely the dependence over the image following a multivariate Gaussian model, the estimation and inversion of a high-dimensional covariance matrix, and the inability to effectively include spatial awareness in its evaluation. In this work, a novel graph-based solution to the image anomaly detection problem is proposed; leveraging the graph Fourier transform, we are able to overcome some of RXD’s limitations while reducing computational cost at the same time. Tests over both hyperspectral and medical images, using both synthetic and real anomalies, prove the proposed technique is able to obtain significant gains over performance by other algorithms in the state of the art.


Introduction
Anomaly detection is the task of spotting items that do not conform to the expected pattern of the data.In the case of images, it usually refers to the problem of spotting pixels showing a peculiar spectral signature when compared to all other pixels in an image.Image anomaly detection is considered one of the most interesting and crucial tasks for many high-level image-and video-based applications, e.g., surveillance, environmental monitoring, and medical analysis [16].
One of the most used and widely validated techniques for anomaly detection is Reed-Xiaoli detector, often called RX detector for short [56], which is the most known example of covariance-based anomaly detectors.This class of detectors has found wide adoption in many domains, from hyperspectral [49] to medical images [65]; however, methods of this type suffer from crucial drawbacks, most noticeably the need for covariance estimation and inversion.Many situations exist where the drawbacks of these state-of-the-art anomaly detectors lead to poor and unreliable results [67].Moreover, the operations required by those techniques are computationally expensive [12].For all these reasons, the research for a fast and reliable image anomaly detection strategy able to overcome the limitations of covariance-based anomaly detectors deserves further efforts.
In this paper, we use graphs to tackle image anomaly detection.Graphs are proved to be natural tools to represent data in many domains, e.g., recommendation systems, social networks, or protein interaction systems [18].Recently, they have found wide adoption also in computer vision and image processing communities, thanks to their ability to intuitively model relations between pixels.Graph-based approaches have been proposed to this date to solve a wide variety of image processing tasks, e.g., edge detection [6], gradient estimation [55], and segmentation [9,59].In particular, spectral graph theory has been recently bridged with signal processing, where the graph is used to model local relations between signal samples [57,60].As an example, graph-based signal processing is emerging as a novel approach in the design of energy compacting image transformations [27,28,39,64,70].
To this date, graph-based approaches have not been proposed for image anomaly detection, although many techniques for anomaly detection on generic graphs have been explored in the literature [2].Those techniques cannot be straightforwardly extended to images since they usually exploit anomalies in the topology of the graph to extract knowl-edge about the data [18].On the other hand, in the image case the graph topology is constrained to the pixel grid, whereas different weights are assigned to edges connecting pixels depending on their similarity or correlation.
Our proposed approach uses an undirected weighted graph to model the expected behavior of the data and then computes the distance of each pixel in the image from the model.We propose to use a graph to model spectral or both spectral and spatial correlation.The main contribution of this paper is a novel anomaly detection approach which exploits spectral graph theory to overcome one of the well-known limitations of RX detector and other covariance-based anomaly detectors, i.e., the need to estimate and invert a covariance matrix.Estimation of the covariance may be very critical in the presence of a small sample size; moreover, inverting such a matrix is also a complex, badly conditioned and unstable operation [40].Our novel anomaly detector estimates the statistic of the background using a graph Laplacian matrix.Also, the graph model used by our approach is abstract and flexible enough to be tailored to any prior knowledge of the data possibly available.The effectiveness of our methodological contributions is shown in two use cases: a typical hyperspectral anomaly detection experiment and a novel application for tumor detection in 3D biomedical images.
The paper is organized as follows: we will first give a brief overview of RX detector and the graph Fourier transform in Section 2 and go over some related work in Section 3, and then we will present our technique in Section 4; we will then evaluate the performance of our technique and compare our results with those yielded by algorithms in the state of the art both visually and objectively in Section 5, and we will discuss these results in Section 6; finally, conclusions will be drawn in Section 7.

Background
Anomaly detection refers to a particular class of target detection problems, namely the ones where no prior information about the target is available.In this scenario, supervised approaches that try to find pixels which match reference spectral characteristics (e.g., [24,42]) cannot usually be employed.This extends also to supervised deep learning or other data-driven approaches, which attempt to learn a parametric model from a set of labeled data.Although deep learning methods have found increasingly wide adoption for many other tasks in image processing and computer vision [15,35,71], their application to anomaly detection-especially on hyperspectral and medical imaging-is stifled by multiple factors: first, pixels have to be considered anomalous according to intra-image metrics which are difficult to capture in a dataset; second, the amount of data required to train the models is not often available in these contexts [11,44].For these reasons, classical unsupervised approaches are preferable instead.These algorithms detect anomalous or peculiar pixels showing high spectral distance from their surrounding [20].To this end, the typical strategy is to extract knowledge of the background statistics from the data and then measure the deviation of each examined pixel from the learned knowledge according to some affinity function.

Reed-Xiaoli detector
The best known and most widely employed algorithm for anomaly detection is Reed-Xiaoli detector (RXD) by Reed and Yu [56].To this date, it is still used as a benchmark algorithm for many anomaly detection applications [5,20,48,51].RXD assumes the background to be characterized by a non-stationary multivariate Gaussian model, estimated by the image mean and covariance.Then, it measures the squared Mahalanobis distance [47] of each pixel from the estimated background model.Pixels showing distance values over a set threshold are assessed to be anomalous.
Formally, RXD works as follows.Consider an image I = [x 1 x 2 . . .x N ] consisting of N pixels, where the column vector T represents the value of the i-th pixel over the m channels (or spectral bands) of I.The expected behavior of background pixels can be captured by the mean vector μ and covariance matrix C which are estimated as follows: x i , and where Mean vector and covariance matrix are computed under the assumption that vectors x i are observations of the same random process; it is usually possible to make this assumption as the anomaly is small enough to have a negligible impact on the estimate [12].
Then, the generalized likelihood of a pixel x to be anomalous with respect to the model C is expressed in terms of the square of the Mahalanobis distance [47], as follows: where Q = C −1 , i.e., the inverse of the covariance matrix, also known in the literature as the precision matrix.Finally, a decision threshold η is usually employed to confirm or refuse the anomaly hypothesis.A common approach is to set η adaptively as a percentage of δ RXD dynamic range as follows: An interesting property of RXD has been observed by Chang and Heinz in [14].In that work, the authors demonstrated how RXD can be considered an inverse operation of the principal component analysis (PCA).
More precisely, let us assume that κ 1 ≥ κ 2 ≥ . . .≥ κ m are the eigenvalues of the m × m covariance matrix C and {v 1 , v 2 , . . ., v m } is its set of unit eigenvectors with v j corresponding to κ j .We can then form the matrix V = [v 1 v 2 . . .v m ] with the j-th column specified by v j .V can be used to decorrelate the signal by diagonalizing C into the diagonal matrix K whose j-th diagonal element is κ j , such that V T CV = K and V T QV = K −1 .Then, we can compute y = V T x, which is known as the Karhunen-Loève transform (KLT).Data dimensionality reduction via PCA usually involves the computation of y using just the first p m columns of V.As shown in [14], (2) can be expressed as a function of y as follows: where y j represents the j-th element of the KLT vector y.
From this formulation, one can notice that RXD detects targets with small energies that are represented by small eigenvalues.This is because, according to (4), the smaller the eigenvalue, the greater its contribution to the value of δ RXD .This is reasonable, since if an anomalous small target is present in the image, it will not be visible in the principal components, but it is rather going to appear in smaller components [12].However, when seeing RXD in this form, it is quite evident that the last components, which are those containing mostly noise, are actually weighted the most.To improve the result of RXD, a value p m can be determined [38].Then, the eigenvalues beyond the first (greater) p will be considered to represent components containing only noise and will be discarded.We then obtain a de-noised version of RXD that can be expressed as follows: Obviously, δ m RX D = δ RX D .The issue of determining p was addressed in [13,38] and is closely related to the problem of determining the intrinsic dimensionality (ID) of the image signal.Empirically, p is usually set such that a desired percentage ψ ∈ [0, 1] of the original image cumulative energy content is retained.The cumulative energy content of the first p principal components of an image I = [x 1 x 2 . . .
where y i j is the j-th element of the vector y i .We then choose the smallest p ∈ [1, m], such that e(I, p)/e(I, m) ≤ ψ.Commonly for dimensionality reduction applications ψ = 0.9, but for anomaly detection purposes that value might be too low, given we do not want to risk to lose the anomaly.In this case, ψ = 0.99 is usually more appropriate.

Graph Fourier transform
In recent years, the growing interest in graph-based signal processing [58] has stimulated the study of graph-based transform approaches.These methodologies map the image content onto a topological graph where nodes represent pixel intensities and edges model relations between nodes, e.g., according to a criterion based on correlation or other similarity measures.The Fourier transform can be generalized to graphs obtaining the so-called graph Fourier transform (GFT) [57].Consider an undirected, weighted graph G = (V, E) composed of a vertex set V of order n and an edge set E specified by (a, b, w ab ), where a, b ∈ V, and w ab ∈ R + is the edge weight between vertices a and b.Thus, a weighted graph can be described by its adjacency matrix W where W(a, b) = w ab .A graph signal is a mapping that assigns a value to each vertex, denoted as s = [s 1 s 2 . . .s n ] T .
Typically, when computing the GFT a graph is constructed to capture the inter-pixel correlation and is used to compute the optimal decorrelating transform leveraging on spectral graph theory [60].From the adjacency (also called weight) matrix W, the combinatorial graph Laplacian matrix L = D − W can be computed, where D is the degree matrix: a diagonal matrix whose a-th diagonal element is equal to the sum of the weights of all edges incident to the node a. Formally, In some scenarios, it is useful to normalize weights in the Laplacian matrix; in those cases, the use of the symmetric normalized Laplacian matrix L sym is preferred.It is defined as L sym has important properties, i.e., its eigenvalues are always real, nonnegative, and bounded into the range [0, 2]; for these reasons, the spectrum of a symmetric normalized Laplacian relates well to other graph invariants for general graphs in a way that other definitions fail to do [18].Any Laplacian matrix L is a symmetric positive semidefinitive matrix with eigendecomposition: where U is the matrix whose columns are the eigenvectors of L and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues.The matrix U is used to compute the GFT of a signal s as: The inverse GFT is then given by When computing the GFT, the eigenvalues in Λ are usually sorted for increasing magnitude, the first eigenvalue being equal to zero [57], i.e., 0 = λ 1 ≤ λ 2 ≤ . . .≤ λ m .The eigenvectors in U are sorted accordingly.

Related work
Despite its popularity, RXD has recognized drawbacks that undermine its performance in some applications.For a full discussion over the limitations of RXD, we suggest [12,67]; however, they can be summarized in the following: 1. RXD involves a high-dimensional covariance matrix that needs to be estimated and inverted, often under a small sample size [5,40].Those are unstable, highly complex, and badly conditioned operations; 2. RXD often suffers from high false positive rate (FPR) [5,34,51]; 3. RXD assumes that the background follows a multivariate Gaussian model, but there are cases in which this assumption might not be adequate, e.g., in the case of multiple materials and textures [5,12,21,34]; 4. RXD lacks spatial awareness: every pixel is evaluated individually extrapolated from its context [31].
To address these issues, recent works have iterated over RXD's idea, e.g., by considering subspace features [22,62], by using kernels to go beyond the Gaussian assumption [21,41], by applying dimensionality reduction [33], by improving how the background statistics are estimated [20,50], or by exploiting sparsity and compress sensing theory [23,26,72].In this work, we generalize RXD's idea by looking at it from the point of view of spectral graph theory.This not only makes us able to avoid costly covariance matrix inversions, but also allows us to incorporate spatial information and any prior knowledge about the background model into the detector.Previous work trying to including spatial awareness in the detector is available in the literature, a noteworthy example is whitening spatial correlation filtering (WSCF) [31], where the authors propose to apply a whitening transformation based on the eigendecomposition of the image covariance matrix.On the whitened space, RXD is represented by the Euclidean norm.Then, by using an approach based on constrained energy minimization, WSCF spots anomalous pixels by estimating consistency to their neighborhood in the whitened space.We compare our proposed approach to WSCF in the experimental section.
Although prior research targeting anomaly detection in graphs exists, it mostly focuses on anomalies in a graph structure, and not on graph signals [2,18].For example, in the context of behavioral monitoring and intelligence, the structure of social graphs can be analyzed to spot subgraphs expressing patterns deviating from the rest of the network [52].However, in images, the structure of the graph is fixed to a grid, and the application of graph-based anomaly detection algorithms coming from other domains is not straightforward; even in works where peculiarities in the graph signal are under observation, structure is included as part of the signal, as for example in [25] where a signal function of the physical distance between wireless sensors is proposed.The effectiveness of these approaches to images has not been reported yet.
Our proposed graph-based approach is founded on two recent findings: first, Zhang and Florêncio [70] have shown that a Laplacian model can be used as an estimation of the precision matrix Q of an image, under the assumption that the image follows a gaussian Markov random field (GMRF) model.This amounts to using a function of the partial correlation between nodes as graph weights.Second, it has been demonstrated how the GFT can be considered an approximation of the KLT for graph signals [39].Recent literature in spectral graph theory has exploited this relationship to provide novel graph-based solutions to classical signal processing problems, in particular for image compression where the use of the GFT has been proposed as an alternative to the discrete cosine transform (DCT) [17,27,28,39].This relationship is, however, never been explored in the context of image anomaly detection, which motivated us to study it in this work.

Method
In this work, we exploit the analogy between KLT and GFT in the framework of anomaly detection.In the GFT definition, the role of the covariance matrix in the KLT is taken by the graph Laplacian.It turns out that L can be exploited also in the inverse problem of anomaly detection according to (4).We here propose a novel algorithm for image anomaly detection, which we will refer to as Laplacian anomaly detector (LAD).LAD overcomes some of the known limitations of RXD exposed in Section 2.1: it can be used to avoid problematic covariance matrix estimate and inversion, and it is able to include spatial information as well as a priori knowledge, when available.

Construction of the graph model
Given an image I composed of N pixels and having m spectral bands or channels, we first build an undirected graph G = (V, E) to serve as the model for the background pixels in the image.The graph is used to model local relations between pixel values and can be constructed to capture spectral and spatial characteristics.Topology and weights of the graph have to be chosen accordingly to the domain.We will discuss some general construction strategies in Section 4.3 and Section 4.4.The chosen graph will be described by a weight matrix W, from which a Laplacian matrix L will be computed according to the procedure detailed in Section 2.2.The use of the symmetric normalized Laplacian, constructed as in (8), in place of the unnormalized combinatorial one is to be preferred for the reasons expressed in Section 2.2.Also, L sym is proved to be preferable in similar domains, e.g., segmentation and classification [7,29].

Graph-based anomaly detection
Given a pixel x, we define a corresponding graph signal s, e.g., describing the spectral bands of x or its spatial neighborhood, and compute the distance of x from the model as where s j represents the j-th element of the GFT vector s, and U and Λ refer to the eigenvector and eigenvalue matrices used for the eigendecomposition of L in (9).Although this formulation might look similar to the one of RXD given in (4), some important differences have to be noted.First, the model used is not the inverse of the covariance matrix C −1 , but an arbitrary Laplacian model; this is a generalization over RXD, because if the image follows a GMRF model, then a Laplacian can be constructed to estimate the precision matrix [70], but if this is not the case a Laplacian model can be computed according to any knowledge of the domain.Second, the Laplacian matrix can be used to capture both spatial and spectral characteristics as we will detail in Section 4.4.
Another thing to notice is that in (12) each contribution s j is multiplied by λ j , whereas in RXD each y j was instead divided by the corresponding eigenvalue κ j .
As already discussed for RXD, we can also use a denoised version of the GFT where only the first smaller p m eigenvectors are kept, removing the higher and noisier frequencies and obtaining the following: The parameter p is determined accordingly to the percentage of retained cumulative energy, following the approach presented in Section 2.1.Finally, a decision threshold over δ L AD is needed to determine if a pixel is anomalous or not.An approach similar to the one described in Section 2.1 can be employed.

Spectral graph model
As already mentioned, the graph model is used to characterize the typical behavior around the pixel being tested for anomaly.As in the case of standard RXD, the graph can be employed to model only the spectral relations: in this case, the vertex set V consists of m nodes, each representing one of the spectral bands of I; then, we connect each pair of nodes (bands) with an edge, obtaining a fully connected graph.An example of this topology for a 3-band image is shown in Figure 1a.A weight is then assigned to each edge: if some a priori knowledge about inter-band correlation is available, it can be used to set weights accordingly; if this is not the case, a possibility is to use the image data to estimate the weights.Also, for each pixel x, the graph signal s will contain exactly the value of that pixel over the m bands, after removing the mean; thus, s = x.
Under the assumption that the image follows a GMRF model, we might use partial correlation as weight, as proposed by Zhang and Florêncio [70].To this end, given the precision matrix Q = C −1 , estimated according to (1), we can set the weight of the edge connecting nodes a and b as: Note that w aa = 0 as we do not include self-loops.However, this approach still relies on the estimate and inversion of the covariance matrix that, as we already discussed, might be unreliable (especially in the presence of a small data sample) as well as expensive to compute: matrix inversion requires O(m 3 ) time [46].Also, if the image does not follow a GMRF model, this distance function might produce unreliable results, as for all other covariance-based methods.An option to safeguard against this could be to use the graph constructed to evaluate the GMRF hypothesis with an approach similar to the one proposed in [3] Another possibility is to use a different weight function, e.g., the Cauchy function [32], which has been proved to be able to capture graph distances effectively for image signals and is commonly used as graph weight in other applications like image segmentation and compression [8,28].We propose to set the weight of the edge connecting bands a and b, according to the band mean vector μ = [µ 1 µ 2 . . .µ m ] T estimated as in (1), as where α is a scaling parameter.In this study, we decided to set α = 1 m m i=1 µ i , to normalize all values according to the mean range of the bands.The advantages of this approach are twofold: it avoids using unreliable correlation estimates and does not require matrix inversion, thus reducing the computational cost significantly.
Although other approaches to estimate graph weights might be devised, in this study we will limit the analysis to these ones.

Integration of spatial information in the graph
One of the advantages of using a graph-based approach is the flexibility of the model.For example, by augmenting the graph topology to include edges connecting each node to nodes describing the same band for the neighboring pixels, as shown in Figure 1b, one is able to include spatial information in the model.We will refer to this spatially aware version of LAD as LAD-S.
When considering the case of 4-connected nodes, the resulting graph will be composed of 5m nodes; therefore, the weight matrix W, as well as the corresponding Laplacian matrix L, will be a 5m × 5m matrix.We can construct the weight matrix as follows: where w ab and w ab are some spectral and spatial correlation measures, respectively.Then, to compute the distance of a pixel x from the model, a graph signal s is constructed concatenating the vector corresponding to x and its 4-connected neighbors; also in this case, the mean value μ is subtracted.It follows that the vector s will have length 5m.
The spectral weights w ab can be estimated as proposed in the previous section.The weights w ab can be used to enforce a spatial prior: as an example in the following experimental analysis, we will set uniform spatial weights w ab = 1.

Experiments
To objectively evaluate LAD's performance, we selected a couple of scenarios in which the use of RXD has been proposed.The first one is hyperspectral remote sensing, which is one of the most common use cases for anomaly detection where the use of RXD is widely validated [49]; the second one is the domain of 3D volumetric segmentation of tumoral masses on positron emission tomography (PET) images, where we successfully explored the use of RXD in the past [10,63,65].In these scenarios, we compare the performance of the proposed technique with those produced by RXD and, in the hyperspectral domain, also with Random-selection-based anomaly detector (RSAD) [20] and WSCF [31].RSAD employs multiple random selections of pixels to estimate the background statistics and then marks a pixel as anomalous by merging the output of the different runs by a majority voting approach.WSCF applies a whitening transformation to the input based on the image covariance matrix and then incorporates spatial information in the anomaly measure.This latter algorithm is of particular interest for our evaluation, to compare its performance against our own spatially aware methodology.

Hyperspectral remote sensing
Hyperspectral images find wide adoption in remote sensing applications, where hyperspectral sensors are typically deployed on either aircraft or satellites.The data produced by these sensors are a three-dimensional array or "cube" of data with the width and length of the array corresponding to spatial dimensions and the spectrum of each point as the third dimension.

Dataset
The dataset used in this study is composed of three hyperspectral scenes collected by the 224-band AVIRIS sensor.As a common practice [12], we discarded the 20 water absorption bands, i.e., bands (108-112, 154-167, 224).The first To evaluate LAD, we tested it on both real and synthetic anomalies.For the Salinas scene, we cropped a 200 × 150 portion of the scene and manually segmented a construction which was visible in the cropped area: as the scene mostly contains fields of various kinds, this human-made construction was a good anomalous candidate.This setup, which we will call Field, is shown in Figure 3m together with its ground truth in Figure 3n.
To obtain a synthetic anomaly, we used the target implant method [61] on a different portion of the Salinas scene.The 150 × 126 binary mask image M shown in Figure 3t has been constructed by generating six squares having sides measuring from 1 to 6 pixels arranged in a line.The six squares have been then copied in reverse order and arranged in another line at close distance.The two lines have finally been rotated by an angle of approximatively π/6.The pixels inside the squares have value of 1, while the rest of the pixels in M have value 0.Then, we cropped a region I from the Salinas scene, having the same dimension as the mask.We used it to build the modified image I containing the implanted target as follows: where Φ is a function that, given a parameter k ∈ [1,16], returns a random pixel from the region of the Salinas scene having class k according to the classification ground truth shown in Figure 2b.In the following discussion, for conciseness, we will limit the analysis to two synthetic setups with k = 14 and k = 4, respectively.The two representative values have been chosen since RXD achieves the best performance on the former and the worst one on the latter.We will refer to them as Impl-14 and Impl-4, respectively.A sample band from the Impl-14 setup is shown in Figure 3s.
Figure 4 shows the mean and standard deviation of the intensity of each band for the background, the anomaly region in Impl-4 and Impl-14.As it can be noticed, the spectral characteristics of the anomaly in Impl-4 are similar in shape to those of the background, although with reduced intensities.The anomaly in Impl-14 presents a more different curve than the others, instead.

Experimental results
We are interested in evaluating the detection accuracy of LAD using the Laplacian model built over the partial correlation weights (L Q ) and the one built using Cauchy distance (L C ).Also, we want to test both the spectral version of LAD and its spatially aware variant LAD-S.The results will be compared with those yielded by classic RXD, RSAD, and WSCF.We compare our results against those yielded by RXD, given its well known status as benchmark algorithm for anomaly detection.We want also to confirm with our experiments one of the known limitations of RXD enunciated in Section 2.1, namely how the inclusion of spatial information in RXD is detrimental to its performance, to demonstrate how our approach overcomes this limitation.Another wellknown algorithm which aims at addressing this limitation is WSCF, and for this reason we selected it for evaluation as well.WSCF requires a parameter α to determine the amount of spatial information included in the metric.In this study, we set α = 0.2, as suggested in the original work [31].RSAD requires to select: the initial number of randomly selected blocks N, which should be as small as possible but still large enough so that 4N > b, where b is the number of image bands; the number of random selections L; and the percentile α.For these parameters, we chose the following values in our experiments: N = 80, L = 40, and α = 0.001.We implemented our method as well as all three benchmark methods in MATLAB 2014b.All experiments were run on a laptop equipped with an Intel ® Core™ i7@2.20GHzCPU, a NVIDIA GT435M GeForce GPU and 8GB of RAM 1.
Figure 3 shows the visual results by LAD (L C ) approach compared to the ones yielded by RXD, RSAD, and WSCF on the all hyperspectral scenarios.It can be clearly noticed that the lower number of false positives LAD is able to achieve against all other algorithms.
Figure 5 shows the ROC curves for the hyperspectral test cases, for all algorithms except RSAD.The approach by virtue of which RSAD selects which pixels are anomalous does not lend itself to be plotted in a ROC curve.The scale of the FPR axis has been enhanced, as common in anomaly detection studies [4,43,68], given the great difference in scale between the number of negative pixels and positive ones.It can be noticed how in all scenarios except Urban-A our approach outperforms both RXD and WSCF.On Urban-A, all algorithms perform very similarly.Also, worth noticing is that the inclusion of spatial information yields limited improvements on the hyperspectral scenarios.When comparing results obtained by LAD using L Q or L C , it can be noticed how performance is often very similar.This is a remarkable Fig. 5 Receiver operating characteristic (ROC) curves for the hyperspectral testing scenarios result, also considering that L C creates a model of the background without the need for matrix inversions, so it proves to be both quicker and equally precise.
To further compare performance yielded by the different approaches, we also use the standard spatial overlap index (SOI) [73], also known as Dice similarity coefficient (DSC) [19], which can be computed as follows: where A and B are two binary masks (i.e., the ground truth or region of interest (ROI) and the output of an automatic algorithm); the intersection operator is used to indicate the number of pixels/voxels having value 1 in both masks, while the sum operator indicates the total number of pixels/voxels having value 1 in the two masks.SOI is also equivalent to the statistical F 1 -score, which is the harmonic mean of precision and sensitivity, and is usually defined in terms of Type I and Type II errors as follows: The equality between ( 18) and ( 19) can be easily demonstrated considering that A ∩ B contains the true positive pixels/voxels and that if we consider that A = (true positive + false positive) and B = (true positive + false negative), then also the denominator in (18) equals the one in (19).Clearly, to compute the SOI metric one needs to select a threshold t to identify the anomaly subset B. Many approaches [1,53,69] have been proposed in the literature to deal with the problem of choosing the optimal threshold.In this work, we select the value of t yielding the highest SOI, i.e., striking the best balance between TPR and FPR on the ROC curve in terms of SOI.This choice allows us to compute a single-objective metric to compare the analyzed methods.Alternatively, we could also use the area under the curve (AUC), which measures the area under each ROC curve; we decided to avoid such metric since it has been recently criticized for being sensitive to noise [36] and for other significant problems it shows in model comparison [37,45].
Table 1 shows all SOI results of our tests.It can be noticed how all variants of our approach are able to outperform RXD, RSAD, and WSCF.These results are consistent with those presented by the ROC curves.
Finally, in Table 2 we show results of the de-noised version of both LAD and RXD, which we call LAD p and RXD p , respectively.In this case, the value of p has been chosen according to the cumulative energy as described in Section 2.1, setting ψ = 0.99.It can be noticed how RXD is able to gain  the most from dimensionality reduction.These results can be explained considering the distribution of energy in the eigenspace decomposition.For the Impl-14 scenario, in Figure 6 we show the cumulative energy distribution in the different eigenspaces together with the corresponding eigenvalues κ −1 j and λ j (that are used to weigh the different contribution in ( 5) and ( 13) respectively).It can be noticed that in the RXD case (Figure 6a) energy is better compacted into few eigenspaces with respect to LAD (Figure 6b and Figure 6c).At the same time, it can be observed that the distribution of κ −1 j in RXD dramatically amplifies the last eigenspaces, i.e., the noise components, according to (5).On the contrary, this phenomenon does not affect LAD since the distribution of eigenvalues λ j is not peaked on the last eigenspaces.It follows that the effect of noise in (13) is mitigated by construction and the benefit of dimensionality reduction is limited.Indeed, it can be noted that results obtained by RXD after dimensionality reduction are in line with those obtained by LAD in its simple form.Being the eigendecomposition a costly operation, on a par with matrix inversion, the use of LAD (L C ), which does not require any matrix inversion or eigendecomposition, might be preferable.

Application to 3D volumes: tumor segmentation in PET sequences
PET data are volumetric medical images that are usually employed to locate the tumoral area for proper oncological treatment, e.g., by means of radiotherapy.From a PET scan, one or more 3D images can be produced where the intensity of a voxel represents the local concentration of the tracer during the time window of the scan.In particular, fluorodeoxyglucose positron emission tomography (FDG-PET) is used to detect tissue metabolic activity by virtue of the glucose uptake.During normal cell replication, mutations in the DNA can occur and lead to the birth of cancer cells.By their nature, these cells lack the ability to stop their multiplication, raising cell density in their region and causing insufficient blood supply.The resulting deficiency in oxygen (hypoxia) forces these cells to rely mostly on their anaerobic metabolism, i.e., glycolysis [54].For this reason, glycolysis is an excellent marker for detecting cancer cells; FDG-PET -in which the tracer's concentration indicates the glucose uptake in the imaged area-turns out to be a suitable tool for recognizing tumors, metastases, and lymph nodes all at once [30].It follows that proper segmentation of tumors in medical images is crucial as oncological treatment plans rely on precise information on the tumoral region to be effective [54].Manual segmentation by medical staff has been proven to be subjective, inaccurate, and time-consuming [66]; for this reason, the need for automatic methods for tumor region segmentation is on the rise.PET images carry information about cells metabolism and are therefore suitable for this task; however, PET segmentation is still an open problem mainly because of limited image resolution and strong presence of acquisition noise [69].
In [10,63,65], we successfully explored the use of RXD to identify the anomalous behavior of cancer cells over time in sequences of three FDG-PET images acquired over a time span of one hour.A quick visual overview of this setup is shown in Figure 7.The idea behind the use of RXD in this scenario arises from the fact that cancer cells tend to acquire glucose differently than normal cells, given their peculiar reliance on anaerobic metabolism.For this reason, when considering the values a voxel assumes over time, cancer's anomalous glucose uptake can be successfully spotted using anomaly detection techniques, where the usual role of spectral bands is taken by three PET images acquired over time.
To do this, we build a 4D matrix I, having the three spatial dimensions as the first three dimensions and the time as the fourth dimension.Being acquired at different times, with the subject assuming slightly different positions, it is worth recalling that the images need to be aligned using registration algorithms as detailed in [65].The resulting matrix I will then have size 144 × 144 × 45 × 3.Then, for a generic voxel, identified by its spatial coordinates, we define the vector x = [x 1 x 2 x 3 ] T as the vector containing that voxel's intensities over time.In other words, RXD can be employed in this case if time takes the role of the spectral dimension.

Experimental results
In this study, we used a dataset comprising eight patients, that has been made available by the Candiolo Cancer Institute (IRCCS-FPO) for research purposes.All the acquisitions have been made using a Philips Gemini TF PET/CT.To this end, we acknowledge the precious aid of nuclear medicine physicians who have manually segmented the ROIs on the PET images, setting up the ground truth for evaluating the performance yielded by the proposed tools.We will refer to this setup as Tumor.
Also in this scenario, we are interested in evaluating the detection accuracy of LAD using both Laplacian models, L Q and L C , and compare our results with those yielded by classic RXD.We cannot compare with WSCF in this domain as its extension to 3D has not been proposed, and therefore the choice of the parameter α is non-trivial.A thing to notice regarding this setup is that we are dealing with voxels and 3D volumes.For this reason, in LAD-S we will use 6connectivity, which is the extension of 2D 4-connectivity to 3D space.
To compare performance yielded by the different approaches, we use SOI as presented in (18).Once again, in this study we selected the value of t yielding the highest SOI.
Figure 8 shows the ROC curves for all the eight patients in the Tumor dataset, while Table 3 shows the average SOI results of our tests over the patient dataset.The inclusion of spatial information in the graph improves the SOI metric.In this scenario, we do not present results after dimensionality reduction because the spectral dimensions were already very few.Also, in this scenario the use of LAD is able to obtain  8 ROC curves for all patients in the Tumor testing scenario performance similar when not better than RXD in all its variances.

Discussion
In the previous section, we conducted experiments in hyperspectral and medical domain.RXD's limitations detailed in Section 2 can be noticed in many of the presented experiments.In particular, the high number of false negative can be easily noticed in Figure 3, while the poor performance of RXD, RSAD, and WSCF for the Impl-4 scenario can be imputed to the fact that in that case the anomaly has a very similar covariance matrix to the background as shown in Figure 4; this makes very difficult for covariance-based methods to find an acceptable solution.
The results obtained by RSAD have been particularly surprising.The algorithm has been able to achieve results inline or even better than the other two covariance-based approaches in a couple of scenarios, while obtaining very poor performance in the others due to very high FPR.We believe this behavior is caused by the assumption made by RSAD while marking pixels as anomalous that the Mahalanobis distance follows a χ distribution.In the scenarios used in this study, we observed that that was rarely the case.When this assumption does not hold, the decision criterion used by RSAD is probably not sufficient.
The proposed technique was able to outperform state-ofthe-art techniques in all scenarios, proving how the flexibility of a graph model can actually enable better and more robust background estimation as well as successful inclusion of spatial information.
Spatially aware variants of the proposed techniques were able to achieve better performance in the Tumor scenarios, while failing at improving the performance of the spectralonly variants in the hyperspectral ones.The benefit of including spatial information is more noticeable in the medical scenario because in that case the spectral dimension is reduced to only three bands, representing three different acquisitions in time, as opposed to the 204 spectral bands of the hyperspectral images.Also, we used a uniform correlation as model for the spatial weights; a more refined model might be more suited to better capture the spatial dynamics of remote sensing, while the one used might just be more fitting for medical imaging.
When comparing results obtained by LAD using L Q or L C , it can be noticed how performance is often very similar on hyperspectral images, while in Tumor L C is able to obtain consistently better results.This behavior is clearly due to the fact that L Q depends on pairwise correlation estimates that are particularly critical in the Tumor case, where the 3D volumes are characterized by poor spatiotemporal resolution.In this case, the use of graph prior based on L C turns out to be more robust.An analysis of the ROCs validated this observation even further: for the hyperspectral case, the ROC curves for LAD using L Q or L C behave very similarly in both cases, indicating that the two weight functions are able to capture the same aspects of the data, while in the Tumor case, the two ROC curves have a more varied behavior.
All these tests confirm that the use of our approach is preferable to RXD, RSAD, and WSCF and that Laplacian estimated using the Cauchy distance is able to perform as well as the one estimated using partial correlation.Once again, this is remarkable as the former does not require any matrix inversion, while the latter does.

Conclusions
We present Laplacian anomaly detector, a graph-based algorithm aiming at detecting targets by virtue of a Laplacian model of the image background.Two different approaches to the graph construction are proposed.When comparing to RXD, RSAD, and WSCF, one of the main advantages of our technique is its ability to model the image content without the need for matrix inversions.Both visual inspection and objective results show how the proposed approach is able to outperform the other benchmark methods.Future direction might be devoted to evaluate LAD ability to detect anomalies on generic non-image graphs.
x N ] can be expressed in terms of the image's KLT transform Y = V T I = [y 1 y 2 . . .y N ] where I = [x 1 x 2 . . .x N ] as e(I, p) = N i=1 p j=1

Fig. 1
Fig. 1 Example of 3-band graph connectivity: the spectral components are fully connected, while spatially pixels are 4-connected.

w
ab if nodes a, b represent different bands of the same pixel, w ab if nodes a, b belong to the same band of 4-connected pixels, 0 otherwise,

Fig. 3 Fig. 4
Fig. 3 Hyperspectral test scenarios and algorithm outputs.LAD results have been obtained using L C .

False
positive rate (FPR) True positive rate (TPR)

Fig. 6
Fig.6 Energy and eigenvalue curves for the Impl-14 scenario

Fig. 7
Fig. 7 The three FDG-PET images of one of the sample patients; (1) is the early scan (ES, 144×144×213 px), (2) and (3) are constructed integrating the delayed scan in 3-min time windows (DS1 and DS2, 144×144×45 px).Only the area containing the tumor is acquired in the delayed scan.These images, originally in grayscale, are here displayed using a Fire lookup table.

False
Fig. 8 ROC curves for all patients in the Tumor testing scenario

Table 1
Experimental results in hyperspectral setup (SOI).For each column, bold indicates the best result.

Table 2
Experimental results after dimensionality reduction in hyperspectral setup (SOI).For each column, bold indicates the best result.

Table 3
Experimental results in Tumor setup (SOI).Bold indicates the best result.