Abstract
Anomaly detection over a large area using hyperspectral imaging is an important application in real-time remote sensing. Anomaly detectors based on subspace models are suitable for such applications and usually assume that the main background subspace and its dimension are known. These detectors can detect the anomaly for a range of values of the dimension of the subspace. The objective of this paper is to develop an anomaly detector that extends this range of values by assuming a main background subspace that is unknown except for a user-specified dimension, and by constraining the covariance of the error to be a diagonal matrix. A pixel from the image is modeled as the sum of a linear combination of the unknown main background subspace and an unknown error. Having more unknown quantities provides more degrees of freedom to fit the data to the model, and the diagonal covariance matrix makes the error components uncorrelated. The unknown coefficients of the linear combination are solved for by maximum likelihood estimation. Experimental results using real hyperspectral images show that the proposed anomaly detector can detect the anomaly for a significantly larger range of values of the dimension of the subspace than conventional anomaly detectors.
References
Matteoli S, Diani M, Corsini G (2010) A tutorial overview of anomaly detection in hyperspectral images. IEEE Trans Aerosp Electron Syst Mag 25(7):5–27
Stein DWJ, Beaven SG, Hoff LE, Winter EM, Schaum AP, Stocker AD (2002) Anomaly detection from hyperspectral imagery. IEEE Signal Process Mag 19(1):58–69
Reed IS, Yu X (1990) Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution. IEEE Trans Acoust Speech Signal Process 38(10):1760–1770
Schweizer SM, Moura JMF (2000) Hyperspectral imagery: clutter adaptation in anomaly detection. IEEE Trans Geosci Remote Sens 46(5):1855–1871
Schweizer SM, Moura JMF (2001) Efficient detection in hyperspectral imagery. IEEE Trans Image Process 10(4):584–597
Lo E, Ingram J (2008) Hyperspectral anomaly detection based on minimum generalized variance method. In: Proceedings of SPIE, vol. 6966, p 696603
Fowler J, Du Q (2012) Anomaly detection and reconstruction from random projections. IEEE Trans Image Process 21(1):184–195
Du B, Zhang L (2010) Random selection based anomaly detector for hyperspectral imagery. IEEE Trans Geosci Remote Sens 49(5):1578–1589
Khazai S, Safari A, Mojaradi B, Homayouni S (2012) An approach for subpixel anomaly detection in hyperspectral images. IEEE J Sel Top Appl Earth Obs Remote Sens 5(2):470–477
McKenzie P, Alder M (1994) Selecting the optimal number of components for a Gaussian mixture model. In: Proceedings of IEEE international symposium on information theory, p 393
Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17(10):977–987
Kyrgyzov IO, Kyrgyzov OO, Maitre H, Campedel M (2007) Kernel MDL to determine the number of clusters. In: Lecture Notes in Computer Science, vol 4571. Springer, Berlin, pp 203–217
Chang CI, Du Q (2004) Estimation of number of spectrally distinct signal sources in hyperspectral imagery. IEEE Trans Geosci Remote Sens 42(3):608–619
Masson P, Pieczynski W (1993) SEM algorithm and unsupervised statistical segmentation of satellite images. IEEE Trans Geosci Remote Sens 31(3):618–633
Ashton EA (1998) Detection of subpixel anomalies in multispectral infrared imagery using an adaptive Bayesian classifier. IEEE Trans Geosci Remote Sens 36(2):506–517
Carlotto MJ (2005) A cluster-based approach for detecting man-made objects and changes in imagery. IEEE Trans Geosci Remote Sens 43(2):374–387
Duran O, Petrou M (2005) A time-efficient clustering method for pure class selection. In: Proceedings of IEEE International Geoscience and Remote Sensing Symposium vol 1, pp 510–513
Duran O, Petrou M (2007) A time-efficient method for anomaly detection in hyperspectral images. IEEE Trans Geosci Remote Sens 45(12):3894–3904
Duran O, Petrou M, Hathaway D, Nothard J (2006) Anomaly detection through adaptive background class extraction from dynamic hyperspectral data. In: Proceedings of IEEE Nordic Signal Processing Conference, pp 234–237
Penn B (2002) A time-efficient method for anomaly detection in hyperspectral images. In: Proceedings of IEEE Aerospace Conference vol 3, pp 1531–1535
Chen JY, Reed IS (1987) A detection algorithm for optical targets in clutter. IEEE Trans Aerosp Electron Syst 23(1):394–405
Kwon H, Der SZ, Nasrabadi NM (2003) Using self-organizing maps for anomaly detection in hyperspectral imagery. Opt Eng 42(11):3342–3351
Parzen E (1962) On the estimation of a probability density function and mode. Ann Math Stat 33:1065–1076
Banerjee A, Burlina P, Diehl C (2006) A support vector method for anomaly detection in hyperspectral imagery. IEEE Trans Geosci Remote Sens 44(8):2282–2291
Goldberg H, Kwon H, Nasrabadi NM (2007) Kernel eigenspace separation transform for subspace anomaly detection in hyperspectral imagery. IEEE Geosci Remote Sens Lett 4(4):581–585
Bowles J, Chen W, Gillis D (2003) ORASIS framework-benefits to working within the linear mixing model. In: Proceedings of IEEE International Geoscience and Remote Sensing Symposium vol 1, pp 96–98
Winter ME (1999) Fast autonomous spectral endmember determination in hyperspectral data. In: Proceedings of 13th international conference on applied geologic remote sensing, vol 2, pp 337–344
Winter ME (2000) Comparison of approaches for determining end-members in hyperspectral data. In: Proceedings of IEEE aerospace conference vol 3, pp 305–313
Nascimento JMP, Bioucas-Dias JM (2005) Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans Geosci Remote Sens 43(4):898–909
Chang CI (2005) Orthogonal subspace projection (OSP) revisited: a comprehensive study and analysis. IEEE Trans Geosci Remote Sens 43(3):502–518
Ranney KI, Soumekh M (2006) Hyperspectral anomaly detection within the signal subspace. IEEE Geosci Remote Sens Lett 3(3):312–316
Duran O, Petrou M (2009) Spectral unmixing with negative and superunity abundances for subpixel anomaly detection. IEEE Geosci Remote Sens Lett 6(1):152–156
Du B, Zhang L (2014) A discriminative metric learning based anomaly detection method. IEEE Trans Geosci Remote Sens 52(11):6844–6857
Zhao R, Du B, Zhang L (2014) Robust nonlinear hyperspectral anomaly detection approach. IEEE J Sel Top Appl Earth Obs Remote Sens 7(4):1227–1234
Du B, Zhang L (2011) Random-selection-based anomaly detector for hyperspectral imagery. IEEE Trans Geosci Remote Sens 49(5):1578–1589
Schaum AP (2007) Hyperspectral anomaly detection beyond RX. In: Proceedings of SPIE, vol 6565, p 656502
Lo E (2012) Maximized subspace model for hyperspectral anomaly detection. Pattern Anal Appl 15(3):225–235
Lo E (2013) Variable subspace model for hyperspectral anomaly detection. Pattern Anal Appl 16(3):393–405
Lo E (2014) Variable factorization model based on numerical optimization for hyperspectral anomaly detection. Pattern Anal Appl 17(2):291–310
Morrison DF (1976) Multivariate statistical methods, 2nd edn. McGraw Hill, New York
Anderson TW (2003) An introduction to multivariate statistical analysis, 3rd edn. Wiley Interscience, Hoboken
Kerekes JP, Snyder DK (2010) Unresolved target detection blind test project overview. In: Proceedings of 16th SPIE conference on algorithms and technologies for multispectral, hyperspectral, and ultraspectral imagery, vol 7695, p 769521
Acknowledgments
The author wishes to thank the US Naval Research Laboratory, and the Digital Imaging and Remote Sensing Laboratory at the Rochester Institute of Technology for the data.
Appendices
Appendix 1: Maximum likelihood estimation
The maximum likelihood estimation of the unknown coefficients \(\varvec{\beta }\) and \(\varvec{\delta }\) of the statistical model given in (3) subject to the constraint in (4) is derived in this appendix using standard tools in multivariate statistical analysis [40, 41]. The sample covariance \(\varvec{S}\) from a random sample of n pixels is used to estimate the population covariance \(\varvec{C}\). Estimating \(\varvec{C}\) is equivalent to estimating \(\varvec{\beta }\) and \(\varvec{\delta }\). The likelihood function for \(\varvec{C}\) is the Wishart density function
where
\(\Gamma \) is the gamma function, and tr denotes trace. The maximum likelihood estimates of \(\varvec{\beta }\) and \(\varvec{\delta }\) are obtained by maximizing the logarithm of the likelihood function in (36) subject to the constraint in (4). The logarithm of the likelihood function is
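With \(\varvec{C}=\varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\), the objective can be written compactly as follows (a reconstruction from the surrounding derivation, with terms independent of the parameters collected in a constant \(c\)):

```latex
\phi \left( \varvec{\beta },\varvec{\delta }\right)
  = c \;-\; \frac{n}{2}\log \left| \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right|
    \;-\; \frac{n}{2}\,\mathrm{tr}\!\left[ \left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\varvec{S}\right]
```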
The maximum solution of the logarithm of the likelihood function is obtained by differentiating the logarithm of the likelihood function with respect to \(\delta _i\) and \(\beta _{i,j}\). The derivative of the logarithm of the likelihood function in (38) with respect to \(\delta _i\) is
The derivative of the determinant with respect to \(\delta _i\) in (39) is
The derivative of the trace with respect to \(\delta _i\) in (39) is
By taking the derivative of the inverse and performing cyclic permutation on the trace, the derivative of the trace with respect to \(\delta _i\) in (41) becomes
By substituting the derivative of the determinant in (40) and the derivative of the trace in (42) into (39), the derivative of the logarithm of the likelihood function in (39) becomes
The derivative \(\frac{\partial {\varvec{\delta }}}{\partial {\delta _i}}\) is a matrix with all zeros except in row i and column i. Premultiplying \(\frac{\partial {\varvec{\delta }}}{\partial {\delta _i}}\) by a matrix preserves only column i of the matrix. By applying the result that the trace of a matrix is the same as the trace of the corresponding diagonal matrix, the derivative of the logarithm of the likelihood function in (43) becomes
which is a product of \(-n/2\) and the element in row i and column i of the diagonal matrix. The notation \(diag(\varvec{B})\) denotes the diagonal matrix with diagonal elements from matrix \(\varvec{B}\). By setting \(\frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\delta _i}}\) in (44) to zero for \(i=1,2,\dots,p\), the derivatives of the logarithm of the likelihood function in (44) become
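Setting these derivatives to zero gives the diagonal stationarity condition (a reconstruction consistent with the steps above, stated with the \(diag\) notation just introduced):

```latex
diag\!\left[ \left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\right]
  = diag\!\left[ \left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}
      \varvec{S}\left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\right]
```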
The derivative of the logarithm of the likelihood function in (38) with respect to \(\beta _{i,j}\) is
The derivative of the determinant with respect to \(\beta _{i,j}\) in (46) is
The derivative of \(\varvec{\beta }\varvec{\beta }^T\) with respect to \(\beta _{i,j}\) is a matrix with all zeros, except row i and column i are the same as column j of \(\varvec{\beta }\), and the element in row i and column i has a scalar multiplier of 2, i.e.,
The trace of the product generated by premultiplying \(\frac{\partial {}}{\partial {\beta _{i,j}}}\left( \varvec{\beta }\varvec{\beta }^T\right) \) by a matrix is two times the dot product of row i of the matrix and column j of \(\varvec{\beta }\). Thus, the trace in (47) can be written as
where \(\varvec{\nu }=\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\), \(\varvec{\nu }_i=\left[ \begin{array}{cccc}\nu _{i,1}&\nu _{i,2}&\dots&\nu _{i,p}\end{array}\right] \), and \(\varvec{\beta }_j=\left[ \begin{array}{cccc}\beta _{1,j}&\beta _{2,j}&\dots&\beta _{p,j}\end{array}\right] ^T\). The derivative of the trace with respect to \(\beta _{i,j}\) in (46) is
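In this notation, the trace in (47) takes the scalar form (consistent with the dot-product description above, since \(\varvec{\nu }_i\) is \(1\times p\) and \(\varvec{\beta }_j\) is \(p\times 1\)):

```latex
\mathrm{tr}\!\left[ \varvec{\nu }\,\frac{\partial {}}{\partial {\beta _{i,j}}}\left( \varvec{\beta }\varvec{\beta }^T\right) \right] = 2\,\varvec{\nu }_i\,\varvec{\beta }_j
```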
By taking the derivative of the inverse and performing cyclic permutation on the trace, the derivative of the trace with respect to \(\beta _{i,j}\) in (50) becomes
Applying the result in (49), the derivative of the trace with respect to \(\beta _{i,j}\) in (51) becomes
where \(\varvec{\psi }=\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\) and \(\varvec{\psi }_i=\left[ \begin{array}{cccc}\psi _{i,1}&\psi _{i,2}&\dots&\psi _{i,p}\end{array}\right] \). By substituting (47), (49), and (52) into (46), the derivative of the logarithm of the likelihood function with respect to \(\beta _{i,j}\) in (46) becomes
The derivatives of the logarithm of the likelihood function with respect to \(\beta _{i,j}\) in (53) for \(i=1,2,\dots ,p\) and \(j=1,2,\dots ,q\) can be arranged into a matrix as
By substituting the derivative in (53) into the matrix in (54), the matrix \(\varvec{\Phi }^{'}\) in (54) can be written as
By setting the matrix \(\varvec{\Phi }^{'}\) to a zero matrix, the resulting equation is
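Written out with \(\varvec{\nu }\) and \(\varvec{\psi }\) as defined above, the condition is (a reconstruction; premultiplying both sides by \(\varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\) gives the equivalent form \(\varvec{S}\left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\varvec{\beta }=\varvec{\beta }\)):

```latex
\varvec{\nu }\,\varvec{\beta } = \varvec{\psi }\,\varvec{\beta }
\quad \Longleftrightarrow \quad
\left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\varvec{\beta }
  = \left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}
    \varvec{S}\left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\varvec{\beta }
```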
The maximum likelihood estimates for \(\varvec{\delta }\) and \(\varvec{\beta }\) are given by the two equations in (45) and (56). These equations can be simplified into more useful forms as follows. The left side and right side of Eq. (45) are diagonal matrices. By performing premultiplication and postmultiplication of each side by the diagonal matrix \(\varvec{\delta }\), Eq. (45) becomes
Equation (57) can be written as
By using the substitution \(\varvec{\delta }=\varvec{C}-\varvec{\beta }\varvec{\beta }^T\) and \(\varvec{C}=\varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\) in Eq. (58), Eq. (58) becomes
By multiplying out the matrices in Eq. (59), Eq. (59) can be simplified to
By using Eq. (56) to simplify the right side of Eq. (60), Eq. (60) becomes
Equation (61) simplifies to the final form with no inverse as
Equation (56) can be simplified by using the identity
to obtain
It is easier to find the inverse of a diagonal matrix than a full matrix, so Eq. (60) is written in the final form as
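The pair of simplified estimating equations then takes the form familiar from maximum likelihood factor analysis [41] (a reconstruction consistent with the steps above, where \(\varvec{I}_q\) denotes the \(q\times q\) identity matrix):

```latex
diag\left( \varvec{\delta }\right) = diag\!\left( \varvec{S}-\varvec{\beta }\varvec{\beta }^T\right) ,
\qquad
\varvec{S}\,\varvec{\delta }^{-1}\varvec{\beta }
  \left( \varvec{I}_q+\varvec{\beta }^T\varvec{\delta }^{-1}\varvec{\beta }\right) ^{-1}
  = \varvec{\beta }
```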
Appendix 2: Iterative algorithm
An iterative algorithm for solving the system of nonlinear equations in (11) and (12) is derived in this appendix. The derivation is obtained using standard tools in multivariate statistical analysis [40, 41]. The initial values for \(\varvec{\beta }\) and \(\varvec{\delta }\) in the iterative algorithm can be obtained by approximating the model in (3) with the maximized subspace model [34]. Using the notation from the model in (1), the MSM detector approximates the pixel \(\varvec{z}\) with a linear transformation of the high-variance principal components \(\varvec{w}\). The columns of the transformation matrix \(\varvec{\gamma }\) were derived in [34] to be the eigenvectors of the covariance of the pixel. An approach to obtain the initial values is to model the pixel \(\varvec{z}\) as
where \(\varvec{\xi }=\left[ \begin{array}{cccc}\varvec{\xi }_1&\varvec{\xi }_2&\dots&\varvec{\xi }_q \end{array}\right] \) and \(\left( \tau _i,\varvec{\xi }_i\right) \) is the eigenvalue–eigenvector pair of the covariance of \(\varvec{z}\) for \(i=1,2,\dots ,q\). By substituting \(\varvec{z}\) in (66) into \(Cov(\varvec{z},\varvec{z}^T)\), the covariance of \(\varvec{z}\) simplifies to
where \(\varvec{\tau }\) is a diagonal matrix with diagonal elements \(\tau _1,\tau _2,\dots ,\tau _q\). Since \(\varvec{x}=\varvec{z}-\varvec{\mu }_z\), the covariance of \(\varvec{z}\) and the covariance of \(\varvec{x}\) are the same. Thus, Eq. (67) can be written in terms of the eigenvalues and eigenvectors of \(\varvec{x}\) as
where \(\varvec{\upsilon }=\left[ \begin{array}{cccc}\varvec{\upsilon }_1&\varvec{\upsilon }_2&\dots&\varvec{\upsilon }_q \end{array}\right] \), \(\varvec{\omega }\) is a diagonal matrix with diagonal elements \(\omega _1,\omega _2,\dots ,\omega _q\), and \(\left( \omega _i,\varvec{\upsilon _i}\right) \) is the eigenvalue–eigenvector pair of the covariance of \(\varvec{x}\) for \(i=1,2,\dots ,q\), where \(\omega _1\ge \omega _2\ge \dots \ge \omega _q> 0\) . In order to obtain the initial values, the pixel \(\varvec{x}\) in (3) is modeled as
By substituting \(\varvec{x}\) in (69) into the definition of the covariance of \(\varvec{x}\), the covariance of \(\varvec{x}\) becomes
It follows from (68) and (70) that
By estimating the unknown covariance of \(\varvec{x}\) with its known sample covariance \(\varvec{S}\), the computable initial value for \(\varvec{\beta }\) denoted by \(\varvec{\beta }^{(0)}\) is
where \(\left( \omega _i^{(0)},\,\varvec{\upsilon }_i^{(0)}\right) \) is the eigenvalue–eigenvector pair of \(\varvec{S}\). From Eq. (12), the computable initial value for \(\varvec{\delta }\) denoted by \(\varvec{\delta }^{(0)}\) is
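Collecting the two initializations (a reconstruction of (72) and (73) from the surrounding definitions):

```latex
\varvec{\beta }^{(0)} = \left[ \begin{array}{cccc}
  \sqrt{\omega _1^{(0)}}\,\varvec{\upsilon }_1^{(0)} &
  \sqrt{\omega _2^{(0)}}\,\varvec{\upsilon }_2^{(0)} & \dots &
  \sqrt{\omega _q^{(0)}}\,\varvec{\upsilon }_q^{(0)} \end{array}\right] ,
\qquad
\varvec{\delta }^{(0)} = diag\!\left( \varvec{S}-\varvec{\beta }^{(0)}\varvec{\beta }^{(0)T}\right)
```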
Let \(\varvec{\alpha }=\left[ \begin{array}{cccc}\varvec{\alpha }_1&\varvec{\alpha }_2&\dots&\varvec{\alpha }_q\end{array}\right] \) and \(\varvec{\lambda }\) be a diagonal matrix with diagonal elements \(\lambda _1,\lambda _2,\dots ,\lambda _q\). Let the superscript (j) denote the jth iterate. Then the jth iterates of \(\varvec{\beta }\) and \(\varvec{\delta }\) denoted by \(\varvec{\beta }^{(j)}\) and \(\varvec{\delta }^{(j)}\) for \(j=1,2,\dots \) are
where \(\left( \lambda _i^{(j)},\,\varvec{\alpha }_i^{(j)}\right) \) is the eigenvalue–eigenvector pair of \(\varvec{B}^{(j)}\), and
A typical convergence criterion for stopping the iterations is based on the norm or relative norm of the difference of successive iterates of \(\varvec{\beta }\) and \(\varvec{\delta }\). The initial estimate \(\varvec{\beta }^{(0)}\) in (72) and the \(j\)th iterate \(\varvec{\beta }^{(j)}\) in (74) are derived using the models in (66) and (69), which are simplified from the full models in (1) and (3). However, the initial estimate \(\varvec{\delta }^{(0)}\) in (73) and the \(j\)th iterate \(\varvec{\delta }^{(j)}\) in (75) are derived using the full models in (1) and (3). Therefore, \(\varvec{\beta }^{(j)}\) may fail to converge, and a criterion based on \(\varvec{\delta }^{(j)}\) is more reliable. Thus, the iteration is considered to have converged when the norm of the difference between the successive iterates \(\varvec{\delta }^{(j)}\) and \(\varvec{\delta }^{(j-1)}\) falls below a prescribed tolerance tol, i.e.,
An alternative convergence criterion to that in (77) is based on the assumption that the pixel from the image actually fits the model in (3). If the model in (3) is the correct model for the pixel, the covariance of the error would be nearly diagonally dominant: the off-diagonal elements would be close to zero and some diagonal elements would be dominant. As the iteration progresses, the off-diagonal elements continue to approach zero, while the variance of the diagonal elements continues to increase at a slower rate. Consequently, the iteration converges when there is no significant change in the covariance of the error between two successive iterations. Thus, the alternative criterion terminates the iteration when the ratio of the absolute difference between the variances of the diagonal elements of \(\varvec{\delta }^{(j)}\) and \(\varvec{\delta }^{(j-1)}\) to the absolute value of the variance of the diagonal elements of \(\varvec{\delta }^{(j-1)}\) falls below a specified tolerance tol, i.e.,
where \(var\left( diag\left( \varvec{\delta }^{(j)}\right) \right) \) denotes the variance of the diagonal elements of \(\varvec{\delta }^{(j)}\).
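The overall procedure can be sketched numerically. The sketch below assumes \(\varvec{B}^{(j)}=\left( \varvec{\delta }^{(j-1)}\right) ^{-1/2}\varvec{S}\left( \varvec{\delta }^{(j-1)}\right) ^{-1/2}\), the classical eigen-iteration for the maximum likelihood factor model; the exact forms of (74)–(76) may differ, and the convergence test follows (77) on successive \(\varvec{\delta }\) iterates:

```python
import numpy as np

def initial_values(S, q):
    """Initial beta and delta from the top-q eigenpairs of S (eqs. (72)-(73))."""
    w, V = np.linalg.eigh(S)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:q]            # indices of the q largest eigenvalues
    beta = V[:, idx] * np.sqrt(w[idx])       # columns sqrt(omega_i) * upsilon_i
    delta = np.diag(np.diag(S - beta @ beta.T))
    return beta, delta

def fit(S, q, tol=1e-10, max_iter=2000):
    """Iterate the eigen-update until successive delta iterates stop changing."""
    beta, delta = initial_values(S, q)
    for _ in range(max_iter):
        d = np.maximum(np.diag(delta), 1e-12)    # guard against zero variances
        s = 1.0 / np.sqrt(d)
        B = s[:, None] * S * s[None, :]          # delta^{-1/2} S delta^{-1/2}
        lam, A = np.linalg.eigh(B)
        idx = np.argsort(lam)[::-1][:q]
        # beta^{(j)} = delta^{1/2} alpha (lambda - I)^{1/2} on the top-q eigenpairs
        beta = (np.sqrt(d)[:, None] * A[:, idx]) * np.sqrt(
            np.maximum(lam[idx] - 1.0, 0.0))
        delta_new = np.diag(np.diag(S - beta @ beta.T))
        # convergence test on successive delta iterates, as in (77)
        if np.linalg.norm(np.diag(delta_new - delta)) < tol:
            delta = delta_new
            break
        delta = delta_new
    return beta, delta
```

On data generated exactly from the low-rank-plus-diagonal model, the iteration refines the crude eigen-based initialization toward a covariance fit satisfying the estimating equations.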
Lo, E. Hyperspectral anomaly detection based on constrained eigenvalue–eigenvector model. Pattern Anal Applic 20, 531–555 (2017). https://doi.org/10.1007/s10044-015-0519-6