
Hyperspectral anomaly detection based on constrained eigenvalue–eigenvector model


Abstract

Anomaly detection over a large area using hyperspectral imaging is an important application in real-time remote sensing. Anomaly detectors based on subspace models are suitable for this application and usually assume that the main background subspace and its dimension are known. These detectors can detect the anomaly only for a limited range of values of the subspace dimension. The objective of this paper is to develop an anomaly detector that extends this range of values by assuming an unknown main background subspace whose dimension is user-specified and by constraining the covariance of the error to be a diagonal matrix. A pixel from the image is modeled as the sum of a linear combination of the unknown main background subspace and an unknown error. The additional unknown quantities provide more degrees of freedom for fitting the data to the model, and the diagonal error covariance forces the error components to be uncorrelated. The unknown coefficients of the linear combination are estimated by maximum likelihood. Experimental results using real hyperspectral images show that the proposed anomaly detector can detect the anomaly for a significantly larger range of values of the subspace dimension than conventional anomaly detectors.



References

  1. Matteoli S, Diani M, Corsini G (2010) A tutorial overview of anomaly detection in hyperspectral images. IEEE Aerosp Electron Syst Mag 25(7):5–27

  2. Stein DWJ, Beaven SG, Hoff LE, Winter EM, Schaum AP, Stocker AD (2002) Anomaly detection from hyperspectral imagery. IEEE Signal Process Mag 19(1):58–69

  3. Reed IS, Yu X (1990) Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution. IEEE Trans Acoust Speech Signal Process 38(10):1760–1770

  4. Schweizer SM, Moura JMF (2000) Hyperspectral imagery: clutter adaptation in anomaly detection. IEEE Trans Inf Theory 46(5):1855–1871

  5. Schweizer SM, Moura JMF (2001) Efficient detection in hyperspectral imagery. IEEE Trans Image Process 10(4):584–597

  6. Lo E, Ingram J (2008) Hyperspectral anomaly detection based on minimum generalized variance method. In: Proceedings of SPIE, vol 6966, p 696603

  7. Fowler J, Du Q (2012) Anomaly detection and reconstruction from random projections. IEEE Trans Image Process 21(1):184–195

  8. Du B, Zhang L (2010) Random selection based anomaly detector for hyperspectral imagery. IEEE Trans Geosci Remote Sens 49(5):1578–1589

  9. Khazai S, Safari A, Mojaradi B, Homayouni S (2012) An approach for subpixel anomaly detection in hyperspectral images. IEEE J Sel Top Appl Earth Obs Remote Sens 5(2):470–477

  10. McKenzie P, Alder M (1994) Selecting the optimal number of components for a Gaussian mixture model. In: Proceedings of IEEE international symposium on information theory, p 393

  11. Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17(10):977–987

  12. Kyrgyzov IO, Kyrgyzov OO, Maitre H, Campedel M (2007) Kernel MDL to determine the number of clusters. In: Lecture notes in computer science, vol 4571. Springer, Berlin, pp 203–217

  13. Chang CI, Du Q (2004) Estimation of number of spectrally distinct signal sources in hyperspectral imagery. IEEE Trans Geosci Remote Sens 42(3):608–619

  14. Masson P, Pieczynski W (1993) SEM algorithm and unsupervised statistical segmentation of satellite images. IEEE Trans Geosci Remote Sens 31(3):618–633

  15. Ashton EA (1998) Detection of subpixel anomalies in multispectral infrared imagery using an adaptive Bayesian classifier. IEEE Trans Geosci Remote Sens 36(2):506–517

  16. Carlotto MJ (2005) A cluster-based approach for detecting man-made objects and changes in imagery. IEEE Trans Geosci Remote Sens 43(2):374–387

  17. Duran O, Petrou M (2005) A time-efficient clustering method for pure class selection. In: Proceedings of IEEE international geoscience and remote sensing symposium, vol 1, pp 510–513

  18. Duran O, Petrou M (2007) A time-efficient method for anomaly detection in hyperspectral images. IEEE Trans Geosci Remote Sens 45(12):3894–3904

  19. Duran O, Petrou M, Hathaway D, Nothard J (2006) Anomaly detection through adaptive background class extraction from dynamic hyperspectral data. In: Proceedings of IEEE Nordic signal processing conference, pp 234–237

  20. Penn B (2002) A time-efficient method for anomaly detection in hyperspectral images. In: Proceedings of IEEE aerospace conference, vol 3, pp 1531–1535

  21. Chen JY, Reed IS (1987) A detection algorithm for optical targets in clutter. IEEE Trans Aerosp Electron Syst 23(1):394–405

  22. Kwon H, Der SZ, Nasrabadi NM (2003) Using self-organizing maps for anomaly detection in hyperspectral imagery. Opt Eng 42(11):3342–3351

  23. Parzen E (1962) On estimation of a probability density function and mode. Ann Math Stat 33:1065–1076

  24. Banerjee A, Burlina P, Diehl C (2006) A support vector method for anomaly detection in hyperspectral imagery. IEEE Trans Geosci Remote Sens 44(8):2282–2291

  25. Goldberg H, Kwon H, Nasrabadi NM (2007) Kernel eigenspace separation transform for subspace anomaly detection in hyperspectral imagery. IEEE Geosci Remote Sens Lett 4(4):581–585

  26. Bowles J, Chen W, Gillis D (2003) ORASIS framework: benefits to working within the linear mixing model. In: Proceedings of IEEE international geoscience and remote sensing symposium, vol 1, pp 96–98

  27. Winter ME (1999) Fast autonomous spectral endmember determination in hyperspectral data. In: Proceedings of 13th international conference on applied geologic remote sensing, vol 2, pp 337–344

  28. Winter ME (2000) Comparison of approaches for determining end-members in hyperspectral data. In: Proceedings of IEEE aerospace conference, vol 3, pp 305–313

  29. Nascimento JMP, Bioucas-Dias JM (2005) Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans Geosci Remote Sens 43(4):898–909

  30. Chang CI (2005) Orthogonal subspace projection (OSP) revisited: a comprehensive study and analysis. IEEE Trans Geosci Remote Sens 43(3):502–518

  31. Ranney KI, Soumekh M (2006) Hyperspectral anomaly detection within the signal subspace. IEEE Geosci Remote Sens Lett 3(3):312–316

  32. Duran O, Petrou M (2009) Spectral unmixing with negative and superunity abundances for subpixel anomaly detection. IEEE Geosci Remote Sens Lett 6(1):152–156

  33. Du B, Zhang L (2014) A discriminative metric learning based anomaly detection method. IEEE Trans Geosci Remote Sens 52(11):6844–6857

  34. Zhao R, Du B, Zhang L (2014) Robust nonlinear hyperspectral anomaly detection approach. IEEE J Sel Top Appl Earth Obs Remote Sens 7(4):1227–1234

  35. Du B, Zhang L (2011) Random-selection-based anomaly detector for hyperspectral imagery. IEEE Trans Geosci Remote Sens 49(5):1578–1589

  36. Schaum AP (2007) Hyperspectral anomaly detection beyond RX. In: Proceedings of SPIE, vol 6565, p 656502

  37. Lo E (2012) Maximized subspace model for hyperspectral anomaly detection. Pattern Anal Appl 15(3):225–235

  38. Lo E (2013) Variable subspace model for hyperspectral anomaly detection. Pattern Anal Appl 16(3):393–405

  39. Lo E (2014) Variable factorization model based on numerical optimization for hyperspectral anomaly detection. Pattern Anal Appl 17(2):291–310

  40. Morrison DF (1976) Multivariate statistical methods, 2nd edn. McGraw-Hill, New York

  41. Anderson TW (2003) An introduction to multivariate statistical analysis, 3rd edn. Wiley-Interscience, Hoboken

  42. Kerekes JP, Snyder DK (2010) Unresolved target detection blind test project overview. In: Proceedings of 16th SPIE conference on algorithms and technologies for multispectral, hyperspectral, and ultraspectral imagery, vol 7695, p 769521

Acknowledgments

The author wishes to thank the US Naval Research Laboratory and the Digital Imaging and Remote Sensing Laboratory at the Rochester Institute of Technology for the data.

Author information

Correspondence to Edisanter Lo.

Appendices

Appendix 1: Maximum likelihood estimation

The maximum likelihood estimates of the unknown coefficients \(\varvec{\beta }\) and \(\varvec{\delta }\) of the statistical model given in (3) subject to the constraint in (4) are derived in this appendix using standard tools in multivariate statistical analysis [40, 41]. The sample covariance \(\varvec{S}\) from a random sample of n pixels is used to estimate the population covariance \(\varvec{C}\). Estimating \(\varvec{C}\) is equivalent to estimating \(\varvec{\beta }\) and \(\varvec{\delta }\). The likelihood function for \(\varvec{C}\) is the Wishart density function

$$\begin{aligned} L(\varvec{C})=k|\varvec{S}|^{(n-p-1)/2}|\varvec{C}|^{-n/2}e^{-(n/2)tr\left( \varvec{C}^{-1}\varvec{S}\right) }, \end{aligned}$$
(36)

where

$$\begin{aligned} k=\left( \pi ^{p(p-1)/4}2^{np/2}\prod _{i=1}^{p}\Gamma \left( \frac{n+1-i}{2}\right) \right) ^{-1}, \end{aligned}$$
(37)

\(\Gamma \) is the gamma function, and tr denotes trace. The maximum likelihood estimates of \(\varvec{\beta }\) and \(\varvec{\delta }\) are obtained by maximizing the logarithm of the likelihood function in (36) subject to the constraint in (4). The logarithm of the likelihood function is

$$\begin{aligned} \phi (\varvec{\beta },\varvec{\delta })&=ln(k)+\frac{1}{2}(n-p-1)\, ln|\varvec{S}|\nonumber \\&\quad -\,\frac{1}{2}n\, ln\left| \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right| -\frac{1}{2}n\, tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\right) . \end{aligned}$$
(38)
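
Only the last two terms of (38) depend on \(\varvec{\beta }\) and \(\varvec{\delta }\), which is all that matters for the maximization. A minimal NumPy sketch of this \(\varvec{\beta }\)-, \(\varvec{\delta }\)-dependent part, assuming \(\varvec{\delta }\) is stored as the vector of its diagonal elements (the function name is an illustrative choice, not from the paper):

```python
import numpy as np

def phi_core(S, beta, delta, n):
    """The beta- and delta-dependent part of the log-likelihood (38):
    -(n/2)(ln|C| + tr(C^{-1} S)) with C = delta + beta beta^T."""
    C = np.diag(delta) + beta @ beta.T
    sign, logdet = np.linalg.slogdet(C)           # numerically stable ln|C|
    trace_term = np.trace(np.linalg.solve(C, S))  # tr(C^{-1} S) without forming C^{-1}
    return -0.5 * n * (logdet + trace_term)
```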

The maximum of the logarithm of the likelihood function is located by differentiating it with respect to \(\delta _i\) and \(\beta _{i,j}\) and setting the derivatives to zero. The derivative of the logarithm of the likelihood function in (38) with respect to \(\delta _i\) is

$$\begin{aligned} \frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\delta _i}}&=-\frac{n}{2}\frac{1}{\left| \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right| }\, \frac{\partial {}}{\partial {\delta _i}}\left| \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right| \nonumber \\&\quad -\,\frac{n}{2}\, \frac{\partial {}}{\partial {\delta _i}}\left( tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\right) \right) . \end{aligned}$$
(39)

The derivative of the determinant with respect to \(\delta _i\) in (39) is

$$\begin{aligned} \frac{\partial {}}{\partial {\delta _i}}\left| \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right| =\left| \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right| tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\frac{\partial {\varvec{\delta }}}{\partial {\delta _i}}\right) . \end{aligned}$$
(40)

The derivative of the trace with respect to \(\delta _i\) in (39) is

$$\begin{aligned} \frac{\partial {}}{\partial {\delta _i}}tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\right) =tr\left( \left( \frac{\partial {}}{\partial {\delta _i}}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\right) \varvec{S}\right) . \end{aligned}$$
(41)

By taking the derivative of the inverse and performing cyclic permutation on the trace, the derivative of the trace with respect to \(\delta _i\) in (41) becomes

$$\begin{aligned} \frac{\partial {}}{\partial {\delta _i}}tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\right) = - tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\frac{\partial {\varvec{\delta }}}{\partial {\delta _i}}\right) . \end{aligned}$$
(42)

By substituting the derivative of the determinant in (40) and the derivative of the trace in (42) into (39), the derivative of the logarithm of the likelihood function in (39) becomes

$$\begin{aligned} \frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\delta _i}}=-\frac{n}{2}\times tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\left( \varvec{I}-\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\right) \frac{\partial {\varvec{\delta }}}{\partial {\delta _i}}\right) . \end{aligned}$$
(43)

The derivative \(\frac{\partial {\varvec{\delta }}}{\partial {\delta _i}}\) is a matrix of all zeros except for a one in row i, column i. Premultiplying \(\frac{\partial {\varvec{\delta }}}{\partial {\delta _i}}\) by a matrix preserves only column i of that matrix. By applying the fact that the trace of a matrix equals the trace of its diagonal part, the derivative of the logarithm of the likelihood function in (43) becomes

$$\begin{aligned} \frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\delta _i}}=-\frac{n}{2}\times \left( diag\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\left( \varvec{I}-\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\right) \right) \right) _{i,i}, \end{aligned}$$
(44)

which is the product of \(-n/2\) and the element in row i and column i of the diagonal matrix. The notation \(diag(\varvec{B})\) denotes the diagonal matrix formed from the diagonal elements of matrix \(\varvec{B}\). By setting \(\frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\delta _i}}\) in (44) to zero for \(i=1,2,\dots ,p\), the derivatives of the logarithm of the likelihood function in (44) yield

$$\begin{aligned} diag\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\right) = diag\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\right) . \end{aligned}$$
(45)

The derivative of the logarithm of the likelihood function in (38) with respect to \(\beta _{i,j}\) is

$$\begin{aligned} \frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{i,j}}}=-\frac{n}{2}\frac{1}{\left| \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right| }\frac{\partial {}}{\partial {\beta _{i,j}}}\left( \left| \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right| \right) -\frac{n}{2}\, \frac{\partial {}}{\partial {\beta _{i,j}}}\left( tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\right) \right) . \end{aligned}$$
(46)

The derivative of the determinant with respect to \(\beta _{i,j}\) in (46) is

$$\begin{aligned} \frac{\partial {}}{\partial {\beta _{i,j}}}\left| \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right| =\left| \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right| \times tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\frac{\partial {}}{\partial {\beta _{i,j}}}\left( \varvec{\beta }\varvec{\beta }^T\right) \right) . \end{aligned}$$
(47)

The derivative of \(\varvec{\beta }\varvec{\beta }^T\) with respect to \(\beta _{i,j}\) is a matrix of all zeros, except that row i and column i both equal column j of \(\varvec{\beta }\) (transposed in the case of the row), with the element in row i and column i carrying a factor of 2, i.e.,

$$\begin{aligned} \frac{\partial {}}{\partial {\beta _{i,j}}}\left( \varvec{\beta }\varvec{\beta }^T\right) = \left[ \begin{array}{ccccccc} 0 &\dots &0 &\beta _{1,j} &0 &\dots &0 \\ \vdots & &\vdots &\vdots &\vdots & &\vdots \\ 0 &\dots &0 &\beta _{i-1,j} &0 &\dots &0 \\ \beta _{1,j} &\dots &\beta _{i-1,j} &2\beta _{i,j} &\beta _{i+1,j} &\dots &\beta _{p,j} \\ 0 &\dots &0 &\beta _{i+1,j} &0 &\dots &0 \\ \vdots & &\vdots &\vdots &\vdots & &\vdots \\ 0 &\dots &0 &\beta _{p,j} &0 &\dots &0 \end{array}\right] . \end{aligned}$$
(48)

The trace of the product generated by premultiplying \(\frac{\partial {}}{\partial {\beta _{i,j}}}\left( \varvec{\beta }\varvec{\beta }^T\right) \) by a matrix is two times the dot product of row i of the matrix and column j of \(\varvec{\beta }\). Thus, the trace in (47) can be written as

$$\begin{aligned} tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\frac{\partial {}}{\partial {\beta _{i,j}}}\left( \varvec{\beta }\varvec{\beta }^T\right) \right) =2 \varvec{\nu }_i \varvec{\beta }_j, \end{aligned}$$
(49)

where \(\varvec{\nu }=\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\), \(\varvec{\nu }_i=\left[ \begin{array}{cccc}\nu _{i,1}&\nu _{i,2}&\dots&\nu _{i,p}\end{array}\right] \), and \(\varvec{\beta }_j=\left[ \begin{array}{cccc}\beta _{1,j}&\beta _{2,j}&\dots&\beta _{p,j}\end{array}\right] ^T\). The derivative of the trace with respect to \(\beta _{i,j}\) in (46) is

$$\begin{aligned} \frac{\partial {}}{\partial {\beta _{i,j}}}tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\right) =tr\left( \left( \frac{\partial {}}{\partial {\beta _{i,j}}}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\right) \varvec{S}\right) . \end{aligned}$$
(50)

By taking the derivative of the inverse and performing cyclic permutation on the trace, the derivative of the trace with respect to \(\beta _{i,j}\) in (50) becomes

$$\begin{aligned} \frac{\partial {}}{\partial {\beta _{i,j}}}tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\right) = -tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\frac{\partial {}}{\partial {\beta _{i,j}}} \left( \varvec{\beta } \varvec{\beta }^T\right) \right) . \end{aligned}$$
(51)

Applying the result in (49), the derivative of the trace with respect to \(\beta _{i,j}\) in (51) becomes

$$\begin{aligned} \frac{\partial {}}{\partial {\beta _{i,j}}}tr\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\right) =-2\varvec{\psi }_i\varvec{\beta }_j, \end{aligned}$$
(52)

where \(\varvec{\psi }=\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\) and \(\varvec{\psi }_i=\left[ \begin{array}{cccc}\psi _{i,1}&\psi _{i,2}&\dots&\psi _{i,p}\end{array}\right] \). By substituting (47), (49), and (52) into (46), the derivative of the logarithm of the likelihood function with respect to \(\beta _{i,j}\) in (46) becomes

$$\begin{aligned} \frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{i,j}}}=-n\left( \varvec{\nu }_i-\varvec{\psi }_i\right) \varvec{\beta }_j. \end{aligned}$$
(53)

The derivatives of the logarithm of the likelihood function with respect to \(\beta _{i,j}\) in (53) for \(i=1,2,\dots ,p\) and \(j=1,2,\dots ,q\) can be arranged into a matrix as

$$\begin{aligned} \varvec{\Phi }^{'}=\left[ \begin{array}{cccc} \frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{1,1}}} &\frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{1,2}}} &\dots &\frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{1,q}}} \\ \frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{2,1}}} &\frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{2,2}}} &\dots &\frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{2,q}}} \\ \vdots &\vdots &\ddots &\vdots \\ \frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{p,1}}} &\frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{p,2}}} &\dots &\frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\beta _{p,q}}} \end{array}\right] . \end{aligned}$$
(54)

By substituting the derivative in (53) into the matrix in (54), the matrix \(\varvec{\Phi }^{'}\) in (54) can be written as

$$\begin{aligned}&\varvec{\Phi }^{'}=-n\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}-\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\right) \varvec{\beta }. \end{aligned}$$
(55)
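
The closed form in (55) can be verified against a central finite difference of the log-likelihood. A short self-contained sketch with synthetic data; the sizes, seed, and the construction of a positive definite \(\varvec{S}\) are arbitrary choices for illustration:

```python
import numpy as np

def phi_core(S, beta, delta, n):
    # beta- and delta-dependent part of the log-likelihood (38)
    C = np.diag(delta) + beta @ beta.T
    return -0.5 * n * (np.linalg.slogdet(C)[1] + np.trace(np.linalg.solve(C, S)))

def grad_beta(S, beta, delta, n):
    """Gradient matrix Phi' of Eq. (55)."""
    C = np.diag(delta) + beta @ beta.T
    Cinv = np.linalg.inv(C)
    return -n * (Cinv - Cinv @ S @ Cinv) @ beta

rng = np.random.default_rng(0)
p, q, n, eps = 6, 2, 100, 1e-6
A = rng.standard_normal((p, p))
S = A @ A.T / p                       # synthetic positive definite "sample covariance"
beta = rng.standard_normal((p, q))
delta = np.ones(p)
i, j = 3, 1
E = np.zeros((p, q)); E[i, j] = eps   # perturb a single entry beta_{i,j}
numeric = (phi_core(S, beta + E, delta, n) - phi_core(S, beta - E, delta, n)) / (2 * eps)
print(np.isclose(numeric, grad_beta(S, beta, delta, n)[i, j], rtol=1e-4))  # expect True
```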

By setting the matrix \(\varvec{\Phi }^{'}\) to a zero matrix, the resulting equation is

$$\begin{aligned} \varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{\beta }=\varvec{\beta }. \end{aligned}$$
(56)

The maximum likelihood estimates for \(\varvec{\delta }\) and \(\varvec{\beta }\) are given by the two equations in (45) and (56). These equations can be simplified into more useful forms as follows. Both sides of Eq. (45) are diagonal matrices. By premultiplying and postmultiplying each side by the diagonal matrix \(\varvec{\delta }\), Eq. (45) becomes

$$\begin{aligned} \varvec{\delta }\, diag\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\right) \varvec{\delta }=\varvec{\delta }\, diag\left( \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\right) \varvec{\delta }. \end{aligned}$$
(57)

Equation (57) can be written as

$$\begin{aligned} diag\left( \varvec{\delta } \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{\delta }\right) = diag\left( \varvec{\delta }\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{\delta }\right) . \end{aligned}$$
(58)

By using the substitution \(\varvec{\delta }=\varvec{C}-\varvec{\beta }\varvec{\beta }^T\) and \(\varvec{C}=\varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\) in Eq. (58), Eq. (58) becomes

$$\begin{aligned} diag\left( \left( \varvec{C}-\varvec{\beta }\varvec{\beta }^T\right) \varvec{C}^{-1}\left( \varvec{C}-\varvec{\beta }\varvec{\beta }^T\right) \right) = diag\left( \left( \varvec{C}-\varvec{\beta }\varvec{\beta }^T\right) \varvec{C}^{-1}\varvec{S}\varvec{C}^{-1}\left( \varvec{C}-\varvec{\beta }\varvec{\beta }^T\right) \right) . \end{aligned}$$
(59)

By multiplying out the matrices in Eq. (59), Eq. (59) can be simplified to

$$\begin{aligned} diag\left( \varvec{C}-2\varvec{\beta }\varvec{\beta }^T+\varvec{\beta }\varvec{\beta }^T\varvec{C}^{-1}\varvec{\beta }\varvec{\beta }^T\right) = diag\left( \varvec{S}-\varvec{S}\varvec{C}^{-1}\varvec{\beta }\varvec{\beta }^T-\varvec{\beta }\varvec{\beta }^T\varvec{C}^{-1}\varvec{S}+\varvec{\beta }\varvec{\beta }^T\varvec{C}^{-1}\varvec{S}\varvec{C}^{-1}\varvec{\beta }\varvec{\beta }^T\right) . \end{aligned}$$
(60)

By using Eq. (56) to simplify the right side of Eq. (60), Eq. (60) becomes

$$\begin{aligned} diag\left( \varvec{C}-2\varvec{\beta }\varvec{\beta }^T+\varvec{\beta }\varvec{\beta }^T\varvec{C}^{-1}\varvec{\beta }\varvec{\beta }^T\right) = diag\left( \varvec{S}-2\varvec{\beta }\varvec{\beta }^T+\varvec{\beta }\varvec{\beta }^T\varvec{C}^{-1}\varvec{\beta }\varvec{\beta }^T\right) . \end{aligned}$$
(61)

Equation (61) simplifies to the final form with no inverse as

$$\begin{aligned} diag\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) =diag\left( \varvec{S}\right) . \end{aligned}$$
(62)

Equation (56) can be simplified by using the identity

$$\begin{aligned} \left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{\beta }=\varvec{\delta }^{-1}\varvec{\beta }\left( \varvec{I}+\varvec{\beta }^T\varvec{\delta }^{-1}\varvec{\beta }\right) ^{-1} \end{aligned}$$
(63)

to obtain

$$\begin{aligned} \varvec{S}\varvec{\delta }^{-1}\varvec{\beta }\left( \varvec{I}+\varvec{\beta }^T\varvec{\delta }^{-1}\varvec{\beta }\right) ^{-1}=\varvec{\beta }. \end{aligned}$$
(64)

It is easier to find the inverse of a diagonal matrix than of a full matrix, so Eq. (64) is written in the final form as

$$\begin{aligned} \varvec{S}\varvec{\delta }^{-1}\varvec{\beta }=\varvec{\beta }\left( \varvec{I}+\varvec{\beta }^T\varvec{\delta }^{-1}\varvec{\beta }\right) . \end{aligned}$$
(65)
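
A candidate pair \((\varvec{\beta },\varvec{\delta })\) can be checked directly against the final forms (62) and (65). A minimal NumPy sketch, again storing \(\varvec{\delta }\) as the vector of its diagonal elements (the function name is illustrative):

```python
import numpy as np

def ml_equation_residuals(S, beta, delta):
    """Residual norms of the maximum likelihood equations (62) and (65);
    both should be near zero at the maximum likelihood estimates."""
    q = beta.shape[1]
    # Eq. (62): diag(delta + beta beta^T) = diag(S)
    r62 = delta + np.sum(beta**2, axis=1) - np.diag(S)
    # Eq. (65): S delta^{-1} beta = beta (I + beta^T delta^{-1} beta)
    dinv_beta = beta / delta[:, None]             # delta^{-1} beta for diagonal delta
    r65 = S @ dinv_beta - beta @ (np.eye(q) + beta.T @ dinv_beta)
    return np.linalg.norm(r62), np.linalg.norm(r65)
```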

Appendix 2: Iterative algorithm

An iterative algorithm for solving the system of nonlinear equations in (11) and (12) is derived in this appendix. The derivation is obtained using standard tools in multivariate statistical analysis [40, 41]. The initial values for \(\varvec{\beta }\) and \(\varvec{\delta }\) in the iterative algorithm can be obtained by approximating the model in (3) with the maximized subspace model [37]. Using the notation from the model in (1), the MSM detector approximates the pixel \(\varvec{z}\) with a linear transformation of the high-variance principal components \(\varvec{w}\). The columns of the transformation matrix \(\varvec{\gamma }\) are derived in [37] to be the eigenvectors of the covariance of the pixel. An approach to obtaining the initial values is to model the pixel \(\varvec{z}\) as

$$\begin{aligned} \varvec{z}=\varvec{\xi }\, \varvec{w}, \end{aligned}$$
(66)

where \(\varvec{\xi }=\left[ \begin{array}{cccc}\varvec{\xi }_1&\varvec{\xi }_2&\dots&\varvec{\xi }_q \end{array}\right] \) and \(\left( \tau _i,\varvec{\xi }_i\right) \) is the eigenvalue–eigenvector pair of the covariance of \(\varvec{z}\) for \(i=1,2,\dots ,q\). By substituting \(\varvec{z}\) in (66) into \(Cov(\varvec{z},\varvec{z}^T)\), the covariance of \(\varvec{z}\) simplifies to

$$\begin{aligned} Cov\left( \varvec{z},\varvec{z}^T\right) =\varvec{\xi }\, \varvec{\tau }\, \varvec{\xi }^T=(\varvec{\xi }\, \varvec{\tau }^{\frac{1}{2}}) (\varvec{\xi }\, \varvec{\tau }^{\frac{1}{2}})^T, \end{aligned}$$
(67)

where \(\varvec{\tau }\) is a diagonal matrix with diagonal elements \(\tau _1,\tau _2,\dots ,\tau _q\). Since \(\varvec{x}=\varvec{z}-\varvec{\mu }_z\), the covariance of \(\varvec{z}\) and the covariance of \(\varvec{x}\) are the same. Thus, Eq. (67) can be written in terms of the eigenvalues and eigenvectors of \(\varvec{x}\) as

$$\begin{aligned} Cov\left( \varvec{z},\varvec{z}^T\right) =(\varvec{\upsilon }\, \varvec{\omega }^{\frac{1}{2}}) (\varvec{\upsilon }\, \varvec{\omega }^{\frac{1}{2}})^T, \end{aligned}$$
(68)

where \(\varvec{\upsilon }=\left[ \begin{array}{cccc}\varvec{\upsilon }_1&\varvec{\upsilon }_2&\dots&\varvec{\upsilon }_q \end{array}\right] \), \(\varvec{\omega }\) is a diagonal matrix with diagonal elements \(\omega _1,\omega _2,\dots ,\omega _q\), and \(\left( \omega _i,\varvec{\upsilon }_i\right) \) is the eigenvalue–eigenvector pair of the covariance of \(\varvec{x}\) for \(i=1,2,\dots ,q\), where \(\omega _1\ge \omega _2\ge \dots \ge \omega _q>0\). In order to obtain the initial values, the pixel \(\varvec{x}\) in (3) is modeled as

$$\begin{aligned} \varvec{x}=\varvec{\beta }\, \varvec{y}. \end{aligned}$$
(69)

By substituting \(\varvec{x}\) in (69) into the definition of the covariance of \(\varvec{x}\), the covariance of \(\varvec{x}\) becomes

$$\begin{aligned} Cov\left( \varvec{x},\varvec{x}^T\right) =\varvec{\beta }\, \varvec{\beta }^T. \end{aligned}$$
(70)

It follows from (68) and (70) that

$$\begin{aligned} \varvec{\beta }=\varvec{\upsilon }\, \varvec{\omega }^{\frac{1}{2}}. \end{aligned}$$
(71)

By estimating the unknown covariance of \(\varvec{x}\) with its known sample covariance \(\varvec{S}\), the computable initial value for \(\varvec{\beta }\) denoted by \(\varvec{\beta }^{(0)}\) is

$$\begin{aligned} \varvec{\beta }^{(0)}=\varvec{\upsilon }^{(0)}\, \left( \varvec{\omega }^{(0)}\right) ^{\frac{1}{2}}, \end{aligned}$$
(72)

where \(\left( \omega _i^{(0)},\,\varvec{\upsilon }_i^{(0)}\right) \) is the eigenvalue–eigenvector pair of \(\varvec{S}\). From Eq. (12), the computable initial value for \(\varvec{\delta }\) denoted by \(\varvec{\delta }^{(0)}\) is

$$\begin{aligned} \varvec{\delta }^{(0)}=diag\left( \varvec{S}-\varvec{\beta }^{(0)}\left( \varvec{\beta }^{(0)}\right) ^T\right) . \end{aligned}$$
(73)

Let \(\varvec{\alpha }=\left[ \begin{array}{cccc}\varvec{\alpha }_1&\varvec{\alpha }_2&\dots&\varvec{\alpha }_q\end{array}\right] \) and \(\varvec{\lambda }\) be a diagonal matrix with diagonal elements \(\lambda _1,\lambda _2,\dots ,\lambda _q\). Let the superscript (j) denote the jth iterate. Then the jth iterates of \(\varvec{\beta }\) and \(\varvec{\delta }\) denoted by \(\varvec{\beta }^{(j)}\) and \(\varvec{\delta }^{(j)}\) for \(j=1,2,\dots \) are

$$\begin{aligned}&\varvec{\beta }^{(j)}=\varvec{\alpha }^{(j)} \, \left( \varvec{\lambda }^{(j)}\right) ^{\frac{1}{2}}\end{aligned}$$
(74)
$$\begin{aligned}&\varvec{\delta }^{(j)}=diag\left( \varvec{S}-\varvec{\beta }^{(j)}(\varvec{\beta }^{(j)})^T\right) , \end{aligned}$$
(75)

where \(\left( \lambda _i^{(j)},\,\varvec{\alpha }_i^{(j)}\right) \) is the eigenvalue–eigenvector pair of \(\varvec{B}^{(j)}\), and

$$\begin{aligned} \varvec{B}^{(j)}&=\left( \varvec{\delta }^{(j-1)}\right) ^{-\frac{1}{2}}\left( \varvec{S}-\varvec{\delta }^{(j-1)}\right) \left( \varvec{\delta }^{(j-1)}\right) ^{-\frac{1}{2}}. \end{aligned}$$
(76)

A typical convergence criterion for stopping the iteration is based on the norm or relative norm of the difference between successive iterates of \(\varvec{\beta }\) and \(\varvec{\delta }\). The initial estimate \(\varvec{\beta }^{(0)}\) in (72) and the jth iterate \(\varvec{\beta }^{(j)}\) in (74) are derived using the models in (66) and (69), which are simplified from the full models in (1) and (3). However, the initial estimate \(\varvec{\delta }^{(0)}\) in (73) and the jth iterate \(\varvec{\delta }^{(j)}\) in (75) are derived using the full models in (1) and (3). Therefore, \(\varvec{\beta }^{(j)}\) may fail to converge, and \(\varvec{\delta }^{(j)}\) is the better quantity on which to base the convergence test. Thus, the iteration continues as long as the norm of the difference between the successive iterates \(\varvec{\delta }^{(j)}\) and \(\varvec{\delta }^{(j-1)}\) exceeds a prescribed tolerance tol, i.e., as long as

$$\begin{aligned} \left| \left| \varvec{\delta }^{(j)}-\varvec{\delta }^{(j-1)}\right| \right| >tol. \end{aligned}$$
(77)
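
For concreteness, the complete procedure (initialization (72)–(73), iteration (74)–(76), and the stopping test (77)) fits in a short NumPy sketch. This is an illustrative implementation, not the author's code; the eigenvalue clipping and the floor on \(\varvec{\delta }\) are practical safeguards that the derivation does not address:

```python
import numpy as np

def fit_constrained_model(S, q, tol=1e-6, max_iter=500):
    """Iterate (74)-(76) from the initial values (72)-(73), stopping when
    the norm test (77) no longer holds. S is the p x p sample covariance
    and q is the user-specified dimension of the main background subspace.
    Returns (beta, delta) with delta stored as a vector."""
    # Initialization (72)-(73): top-q eigenpairs of S.
    w, v = np.linalg.eigh(S)                      # eigenvalues in ascending order
    w, v = w[::-1][:q], v[:, ::-1][:, :q]         # reorder to descending, keep q
    beta = v * np.sqrt(w)                         # beta^(0) = upsilon omega^(1/2)
    delta = np.diag(S) - np.sum(beta**2, axis=1)  # delta^(0) = diag(S - beta beta^T)
    for _ in range(max_iter):
        delta = np.maximum(delta, 1e-12)          # keep delta invertible (safeguard)
        # Eq. (76): B = delta^(-1/2) (S - delta) delta^(-1/2) with delta diagonal.
        d = 1.0 / np.sqrt(delta)
        B = d[:, None] * (S - np.diag(delta)) * d[None, :]
        lam, alpha = np.linalg.eigh(B)
        lam, alpha = lam[::-1][:q], alpha[:, ::-1][:, :q]
        beta = alpha * np.sqrt(np.maximum(lam, 0.0))      # Eq. (74), clip negatives
        delta_new = np.diag(S) - np.sum(beta**2, axis=1)  # Eq. (75)
        converged = np.linalg.norm(delta_new - delta) <= tol  # (77) no longer holds
        delta = delta_new
        if converged:
            break
    return beta, delta
```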

An alternative convergence criterion to that in (77) is based on the assumption that the pixels from the image actually fit the model in (3). If the model in (3) is the correct model, the covariance of the error is nearly a diagonally dominant matrix: the off-diagonal elements are close to zero and some diagonal elements dominate. As the iteration progresses, the off-diagonal elements approach zero while the variance of the diagonal elements increases at a diminishing rate, so the iteration can be considered converged when there is no significant change in the covariance of the error between two successive iterations. Thus, the alternative criterion is to continue the iteration as long as the ratio of the absolute difference between the variances of the diagonal elements of \(\varvec{\delta }^{(j)}\) and \(\varvec{\delta }^{(j-1)}\) to the absolute variance of the diagonal elements of \(\varvec{\delta }^{(j-1)}\) exceeds a specified tolerance tol, i.e., as long as

$$\begin{aligned} \frac{\left| var\left( diag\left( \varvec{\delta }^{(j)}\right) \right) -var\left( diag\left( \varvec{\delta }^{(j-1)}\right) \right) \right| }{\left| var\left( diag\left( \varvec{\delta }^{(j-1)}\right) \right) \right| }> tol, \end{aligned}$$
(78)

where \(var\left( diag\left( \varvec{\delta }^{(j)}\right) \right) \) denotes the variance of the diagonal elements of \(\varvec{\delta }^{(j)}\).
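
In code, the alternative test simply replaces the norm test inside the loop of the sketch above; a minimal version, with \(\varvec{\delta }^{(j)}\) and \(\varvec{\delta }^{(j-1)}\) stored as vectors of diagonal elements (the function name is illustrative):

```python
import numpy as np

def variance_ratio_converged(delta_new, delta_old, tol):
    """Stopping test based on (78): the iteration continues while the
    relative change in the variance of the diagonal elements of delta
    exceeds tol, and is declared converged otherwise."""
    v_new, v_old = np.var(delta_new), np.var(delta_old)
    return abs(v_new - v_old) <= tol * abs(v_old)
```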


Cite this article

Lo, E. Hyperspectral anomaly detection based on constrained eigenvalue–eigenvector model. Pattern Anal Applic 20, 531–555 (2017). https://doi.org/10.1007/s10044-015-0519-6
