Abstract
Anomaly detection over a large area using hyperspectral imaging is an important application in real-time remote sensing. Anomaly detectors based on subspace models are suitable for such applications and usually assume that the main background subspace and its dimension are known. These detectors can detect the anomaly for a range of values of the dimension of the subspace. The objective of this paper is to develop an anomaly detector that extends this range of values by assuming a main background subspace that is unknown except for a user-specified dimension, and by constraining the covariance of the error to be a diagonal matrix. A pixel from the image is modeled as the sum of a linear combination of the unknown main background subspace and an unknown error. Having more unknown quantities provides more degrees of freedom to fit the data to the model, and the diagonal covariance matrix makes the error components uncorrelated. The unknown coefficients of the linear combination are solved for by maximum likelihood estimation. Experimental results using real hyperspectral images show that the proposed anomaly detector can detect the anomaly for a significantly larger range of values of the dimension of the subspace than conventional anomaly detectors.
References
Matteoli S, Diani M, Corsini G (2010) A tutorial overview of anomaly detection in hyperspectral images. IEEE Trans Aerosp Electron Syst Mag 25(7):5–27
Stein DWJ, Beaven SG, Hoff LE, Winter EM, Schaum AP, Stocker AD (2002) Anomaly detection from hyperspectral imagery. IEEE Signal Process Mag 19(1):58–69
Reed IS, Yu X (1990) Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution. IEEE Trans Acoust Speech Signal Process 38(10):1760–1770
Schweizer SM, Moura JMF (2000) Hyperspectral imagery: clutter adaptation in anomaly detection. IEEE Trans Geosci Remote Sens 46(5):1855–1871
Schweizer SM, Moura JMF (2001) Efficient detection in hyperspectral imagery. IEEE Trans Image Process 10(4):584–597
Lo E, Ingram J (2008) Hyperspectral anomaly detection based on minimum generalized variance method. In: Proceedings of SPIE, vol. 6966, p 696603
Fowler J, Du Q (2012) Anomaly detection and reconstruction from random projections. IEEE Trans Image Process 21(1):184–195
Du B, Zhang L (2010) Random selection based anomaly detector for hyperspectral imagery. IEEE Trans Geosci Remote Sens 49(5):1578–1589
Khazai S, Safari A, Mojaradi B, Homayouni S (2012) An approach for subpixel anomaly detection in hyperspectral images. IEEE J Sel Top Appl Earth Obs Remote Sens 5(2):470–477
McKenzie P, Alder M (1994) Selecting the optimal number of components for a Gaussian mixture model. In: Proceedings of IEEE international symposium on information theory, p 393
Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17(10):977–987
Kyrgyzov IO, Kyrgyzov OO, Maitre H, Campedel M (2007) Kernel MDL to determine the number of clusters. In: Lecture Notes in Computer Science, vol 4571. Springer, Berlin, pp 203–217
Chang CI, Du Q (2004) Estimation of number of spectrally distinct signal sources in hyperspectral imagery. IEEE Trans Geosci Remote Sens 42(3):608–619
Masson P, Pieczynski W (1993) SEM algorithm and unsupervised statistical segmentation of satellite images. IEEE Trans Geosci Remote Sens 31(3):618–633
Ashton EA (1998) Detection of subpixel anomalies in multispectral infrared imagery using an adaptive Bayesian classifier. IEEE Trans Geosci Remote Sens 36(2):506–517
Carlotto MJ (2005) A cluster-based approach for detecting man-made objects and changes in imagery. IEEE Trans Geosci Remote Sens 43(2):374–387
Duran O, Petrou M (2005) A time-efficient clustering method for pure class selection. In: Proceedings of IEEE International Geoscience and Remote Sensing Symposium vol 1, pp 510–513
Duran O, Petrou M (2007) A time-efficient method for anomaly detection in hyperspectral images. IEEE Trans Geosci Remote Sens 45(12):3894–3904
Duran O, Petrou M, Hathaway D, Nothard J (2006) Anomaly detection through adaptive background class extraction from dynamic hyperspectral data. In: Proceedings of IEEE Nordic Signal Processing Conference, pp 234–237
Penn B (2002) A time-efficient method for anomaly detection in hyperspectral images. In: Proceedings of IEEE Aerospace Conference vol 3, pp 1531–1535
Chen JY, Reed IS (1987) A detection algorithm for optical targets in clutter. IEEE Trans Aerosp Electron Syst 23(1):394–405
Kwon H, Der SZ, Nasrabadi NM (2003) Using self-organizing maps for anomaly detection in hyperspectral imagery. Opt Eng 42(11):3342–3351
Parzen E (1962) On the estimation of a probability density function and mode. Ann Math Stat 33:1065–1076
Banerjee A, Burlina P, Diehl C (2006) A support vector method for anomaly detection in hyperspectral imagery. IEEE Trans Geosci Remote Sens 44(8):2282–2291
Goldberg H, Kwon H, Nasrabadi NM (2007) Kernel eigenspace separation transform for subspace anomaly detection in hyperspectral imagery. IEEE Geosci Remote Sens Lett 4(4):581–585
Bowles J, Chen W, Gillis D (2003) ORASIS framework-benefits to working within the linear mixing model. In: Proceedings of IEEE International Geoscience and Remote Sensing Symposium vol 1, pp 96–98
Winter ME (1999) Fast autonomous spectral endmember determination in hyperspectral data. In: Proceedings of 13th international conference on applied geologic remote sensing, vol 2, pp 337–344
Winter ME (2000) Comparison of approaches for determining end-members in hyperspectral data. In: Proceedings of IEEE aerospace conference vol 3, pp 305–313
Nascimento JMP, Bioucas-Dias JM (2005) Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans Geosci Remote Sens 43(4):898–909
Chang CI (2005) Orthogonal subspace projection (OSP) revisited: a comprehensive study and analysis. IEEE Trans Geosci Remote Sens 43(3):502–518
Ranney KI, Soumekh M (2006) Hyperspectral anomaly detection within the signal subspace. IEEE Geosci Remote Sens Lett 3(3):312–316
Duran O, Petrou M (2009) Spectral unmixing with negative and superunity abundances for subpixel anomaly detection. IEEE Geosci Remote Sens Lett 6(1):152–156
Du B, Zhang L (2014) A discriminative metric learning based anomaly detection method. IEEE Trans Geosci Remote Sens 52(11):6844–6857
Zhao R, Du B, Zhang L (2014) Robust nonlinear hyperspectral anomaly detection approach. IEEE J Sel Top Appl Earth Obs Remote Sens 7(4):1227–1234
Du B, Zhang L (2011) Random-selection-based anomaly detector for hyperspectral imagery. IEEE Trans Geosci Remote Sens 49(5):1578–1589
Schaum AP (2007) Hyperspectral anomaly detection beyond RX. In: Proceedings of SPIE, vol 6565, p 656502
Lo E (2012) Maximized subspace model for hyperspectral anomaly detection. Pattern Anal Appl 15(3):225–235
Lo E (2013) Variable subspace model for hyperspectral anomaly detection. Pattern Anal Appl 16(3):393–405
Lo E (2014) Variable factorization model based on numerical optimization for hyperspectral anomaly detection. Pattern Anal Appl 17(2):291–310
Morrison DF (1976) Multivariate statistical methods, 2nd edn. McGraw Hill, New York
Anderson TW (2003) An introduction to multivariate statistical analysis, 3rd edn. Wiley Interscience, Hoboken
Kerekes JP, Snyder DK (2010) Unresolved target detection blind test project overview. In: Proceedings of 16th SPIE conference on algorithms and technologies for multispectral, hyperspectral, and ultraspectral imagery, vol 7695, p 769521
Acknowledgments
The author wishes to thank the US Naval Research Laboratory, and the Digital Imaging and Remote Sensing Laboratory at the Rochester Institute of Technology for the data.
Appendices
Appendix 1: Maximum likelihood estimation
The maximum likelihood estimation of the unknown coefficients \(\varvec{\beta }\) and \(\varvec{\delta }\) of the statistical model given in (3) subject to the constraint in (4) is derived in this appendix using standard tools in multivariate statistical analysis [40, 41]. The sample covariance \(\varvec{S}\) from a random sample of n pixels is used to estimate the population covariance \(\varvec{C}\). Estimating \(\varvec{C}\) is equivalent to estimating \(\varvec{\beta }\) and \(\varvec{\delta }\). The likelihood function for \(\varvec{C}\) is the Wishart density function
where
\(\Gamma \) is the gamma function, and tr denotes trace. The maximum likelihood estimates of \(\varvec{\beta }\) and \(\varvec{\delta }\) are obtained by maximizing the logarithm of the likelihood function in (36) subject to the constraint in (4). The logarithm of the likelihood function is
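With \(\varvec{C}=\varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\), the objective can be written compactly as follows (a reconstruction from the surrounding derivation, with terms independent of the parameters collected in a constant \(c\)):

```latex
\phi \left( \varvec{\beta },\varvec{\delta }\right)
  = c \;-\; \frac{n}{2}\log \left| \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right|
    \;-\; \frac{n}{2}\,\mathrm{tr}\!\left[ \left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\varvec{S}\right]
```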
The maximum solution of the logarithm of the likelihood function is obtained by differentiating the logarithm of the likelihood function with respect to \(\delta _i\) and \(\beta _{i,j}\). The derivative of the logarithm of the likelihood function in (38) with respect to \(\delta _i\) is
The derivative of the determinant with respect to \(\delta _i\) in (39) is
The derivative of the trace with respect to \(\delta _i\) in (39) is
By taking the derivative of the inverse and performing cyclic permutation on the trace, the derivative of the trace with respect to \(\delta _i\) in (41) becomes
By substituting the derivative of the determinant in (40) and the derivative of the trace in (42) into (39), the derivative of the logarithm of the likelihood function in (39) becomes
The derivative \(\frac{\partial {\varvec{\delta }}}{\partial {\delta _i}}\) is a matrix with all zeros except in row i and column i. Premultiplying \(\frac{\partial {\varvec{\delta }}}{\partial {\delta _i}}\) by a matrix preserves only column i of the matrix. By applying the result that the trace of a matrix is the same as the trace of the corresponding diagonal matrix, the derivative of the logarithm of the likelihood function in (43) becomes
which is a product of \(-n/2\) and the element in row i and column i of the diagonal matrix. The notation \(diag(\varvec{B})\) denotes the diagonal matrix with diagonal elements from matrix \(\varvec{B}\). By setting \(\frac{\partial {\phi (\varvec{\beta },\varvec{\delta })}}{\partial {\delta _i}}\) in (44) to zero for \(i=1,2,\dots,p\), the derivatives of the logarithm of the likelihood function in (44) become
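Setting these derivatives to zero gives the diagonal stationarity condition (a reconstruction consistent with the steps above, stated with the \(diag\) notation just introduced):

```latex
diag\!\left[ \left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\right]
  = diag\!\left[ \left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}
      \varvec{S}\left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\right]
```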
The derivative of the logarithm of the likelihood function in (38) with respect to \(\beta _{i,j}\) is
The derivative of the determinant with respect to \(\beta _{i,j}\) in (46) is
The derivative of \(\varvec{\beta }\varvec{\beta }^T\) with respect to \(\beta _{i,j}\) is a matrix with all zeros, except row i and column i are the same as column j of \(\varvec{\beta }\), and the element in row i and column i has a scalar multiplier of 2, i.e.,
The trace of the product generated by premultiplying \(\frac{\partial {}}{\partial {\beta _{i,j}}}\left( \varvec{\beta }\varvec{\beta }^T\right) \) by a matrix is two times the dot product of row i of the matrix and column j of \(\varvec{\beta }\). Thus, the trace in (47) can be written as
where \(\varvec{\nu }=\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\), \(\varvec{\nu }_i=\left[ \begin{array}{cccc}\nu _{i,1}&\nu _{i,2}&\dots&\nu _{i,p}\end{array}\right] \), and \(\varvec{\beta }_j=\left[ \begin{array}{cccc}\beta _{1,j}&\beta _{2,j}&\dots&\beta _{p,j}\end{array}\right] ^T\). The derivative of the trace with respect to \(\beta _{i,j}\) in (46) is
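In this notation, the trace in (47) takes the scalar form (consistent with the dot-product description above, since \(\varvec{\nu }_i\) is \(1\times p\) and \(\varvec{\beta }_j\) is \(p\times 1\)):

```latex
\mathrm{tr}\!\left[ \varvec{\nu }\,\frac{\partial {}}{\partial {\beta _{i,j}}}\left( \varvec{\beta }\varvec{\beta }^T\right) \right] = 2\,\varvec{\nu }_i\,\varvec{\beta }_j
```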
By taking the derivative of the inverse and performing cyclic permutation on the trace, the derivative of the trace with respect to \(\beta _{i,j}\) in (50) becomes
Applying the result in (49), the derivative of the trace with respect to \(\beta _{i,j}\) in (51) becomes
where \(\varvec{\psi }=\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\varvec{S}\left( \varvec{\delta }+\varvec{\beta } \varvec{\beta }^T\right) ^{-1}\) and \(\varvec{\psi }_i=\left[ \begin{array}{cccc}\psi _{i,1}&\psi _{i,2}&\dots&\psi _{i,p}\end{array}\right] \). By substituting (47), (49), and (52) into (46), the derivative of the logarithm of the likelihood function with respect to \(\beta _{i,j}\) in (46) becomes
The derivatives of the logarithm of the likelihood function with respect to \(\beta _{i,j}\) in (53) for \(i=1,2,\dots ,p\) and \(j=1,2,\dots ,q\) can be arranged into a matrix as
By substituting the derivative in (53) into the matrix in (54), the matrix \(\varvec{\Phi }^{'}\) in (54) can be written as
By setting the matrix \(\varvec{\Phi }^{'}\) to a zero matrix, the resulting equation is
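Written out with \(\varvec{\nu }\) and \(\varvec{\psi }\) as defined above, the condition is (a reconstruction; premultiplying both sides by \(\varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\) gives the equivalent form \(\varvec{S}\left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\varvec{\beta }=\varvec{\beta }\)):

```latex
\varvec{\nu }\,\varvec{\beta } = \varvec{\psi }\,\varvec{\beta }
\quad \Longleftrightarrow \quad
\left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\varvec{\beta }
  = \left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}
    \varvec{S}\left( \varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\right) ^{-1}\varvec{\beta }
```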
The maximum likelihood estimates for \(\varvec{\delta }\) and \(\varvec{\beta }\) are given by the two equations in (45) and (56). These equations can be simplified into more useful forms as follows. The left side and right side of Eq. (45) are diagonal matrices. By performing premultiplication and postmultiplication of each side by the diagonal matrix \(\varvec{\delta }\), Eq. (45) becomes
Equation (57) can be written as
By using the substitution \(\varvec{\delta }=\varvec{C}-\varvec{\beta }\varvec{\beta }^T\) and \(\varvec{C}=\varvec{\delta }+\varvec{\beta }\varvec{\beta }^T\) in Eq. (58), Eq. (58) becomes
By multiplying out the matrices in Eq. (59), Eq. (59) can be simplified to
By using Eq. (56) to simplify the right side of Eq. (60), Eq. (60) becomes
Equation (61) simplifies to the final form with no inverse as
Equation (56) can be simplified by using the identity
to obtain
It is easier to find the inverse of a diagonal matrix than a full matrix, so Eq. (60) is written in the final form as
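The pair of simplified estimating equations then takes the form familiar from maximum likelihood factor analysis [41] (a reconstruction consistent with the steps above, where \(\varvec{I}_q\) denotes the \(q\times q\) identity matrix):

```latex
diag\left( \varvec{\delta }\right) = diag\!\left( \varvec{S}-\varvec{\beta }\varvec{\beta }^T\right) ,
\qquad
\varvec{S}\,\varvec{\delta }^{-1}\varvec{\beta }
  \left( \varvec{I}_q+\varvec{\beta }^T\varvec{\delta }^{-1}\varvec{\beta }\right) ^{-1}
  = \varvec{\beta }
```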
Appendix 2: Iterative algorithm
An iterative algorithm for solving the system of nonlinear equations in (11) and (12) is derived in this appendix. The derivation is obtained using standard tools in multivariate statistical analysis [40, 41]. The initial values for \(\varvec{\beta }\) and \(\varvec{\delta }\) in the iterative algorithm can be obtained by approximating the model in (3) with the maximized subspace model [34]. Using the notation from the model in (1), the MSM detector approximates the pixel \(\varvec{z}\) with a linear transformation of the high-variance principal components \(\varvec{w}\). The columns of the transformation matrix \(\varvec{\gamma }\) were derived in [34] to be the eigenvectors of the covariance of the pixel. An approach to obtain the initial values is to model the pixel \(\varvec{z}\) as
where \(\varvec{\xi }=\left[ \begin{array}{cccc}\varvec{\xi }_1&\varvec{\xi }_2&\dots&\varvec{\xi }_q \end{array}\right] \) and \(\left( \tau _i,\varvec{\xi }_i\right) \) is the eigenvalue–eigenvector pair of the covariance of \(\varvec{z}\) for \(i=1,2,\dots ,q\). By substituting \(\varvec{z}\) in (66) into \(Cov(\varvec{z},\varvec{z}^T)\), the covariance of \(\varvec{z}\) simplifies to
where \(\varvec{\tau }\) is a diagonal matrix with diagonal elements \(\tau _1,\tau _2,\dots ,\tau _q\). Since \(\varvec{x}=\varvec{z}-\varvec{\mu }_z\), the covariance of \(\varvec{z}\) and the covariance of \(\varvec{x}\) are the same. Thus, Eq. (67) can be written in terms of the eigenvalues and eigenvectors of \(\varvec{x}\) as
where \(\varvec{\upsilon }=\left[ \begin{array}{cccc}\varvec{\upsilon }_1&\varvec{\upsilon }_2&\dots&\varvec{\upsilon }_q \end{array}\right] \), \(\varvec{\omega }\) is a diagonal matrix with diagonal elements \(\omega _1,\omega _2,\dots ,\omega _q\), and \(\left( \omega _i,\varvec{\upsilon _i}\right) \) is the eigenvalue–eigenvector pair of the covariance of \(\varvec{x}\) for \(i=1,2,\dots ,q\), where \(\omega _1\ge \omega _2\ge \dots \ge \omega _q> 0\) . In order to obtain the initial values, the pixel \(\varvec{x}\) in (3) is modeled as
By substituting \(\varvec{x}\) in (69) into the definition of the covariance of \(\varvec{x}\), the covariance of \(\varvec{x}\) becomes
It follows from (68) and (70) that
By estimating the unknown covariance of \(\varvec{x}\) with its known sample covariance \(\varvec{S}\), the computable initial value for \(\varvec{\beta }\) denoted by \(\varvec{\beta }^{(0)}\) is
where \(\left( \omega _i^{(0)},\,\varvec{\upsilon }_i^{(0)}\right) \) is the eigenvalue–eigenvector pair of \(\varvec{S}\). From Eq. (12), the computable initial value for \(\varvec{\delta }\) denoted by \(\varvec{\delta }^{(0)}\) is
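Collecting the two initializations (a reconstruction of (72) and (73) from the surrounding definitions):

```latex
\varvec{\beta }^{(0)} = \left[ \begin{array}{cccc}
  \sqrt{\omega _1^{(0)}}\,\varvec{\upsilon }_1^{(0)} &
  \sqrt{\omega _2^{(0)}}\,\varvec{\upsilon }_2^{(0)} & \dots &
  \sqrt{\omega _q^{(0)}}\,\varvec{\upsilon }_q^{(0)} \end{array}\right] ,
\qquad
\varvec{\delta }^{(0)} = diag\!\left( \varvec{S}-\varvec{\beta }^{(0)}\varvec{\beta }^{(0)T}\right)
```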
Let \(\varvec{\alpha }=\left[ \begin{array}{cccc}\varvec{\alpha }_1&\varvec{\alpha }_2&\dots&\varvec{\alpha }_q\end{array}\right] \) and \(\varvec{\lambda }\) be a diagonal matrix with diagonal elements \(\lambda _1,\lambda _2,\dots ,\lambda _q\). Let the superscript (j) denote the jth iterate. Then the jth iterates of \(\varvec{\beta }\) and \(\varvec{\delta }\) denoted by \(\varvec{\beta }^{(j)}\) and \(\varvec{\delta }^{(j)}\) for \(j=1,2,\dots \) are
where \(\left( \lambda _i^{(j)},\,\varvec{\alpha }_i^{(j)}\right) \) is the eigenvalue–eigenvector pair of \(\varvec{B}^{(j)}\), and
A typical convergence criterion for stopping the iterations is based on the norm or relative norm of the difference of successive iterates of \(\varvec{\beta }\) and \(\varvec{\delta }\). The initial estimate \(\varvec{\beta }^{(0)}\) in (72) and the \(j\)th iterate \(\varvec{\beta }^{(j)}\) in (74) are derived using the models in (66) and (69), which are simplified from the full models in (1) and (3). However, the initial estimate \(\varvec{\delta }^{(0)}\) in (73) and the \(j\)th iterate \(\varvec{\delta }^{(j)}\) in (75) are derived using the full models in (1) and (3). Therefore, \(\varvec{\beta }^{(j)}\) may fail to converge, and a criterion based on \(\varvec{\delta }^{(j)}\) is more reliable. Thus, the iteration is considered to have converged when the norm of the difference between the successive iterates \(\varvec{\delta }^{(j)}\) and \(\varvec{\delta }^{(j-1)}\) falls below a prescribed tolerance tol, i.e.,
An alternative convergence criterion to that in (77) is based on the assumption that the pixel from the image actually fits the model in (3). If the model in (3) is the correct model for the pixel, the covariance of the error would be nearly diagonally dominant: the off-diagonal elements would be close to zero and some diagonal elements would be dominant. As the iteration progresses, the off-diagonal elements continue to approach zero, while the variance of the diagonal elements continues to increase at a slower rate. Consequently, the iteration converges when there is no significant change in the covariance of the error between two successive iterations. Thus, the alternative criterion terminates the iteration when the ratio of the absolute difference between the variances of the diagonal elements of \(\varvec{\delta }^{(j)}\) and \(\varvec{\delta }^{(j-1)}\) to the absolute value of the variance of the diagonal elements of \(\varvec{\delta }^{(j-1)}\) falls below a specified tolerance tol, i.e.,
where \(var\left( diag\left( \varvec{\delta }^{(j)}\right) \right) \) denotes the variance of the diagonal elements of \(\varvec{\delta }^{(j)}\).
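The overall procedure can be sketched numerically. The sketch below assumes \(\varvec{B}^{(j)}=\left( \varvec{\delta }^{(j-1)}\right) ^{-1/2}\varvec{S}\left( \varvec{\delta }^{(j-1)}\right) ^{-1/2}\), the classical eigen-iteration for the maximum likelihood factor model; the exact forms of (74)–(76) may differ, and the convergence test follows (77) on successive \(\varvec{\delta }\) iterates:

```python
import numpy as np

def initial_values(S, q):
    """Initial beta and delta from the top-q eigenpairs of S (eqs. (72)-(73))."""
    w, V = np.linalg.eigh(S)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:q]            # indices of the q largest eigenvalues
    beta = V[:, idx] * np.sqrt(w[idx])       # columns sqrt(omega_i) * upsilon_i
    delta = np.diag(np.diag(S - beta @ beta.T))
    return beta, delta

def fit(S, q, tol=1e-10, max_iter=2000):
    """Iterate the eigen-update until successive delta iterates stop changing."""
    beta, delta = initial_values(S, q)
    for _ in range(max_iter):
        d = np.maximum(np.diag(delta), 1e-12)    # guard against zero variances
        s = 1.0 / np.sqrt(d)
        B = s[:, None] * S * s[None, :]          # delta^{-1/2} S delta^{-1/2}
        lam, A = np.linalg.eigh(B)
        idx = np.argsort(lam)[::-1][:q]
        # beta^{(j)} = delta^{1/2} alpha (lambda - I)^{1/2} on the top-q eigenpairs
        beta = (np.sqrt(d)[:, None] * A[:, idx]) * np.sqrt(
            np.maximum(lam[idx] - 1.0, 0.0))
        delta_new = np.diag(np.diag(S - beta @ beta.T))
        # convergence test on successive delta iterates, as in (77)
        if np.linalg.norm(np.diag(delta_new - delta)) < tol:
            delta = delta_new
            break
        delta = delta_new
    return beta, delta
```

On data generated exactly from the low-rank-plus-diagonal model, the iteration refines the crude eigen-based initialization toward a covariance fit satisfying the estimating equations.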
Lo, E. Hyperspectral anomaly detection based on constrained eigenvalue–eigenvector model. Pattern Anal Applic 20, 531–555 (2017). https://doi.org/10.1007/s10044-015-0519-6