Journal of Signal Processing Systems, Volume 68, Issue 3, pp 379–390

A Novel Approach for Target Detection and Classification Using Canonical Correlation Analysis

Authors

  • Wei Wang, Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County
  • Tülay Adalı, Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County
  • Darren Emge, US Army, Edgewood Chemical and Biological Center, Aberdeen Proving Grounds
Article

DOI: 10.1007/s11265-011-0625-7

Cite this article as:
Wang, W., Adalı, T. & Emge, D. J Sign Process Syst (2012) 68: 379. doi:10.1007/s11265-011-0625-7

Abstract

We present a novel detection approach, detection with canonical correlation (DCC), for target detection without prior information on the interference. We use the maximum canonical correlations between the target set and the observation data set as the detection statistic, and the coefficients of the canonical vector are used to determine the indices of components from a given target library, thus enabling both detection and classification of the target components that might be present in the mixture. We derive an approximate distribution of the maximum canonical correlation when targets are present. For applications where the contributions of components are non-negative, non-negativity constraints are incorporated into the canonical correlation analysis framework and a recursive algorithm is derived to obtain the solution. We demonstrate the effectiveness of DCC and its nonnegative variant by applying them on detection of surface-deposited chemical agents in Raman spectroscopy.

Keywords

Detection, Classification, Canonical correlation analysis

1 Introduction

Target detection and classification is a fundamental problem in many applications, such as hyperspectral imaging, computer-aided diagnosis, geophysics, and Raman spectroscopy (see, e.g., [1–3]). The aim of target detection is to determine whether certain components are present based on a given set of observations, and to identify the target(s) that are present in the observations from a given target library. In conventional detection theory, detection is performed on a sample-by-sample basis, which can be written as [1, 4]:
$$ \begin{array}{rll} \mathcal{H}_0 : \mathbf{x}&=&\mathbf{G}\mathbf{a}_g+\mathbf{v},\\ \mathcal{H}_1 : \mathbf{x}&=&\mathbf{T}\mathbf{a}_t+\mathbf{G}\mathbf{a}_g +\mathbf{v}, \end{array} $$
(1)
where the N × 1 vector x is a single observation, v is a noise vector, and \(\mathbf{a}_t\) and \(\mathbf{a}_g\) are mixing coefficient vectors. Here \(\mathbf{G}\equiv[\mathbf{g}_1, \mathbf{g}_2, \cdots, \mathbf{g}_{M}]\in \mathcal{R}^{N\times M}\) is a matrix of interference, or background, components, and \(\mathbf{T}\equiv[\mathbf{t}_1, \mathbf{t}_2, \cdots, \mathbf{t}_{L}]\in \mathcal{R}^{N\times L}\) is a matrix of components in the target library. In most cases, \(N \gg M\) and \(N \gg L\).
A number of methods have been proposed for the detection problem given in Eq. 1; see, e.g., [2, 4–7, 10, 19]. When the interference G is known, matched subspace detectors (MSD) [4, 6] give the generalized likelihood ratio solution as [1]:
$$ L(\mathbf{x})=\frac{\mathbf{x}^TP^{\bot}_{\mathbf{G}}E_{TG}P^{\bot}_{\mathbf{G}}\mathbf{x}}{\mathbf{x}^TP^{\bot}_{\mathbf{G}}\left(\mathbf{I}-E_{TG}\right)P^{\bot}_{\mathbf{G}}\mathbf{x}}, $$
(2)
where \(P^{\bot}_{\mathbf{G}}=\mathbf{I}-P_{\mathbf{G}}\) and \(P_{\mathbf{G}}=\mathbf{G}\left( \mathbf{G}^T\mathbf{G}\right)^{-1}\mathbf{G}^T\) is the projection matrix onto the subspace \(\langle \mathbf{G}\rangle\), and \(E_{TG}=\mathbf{T}\left( \mathbf{T}^T P^{\bot}_{\mathbf{G}}\mathbf{T}\right)^{-1}\mathbf{T}^TP^{\bot}_{\mathbf{G}}\) is an oblique projection with range space \(\langle \mathbf{T}\rangle\) and null space \(\langle \mathbf{G}\rangle\). In this paper, we use chevrons, e.g., \(\langle \mathbf{T} \rangle\), to denote the subspace spanned by the components in T.
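For concreteness, the MSD statistic of Eq. 2 can be computed as in the following sketch; the function and variable names are ours, and the code assumes that G and T have full column rank.

```python
import numpy as np

def msd_statistic(x, T, G):
    """Matched subspace detector statistic L(x) of Eq. 2 (sketch).

    x : (N,) observation, T : (N, L) target library, G : (N, M) interference.
    """
    N = x.shape[0]
    P_G = G @ np.linalg.solve(G.T @ G, G.T)          # projector onto <G>
    P_G_perp = np.eye(N) - P_G                       # projector onto the complement of <G>
    # oblique projector E_TG with range <T> and null space <G>
    E_TG = T @ np.linalg.solve(T.T @ P_G_perp @ T, T.T @ P_G_perp)
    z = P_G_perp @ x
    num = z @ (E_TG @ z)
    return num / (z @ z - num)
```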
In many applications, however, it is difficult to obtain a reliable prior estimate of the interference components. For example, this is the case in target detection in Raman spectroscopy [2, 3], where the background is either unknown or its estimate might not be reliable. Hence, data-driven methods are more desirable in general since they do not rely on prior information on the interference components G [7–10, 19]. In [3], least squares (LS) and non-negative least squares (NNLS) methods have been used for target detection in Raman spectroscopy:
$$ \rho_{\rm LS}=\|\mathbf{x}-\mathbf{Ta}\|^2,\qquad{\rm and}\qquad\mathbf{a}=\left( \mathbf{T}^T\mathbf{T}\right)^{-1}\mathbf{T}^T\mathbf{x}, $$
(3)
where the interference is simply ignored, thus typically resulting in large residual error when used in detection.
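As a reference for the comparisons in Section 6, the LS and NNLS residuals of Eq. 3 can be sketched as follows; the function names are ours, and the non-negative solution uses SciPy's non-negative least squares solver.

```python
import numpy as np
from scipy.optimize import nnls

def ls_residual(x, T):
    """LS residual rho_LS = ||x - T a||^2 with a = (T^T T)^{-1} T^T x (Eq. 3)."""
    a, *_ = np.linalg.lstsq(T, x, rcond=None)
    r = x - T @ a
    return float(r @ r)

def nnls_residual(x, T):
    """NNLS variant used in [3]: same residual with the constraint a >= 0."""
    a, _ = nnls(T, x)
    r = x - T @ a
    return float(r @ r)
```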

Blind source separation techniques, such as principal component analysis (PCA), non-negative matrix factorization (NMF), and independent component analysis (ICA), have also been investigated as data-driven detection approaches and have shown satisfactory performance in a number of cases (see, e.g., [8, 9]). Using ICA, the decision can be made by checking the correlations between the extracted components and the target. The limitation of ICA is that the mixing components must be independent of each other; otherwise, a component is split into multiple estimated components, which decreases its correlation with the target when the target is present in the mixture.

In this paper,1 we propose a data-driven detection method for target detection and classification that does not rely on interference information. Instead of using a single observation, we use multiple observations in the detection. The hypothesis testing problem can be represented as:
$$ \begin{array}{rll} \mathcal{H}_0 : \mathbf{X}&=&\mathbf{G}\mathbf{A}_g+\mathbf{V},\\ \mathcal{H}_1 : \mathbf{X}&=&\mathbf{T}\mathbf{A}_t+\mathbf{G}\mathbf{A}_g+\mathbf{V}, \end{array} $$
(4)
where X ≡ [x1, x2, ⋯ , xB] is an N × B observation matrix whose columns xi are single observations, \(\mathbf{A}_g\in \mathcal{R}^{M\times B}\) is the matrix of mixing coefficients corresponding to the background components, \(\mathbf{A}_t\in \mathcal{R}^{L\times B}\) is the matrix of mixing coefficients for T, and V is an N × B noise matrix, with N > M, L, and B. Here, \(\mathcal{H}_0\) denotes the absence of the target, and \(\mathcal{H}_1\) its presence.

We use the canonical correlations [12] between the whole target library T and the mixtures X as the detection index, and use the coefficients of the canonical vectors to determine which components are present in the mixtures. Hence both the detection and the classification problems can be solved with this approach. We show that, by using additional observations in canonical correlation analysis (CCA), the negative influence of unknown interference components can be mitigated in detection, thus enhancing the overall detection performance. An approximate distribution of the maximum canonical correlation in DCC is derived for the case where targets are present and the noise follows a Gaussian distribution. The distribution is in terms of beta functions with parameters determined by the target library and the noise variance, hence it can provide guidance for choosing the decision threshold in practice.

We also develop a non-negative CCA algorithm to take into account the fact that, in some applications such as Raman spectroscopy and image processing, contributions of mixing components can only be non-negative. When this is the case, it is natural to expect a better detection performance with non-negative CCA (DNCC) by constraining all elements in the canonical vector of the target set to be non-negative.

The proposed detection methods are applied to Raman spectroscopy. Raman spectroscopy has been shown to be a powerful technique for non-contact and nondestructive detection and identification. It uses a laser to probe the vibrational energy levels of a molecule or crystal, thus providing information on the molecular structure and chemical composition of materials. In this paper, we consider the application of Raman spectroscopy to non-contact detection of surface-deposited chemical agents, which is particularly useful for detecting environmentally hazardous chemicals [3]. Simulation results demonstrate the effectiveness of the DCC and DNCC methods and their relationship to related methods.

2 Canonical Correlation Analysis

Before we introduce the DCC and DNCC detection methods, we first give a brief review of canonical correlation analysis [12]. Given two sets of vectors, \(\mathbf{X}=[\mathbf{x}_1, \cdots, \mathbf{x}_M], \mathbf{X}\!\in\!{\mathbb{R}}^{N\times M}\), and \(\mathbf{Y}\!=\![\mathbf{y}_1, \cdots, \mathbf{y}_L], \mathbf{Y}\in{\mathbb{R}}^{N\times L}\), and their linear combinations
$$ \begin{array}{rll} \mathbf{X}\mathbf{a}&=&[\mathbf{x}_1, \cdots, \mathbf{x}_M] \left[ \begin{array}{l} a_1 \\ \vdots \\ a_M \end{array} \right]~{\rm and}~\\ \mathbf{Y}\mathbf{b}&=&[\mathbf{y}_1, \cdots, \mathbf{y}_L] \left[ \begin{array}{l} b_1 \\ \vdots \\ b_L \end{array} \right], \end{array} $$
canonical correlation analysis seeks a pair of vectors, a* and b*, that maximize the correlation ρ = corr(Xa, Yb), such that
$$ \rho=\max_{\mathbf{a},\mathbf{b}} {\rm corr}(\mathbf{X}\mathbf{a},\mathbf{Y}\mathbf{b}). $$
(5)
The solution of Eq. 5 can be obtained by solving either of the following two eigenvalue problems:
$$ \begin{array}{rll} \mathbf{C}_{xx}^{-1}\mathbf{C}_{xy}\mathbf{C}_{yy}^{-1}\mathbf{C}_{yx}\mathbf{a}^*=\rho^2\mathbf{a}^*, \\ \mathbf{b}^*=\mathbf{C}_{yy}^{-1}\mathbf{C}_{yx}\mathbf{a}^*, \end{array} $$
(6)
or
$$ \begin{array}{rll} \mathbf{C}_{yy}^{-1}\mathbf{C}_{yx}\mathbf{C}_{xx}^{-1}\mathbf{C}_{xy}\mathbf{b}^*=\rho^2\mathbf{b}^*, \\ \mathbf{a}^*=\mathbf{C}_{xx}^{-1}\mathbf{C}_{xy}\mathbf{b}^*, \end{array} $$
(7)
where Cxy is the sample correlation matrix between X and Y, whose ij-th element is the sample correlation between \(\mathbf{x}_i=[x_{1i}, x_{2i}, \cdots, x_{Ni}]^T\) and \(\mathbf{y}_j=[y_{1j}, y_{2j}, \cdots, y_{Nj}]^T\), i.e.,
$$ {\rm corr}(\mathbf{x}_i,\mathbf{y}_j)=\frac{\sum_{k}(x_{ki}-\bar{x}_i)(y_{kj}-\bar{y}_j)}{\sqrt{\sum_{k}(x_{ki}-\bar{x}_i)^2\sum_k(y_{kj}-\bar{y}_j)^2}} $$
where \(\bar{x}_i=\sum_k x_{ki}/N\) and \(\bar{y}_j=\sum_k y_{kj}/N\). Analogously, Cyx is the sample correlation matrix between Y and X, and Cxx and Cyy are the sample correlation matrices of X and Y, respectively.

The square roots of the eigenvalues obtained from Eq. 6 or Eq. 7 are called canonical correlations, and the vectors a* and b* canonical vectors.
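The eigenvalue formulation in Eqs. 6 and 7 can be implemented directly, as in the sketch below; the function name cca_max is ours, and the code assumes that Cxx and Cyy are invertible. Covariances rather than correlations are used internally, which leaves the canonical correlations unchanged since they are invariant to per-column scaling.

```python
import numpy as np

def cca_max(X, Y):
    """Maximum canonical correlation and canonical vectors via Eq. 6 (sketch).

    X : (N, M) and Y : (N, L); columns are the vectors being linearly combined.
    """
    Xc = X - X.mean(axis=0)                      # center each column
    Yc = Y - Y.mean(axis=0)
    Cxx, Cyy = Xc.T @ Xc, Yc.T @ Yc
    Cxy = Xc.T @ Yc
    # C_xx^{-1} C_xy C_yy^{-1} C_yx a* = rho^2 a*
    M_a = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    evals, evecs = np.linalg.eig(M_a)
    top = np.argmax(evals.real)
    rho = np.sqrt(np.clip(evals.real[top], 0.0, 1.0))   # max canonical correlation
    a = evecs[:, top].real
    b = np.linalg.solve(Cyy, Cxy.T @ a)          # companion canonical vector b*
    return rho, a, b
```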

3 Detection Using Canonical Correlation

In the target detection problem, we need to examine the relationship between the observation data set and the target library. Since canonical correlation analysis provides information on the closeness of two sets of vectors, we investigate its use for the target detection problem.

To help explain the idea of DCC, we remove the noise components in Eq. 4. The maximum canonical correlation between X and T is given by
$$ \rho=\max\limits_{\mathbf{a},\mathbf{b}} {\rm corr}(\mathbf{X}\mathbf{a},\mathbf{T}\mathbf{b}) $$
We can see that
  • Under \(\mathcal{H}_0\):

    Since X = GAg, when the subspaces spanned by G and T are orthogonal,
    $$\rho=0.$$
    Note that the orthogonality condition is just a simplification to emphasize the general idea, and not a requirement of the DCC method.
  • Under \(\mathcal{H}_1\):

    X = TAt + GAg, then
    $$\rho=1.$$
    For example, assume tj is the present target component, i.e., \(\mathbf{X}=\mathbf{t}_j\mathbf{a}_t^j+\mathbf{G}\mathbf{A}_g=\mathbf{SA}\), where S ≡ [tj G] and \(\mathbf{A}\equiv\left[ \begin{array}{l} \mathbf{a}_t^j \\ \mathbf{A}_g \end{array} \right].\) An example solution is \(\mathbf{a}^* = \mathbf{A}^{-1}[1, 0, \cdots, 0]^T\) and \(\mathbf{b}^*=[0,\cdots,1_{(j)},\cdots,0]^T\), given that A is a non-singular square matrix. This is because
    $$ \mathbf{X}\mathbf{a}^*=[\mathbf{t}_j \mathbf{G}]\mathbf{Aa}^*=[\mathbf{t}_j, \mathbf{g}_1, \cdots, \mathbf{g}_M] \left[ \begin{array}{l} 1 \\ 0\\ \vdots \\ 0 \end{array} \right]=\mathbf{t}_j,~{\rm and} $$
    $$ \mathbf{T}\mathbf{b}^*=[\mathbf{t}_1, \cdots,\mathbf{t}_j, \cdots, \mathbf{t}_L] \left[ \begin{array}{l} 0 \\ \vdots \\ 1_{(j)} \\ \vdots \\ 0 \end{array} \right]=\mathbf{t}_j, $$
    thus
    $$\rho={\rm corr}(\mathbf{t}_j,\mathbf{t}_j)=1.$$
Note that the non-zero element in b* indicates the index of the component that is present in the mixture X. b* might not be unique when the components in T are not linearly independent. However, this can be mitigated by improving the conditioning of T through approaches such as subspace partitioning [21].
Hence, we propose to use the maximum canonical correlation,
$$ \rho=\max\limits_{\mathbf{a},\mathbf{b}} {\rm corr}(\mathbf{X}\mathbf{a},\mathbf{T}\mathbf{b}) $$
(8)
for the target detection and classification problem. The algorithm proceeds in three steps:
  1. Calculate the maximum canonical correlation ρ and the canonical vector b* by solving Eq. 6 or Eq. 7.
  2. Use ρ as the detection index to determine whether any component in the target set is present.
  3. Use the canonical vector b* to determine the indices of the components that are present.
Note that, in the ideal case, the number of observations should be chosen to be equal to the number of mixing components. In practice, if the number of target and interference components present is unknown, the dimensionality of the mixture subspace can be estimated using methods such as PCA.
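A minimal sketch of the three-step DCC procedure is given below; it reuses the cca_max() routine sketched in Section 2, and the decision threshold is an assumed input that can be chosen from the distribution derived in Section 5 or from training data.

```python
import numpy as np

def dcc_detect(X, T, threshold):
    """DCC sketch: detection and classification from the maximum canonical
    correlation between the observation block X (N x B) and the library T (N x L)."""
    rho, _, b = cca_max(X, T)                 # step 1: Eq. 8 via Eq. 6/7
    target_present = rho > threshold          # step 2: detection decision
    # step 3: large-magnitude entries of b* indicate which library components
    # are present in the mixture
    idx = np.argsort(-np.abs(b))
    return target_present, rho, idx
```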

If an interference component, i.e., a column of G, is linearly dependent on the columns of T, then its presence may lead to a false alarm for the dependent target components. Therefore, DCC requires that the columns of G be linearly independent of those of T, which is a reasonable assumption in most applications.

Geometric Interpretation   We illustrate in Fig. 1 how additional observations mitigate interference in DCC. From a geometric point of view, decisions in subspace-based detectors are based on the distance between the observations and the target subspace. A good detector yields a large distance, i.e., discrimination power, when targets are absent and a small one when targets are present. Here, we assume a noiseless environment, so observations consist only of components from \(\langle \mathbf{T} \rangle\) and \(\langle \mathbf{G} \rangle\). Let \(\mathbf{x}_0\in \langle \mathbf{G} \rangle\) be the observation when targets are absent. Then \({\mbox{\boldmath $\delta$}}_0=\mathbf{x}_0-\hat{\mathbf{x}}_0\), where \(\hat{\mathbf{x}}_0\) is the orthogonal projection of x0 onto \(\langle \mathbf{T} \rangle\), is a large distance, reaching its maximum when \(\langle \mathbf{G} \rangle\) is orthogonal to \(\langle \mathbf{T} \rangle\). Note that Fig. 1 shows the case where there is only one component in G.
Figure 1

Geometric interpretation of DCC.

When targets are present, in the ideal case, a good detector should yield zero distance since the noise effects are neglected. Let \(\mathbf{x}_1\in \langle \mathbf{T},\mathbf{G} \rangle\) be an observation when targets are present, as shown in Fig. 1. The minimum distance between x1 and \(\langle \mathbf{T} \rangle\) is \({\mbox{\boldmath $\delta$}}_1=\mathbf{x}_1-\hat{\mathbf{x}}_1\), where \(\hat{\mathbf{x}}_1\) is the orthogonal projection of x1 onto \(\langle \mathbf{T} \rangle\). The contribution of G in x1 cannot be avoided, as in Eq. 3, since \(\langle \mathbf{G} \rangle\) is unknown.

In DCC, with the addition of another observation \(\mathbf{x}_2\in \langle \mathbf{T},\mathbf{G} \rangle\), zero distance under \(\mathcal{H}_1\) can be obtained even without knowledge of \(\langle \mathbf{G} \rangle\). The canonical correlation is the cosine of the minimum angle between the two subspaces. When only x1 is used, α1 > 0 because of the existence of components from \(\langle \mathbf{G} \rangle\); when both x1 and x2 are used, however, the minimum angle between \(\langle \mathbf{x}_1,\mathbf{x}_2 \rangle\) and \(\langle \mathbf{T} \rangle\) is equal to zero with the solution \(\hat{\mathbf{x}}=\hat{\mathbf{t}}\in \langle \mathbf{x}_1,\mathbf{x}_2 \rangle \bigcap \langle \mathbf{T} \rangle,\) where \(\hat{\mathbf{x}}\) is a linear combination of x1 and x2, and \(\hat{\mathbf{t}}\) is a linear combination of \(\mathbf{t}_i\), i = 1, ⋯ , L. Thus, the DCC detector can yield a zero distance without using information on \(\langle \mathbf{G} \rangle\), i.e., x2 provides an additional coordinate basis for the projection to make up for the missing information on G when x1 and x2 are combined to seek the maximum correlation with the target subspace. Hence, the maximum canonical correlation between X and T is a quantity that reflects the existence of components of T, and is invariant to the interference components of G in X. In general, without using knowledge of G, the DCC detector yields a large distance when targets are absent and zero distance when targets are present. Note that zero distance means a correlation value of one between the two vectors.

There are min(B, L) canonical correlations obtained by solving the eigenvalue problem in Eq. 6 or Eq. 7 when X and T are full rank. The solution appears unique in Fig. 1 only because the figure is limited to a three-dimensional scenario; with multiple components the geometry is a higher-dimensional space with more intersections. When there is more than one target component in X, other canonical correlations besides the maximum one can be expected to take high values, and the number of high canonical correlations can be used as an indicator of the number of target components present.

We can see that the DCB detector given in [7]
$$ \rho=\sqrt{\mathbf{t}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{t}} $$
(9)
is a special case of DCC when there is only one component in the target set T, i.e., T = t. This is the case since the maximum correlation between t and \(\langle \mathbf{X} \rangle\) is equal to the correlation between t and its orthogonal projection onto \(\langle \mathbf{X} \rangle\), which is exactly the \(\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{t}\) term in Eq. 9.
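The single-target statistic of Eq. 9 reduces to a short computation, as in the sketch below; the function name is ours, and t is assumed to be centered and normalized so that the statistic is indeed a correlation.

```python
import numpy as np

def dcb_statistic(X, t):
    """Eq. 9: correlation between a single target t and its orthogonal
    projection onto <X>; coincides with DCC when the library contains only t."""
    P_X = X @ np.linalg.solve(X.T @ X, X.T)   # orthogonal projector onto <X>
    return float(np.sqrt(t @ (P_X @ t)))
```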

4 DCC with Non-negativity Constraints

In certain applications, such as Raman spectroscopy and image processing, the contributions of mixing components can only be non-negative. When this is the case, it is natural to expect a better detection performance if we constrain all elements in the canonical vector of the target set to be non-negative.

The non-negative CCA (NCCA) can be expressed as an optimization problem as follows
$$ \max\limits_{\mathbf{a},\mathbf{b}} \ \ \ \ {\rm corr}(\mathbf{X}\mathbf{a},\mathbf{Y}\mathbf{b}), $$
(10)
subject to
$$ \mathbf{b} \geq \mathbf{0}. $$

The above problem can be solved numerically using a general constrained optimization algorithm, such as the augmented Lagrange method. Since inequality constraints are usually computationally expensive to incorporate in numerical computation, a more efficient procedure is desirable for the given problem. We develop a non-negative canonical correlation algorithm based on the canonical correlation analysis on two convex polyhedral cones [17].

A convex polyhedral cone C(S) is generated by a finite number of vectors: there exist vectors s1, s2, ⋯ , sk in Rn such that any element x of C(S) can be written as
$$ \mathbf{x}=\sum\limits_{i=1}^k{\phi_i\mathbf{s}_i}, $$
where ϕ1, ⋯ , ϕk are non-negative.

An algorithm of canonical correlation analysis on two convex polyhedral cones is given in [17], where a pair of vectors \(\hat{\mathbf{x}}\) and \(\hat{\mathbf{y}}\) are computed in the cones C(X) and C(Y), respectively, such that the correlation between \(\hat{\mathbf{x}}\) and \(\hat{\mathbf{y}}\) is maximized. In the problem stated in Eq. 10, however, the non-negativity constraint is imposed only on the subspace spanned by Y. In other words, using NCCA for detection, we seek the solution of \(\hat{\mathbf{y}}\) in the cone C(Y) and \(\hat{\mathbf{x}}\) in an unlimited subspace spanned by X, denoted by D(X). Hence the NCCA algorithm is developed for a single cone for the problem given in Eq. 10.

Before introducing the NCCA algorithm, we present the following proposition [17]:

Proposition

Let C be a convex polyhedral cone, x a vector in Rn, y the projection of x onto C, and z any vector of C. Then the maximum correlation corr(x, z) is reached at z ∝ y, i.e., z = αy for any scalar α > 0.

Therefore, using this proposition, we can conclude that when the maximum of corr(\(\hat{\mathbf{x}}\), \(\hat{\mathbf{y}}\)) is reached, \(\hat{\mathbf{x}}\) and \(\hat{\mathbf{y}}\) satisfy the conditions \(\hat{\mathbf{x}} \propto P_{\mathbf{D}}(\hat{\mathbf{y}})\) and \(\hat{\mathbf{y}} \propto P_{\mathbf{C}}(\hat{\mathbf{x}})\), as shown in Fig. 2. Here, PC(·) is the projection operator onto the cone C(Y) and \(P_{\mathbf{D}}(\cdot)=\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\) is the orthogonal projection operator onto the subspace D(X).
Figure 2

Canonical correlation analysis on cone C(Y) and subspace D(X).

We implement PC using the non-negative least squares algorithm in [18, 19].

The NCCA algorithm is described as follows:

Non-negative canonical correlation analysis algorithm  
  1. Initialize \(\hat{\mathbf{y}}_0\) with \(\hat{\mathbf{y}}_0=\frac{1}{L}\sum_{i=1}^L{\mathbf{y}_i}\).
  2. Compute \(\hat{\mathbf{x}}_i=P_{\mathbf{D}}(\hat{\mathbf{y}}_{i-1})\) and normalize \(\hat{\mathbf{x}}_i\).
  3. Compute \(\hat{\mathbf{y}}_i=P_{\mathbf{C}}(\hat{\mathbf{x}}_{i})\) and normalize \(\hat{\mathbf{y}}_i\).
  4. Stop if both \(\|\hat{\mathbf{x}}_i-\hat{\mathbf{x}}_{i-1}\|\) and \(\|\hat{\mathbf{y}}_i-\hat{\mathbf{y}}_{i-1}\|\) are less than a given tolerance; otherwise, go to step 2.
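A minimal sketch of this iteration is given below, assuming the columns of X and Y have been mean-centered; the cone projection P_C is implemented with SciPy's non-negative least squares solver, as discussed above, and the correlation is taken as the cosine of the angle between the normalized vectors. The function name and parameter names are ours.

```python
import numpy as np
from scipy.optimize import nnls

def ncca(X, Y, tol=1e-6, max_iter=500):
    """Non-negative CCA sketch: alternate between the orthogonal projection
    onto D(X) and the cone projection P_C onto C(Y) (steps 1-4 above)."""
    y_hat = Y.mean(axis=1)                               # step 1
    y_hat /= np.linalg.norm(y_hat)
    P_D = X @ np.linalg.solve(X.T @ X, X.T)              # projector onto D(X)
    x_hat = np.zeros_like(y_hat)
    b = np.zeros(Y.shape[1])
    for _ in range(max_iter):
        x_new = P_D @ y_hat                              # step 2
        x_new /= np.linalg.norm(x_new)
        b, _ = nnls(Y, x_new)                            # step 3: P_C via NNLS
        y_new = Y @ b
        y_new /= np.linalg.norm(y_new)
        done = (np.linalg.norm(x_new - x_hat) < tol and
                np.linalg.norm(y_new - y_hat) < tol)     # step 4
        x_hat, y_hat = x_new, y_new
        if done:
            break
    rho = float(x_hat @ y_hat)                           # maximum correlation
    return rho, x_hat, y_hat, b
```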

     

5 Distribution of the Maximum Canonical Correlation in DCC

In this section, we study the distribution of the maximum canonical correlation in Eq. 8 based on the linear mixture model. Even though the distribution of sample canonical correlations between two normally distributed random vectors has been well studied [13–15], in DCC, the maximum canonical correlation is between a target data set T and an observation data set X = TAt + GAg + V = SA + V, where S = [Ts Gs] and \(\mathbf{A}\equiv\left[ \begin{array}{l} \mathbf{A}_t^s \\ \mathbf{A}_g^s \end{array} \right].\) Here Ts is the subset of T that is present in the mixture, Gs is the corresponding subset of G, and \(\mathbf{A}_t^s\) and \(\mathbf{A}_g^s\) are their mixing coefficients. We assume that all the elements in the noise matrix V follow a Gaussian distribution \(\mathcal{N}\left(0,\sigma^2\right)\) and are independent and identically distributed (i.i.d.). Then X has the distribution \(\mathcal{N}\left(\mathbf{SA},\sigma^2\mathbf{1}_{N\times B}\right)\), where 1N × B is an N × B matrix of all ones. The elements of X are not i.i.d., and T is a deterministic matrix instead of being a sample of a random vector. Hence, the distribution results for two random vectors in [13–15] cannot be directly applied, and we need to derive the distribution of the maximum canonical correlation given in Eq. 8 for DCC. Note that, in DCC, the decisions are based on the proximity of the detection index ρ to one; therefore, we are interested in the distribution of ρ under \(\mathcal{H}_1\).

First, we need to transform the problem into a simpler form. The canonical correlation is invariant to linear transformations, hence,
$$ \begin{array}{rll} \rho&=&\max\limits_{\mathbf{a},\mathbf{b}} {\rm corr}(\mathbf{X}\mathbf{a},\mathbf{T}\mathbf{b})\triangleq \psi\left(\mathbf{X},\mathbf{T}\right) \\ &=&\psi\left(\mathbf{S}+\mathbf{V}\mathbf{A}^{-1},\mathbf{T}\right) \end{array} $$
(11)
Here, we assume that the mixing matrix A is an orthonormal matrix (if it is not, the observations can be whitened so that A is equivalent to an orthonormal matrix); hence, \(\mathbf{V}'\triangleq \mathbf{V}\mathbf{A}^{-1}\) has the same distribution as V. When the target is present, say s1 = t1, then S = [t1, s2, ⋯ , sB], where s2, ⋯ , sB can be either target or interference components. As discussed in Section 3, the maximum canonical correlation ρ will assume a value close to one, and there exists a solution in which the contribution of t1 is dominant compared to the other mixing components. Hence, we can approximate ρ by the multiple correlation between y = t1 + v1′ and T, i.e.,
$$ \rho=\psi\left(\mathbf{S}+\mathbf{V}',\mathbf{T}\right)\approx \psi\left(\mathbf{t}_1+\mathbf{v}'_1,\mathbf{T}\right)\triangleq R. $$
(12)
The square of multiple correlation can be calculated as [13]:
$$ R^2=\frac{\mathbf{y}^T\mathbf{T}\left(\mathbf{T}^T\mathbf{T}\right)^{-1}\mathbf{T}^T\mathbf{y}}{\mathbf{y}^T\mathbf{y}}. $$
(13)
The distribution of R2 given that \(\mathbf{y}\sim\mathcal{N}\left(\mathbf{t}_1,\sigma^2\mathbf{I}\right)\) is a non-central F-distribution, which is derived in the Appendix as
$$ p\left(R^2\right)=e^{-\frac{\lambda}{2}}\left(1-R^2\right)^{\frac{N-L}{2}-1}\sum\limits_{k=0}^{\infty} \frac{\left(\frac{\lambda}{2}\right)^k\left(R^2\right)^{\frac{L}{2}+k-1}} {\beta\left(\frac{N-L}{2},\frac{L}{2}+k\right)k!}, $$
(14)
where \(\beta(x,y)=\int_0^1{t^{x-1}(1-t)^{y-1}dt}\) is the beta function, and λ is a scalar determined by T and σ2, as defined in Eq. 20. Note that T is a deterministic matrix and y is a random vector centered at t1; thus, the existing work on the distribution of the multiple correlation cannot be used in this case [16].
The expectation of R2 is obtained as:
$$ E\left\{R^2\right\}=e^{-\frac{\lambda}{2}}\sum\limits_{k=0}^{\infty}\frac{\left(\frac{\lambda}{2}\right)^k}{k!}\cdot\frac{L+2k}{N+2k}. $$
(15)
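The density in Eq. 14 and the expectation in Eq. 15 can be evaluated numerically by truncating the Poisson-weighted series, as in the sketch below; the parameter lam is the non-centrality scalar λ of Eq. 20, assumed to be positive, and the function names are ours.

```python
import numpy as np
from scipy.special import betaln, gammaln

def pdf_R2(r2, N, L, lam, k_max=200):
    """Density of R^2 in Eq. 14, with the series truncated at k_max terms.
    Valid for 0 < r2 < 1 and lam > 0."""
    r2 = np.asarray(r2, dtype=float)
    k = np.arange(k_max)[:, None]
    log_terms = (k * np.log(lam / 2.0) - gammaln(k + 1)
                 - betaln((N - L) / 2.0, L / 2.0 + k)
                 + (L / 2.0 + k - 1.0) * np.log(r2)
                 + ((N - L) / 2.0 - 1.0) * np.log(1.0 - r2))
    return np.exp(-lam / 2.0) * np.exp(log_terms).sum(axis=0)

def mean_R2(N, L, lam, k_max=200):
    """Expectation of R^2 in Eq. 15, via the same truncated series."""
    k = np.arange(k_max)
    w = np.exp(-lam / 2.0 + k * np.log(lam / 2.0) - gammaln(k + 1))
    return float((w * (L + 2 * k) / (N + 2 * k)).sum())
```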
We test the density function given in Eq. 14 with simulations and show the results in Fig. 3. We first generate a random orthogonal matrix T with dimension 200 × 20, and insert a column of T, t1, into a randomly generated 200 × 5 Gaussian matrix S such that S = [t1, s2, ⋯ , s5]. Then S is multiplied by a 5 × 5 orthonormal mixing matrix A, and noise is added to X = SA for different signal-to-noise ratios (SNR) with respect to the generated data xi, i = 1, ⋯ , 5. The SNR is defined as:
$${\rm SNR}=10\log_{10}\left(\frac{\|\bar{\mathbf{x}}_i\|^2}{\sigma^2}\right),$$
where \(\|\bar{\mathbf{x}}_i\|^2=\frac{1}{N}\sum_{k=1}^N x_{ik}^2\) is the average energy of the observation data xi, and σ2 is the variance of the noise. In Fig. 3, the curve of the non-central F-distribution is drawn using Eq. 14, ρ2 is the square of the maximum canonical correlation calculated with Eq. 11, i.e., the detection index used in DCC, and R2 is an approximation of ρ2 calculated with Eq. 13. The histograms of R2 and ρ2 are obtained from 10,000 simulation runs with different mixing matrices A and noise matrices V for each run. We can see that the derived non-central F-distribution gives a good approximation for R2 and ρ2. The difference between R2 and ρ2 is slight, which justifies the approximation in Eq. 12.
Figure 3

The histograms of canonical correlation ρ2 and its approximation R2 are calculated with Eqs. 11 and 13 respectively. The analytical expression of the distribution of R2 is a non-central F-distribution, drawn using Eq. 14.
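The Monte Carlo comparison of Fig. 3 can be reproduced in outline with the sketch below, which draws one realization of ρ² (Eq. 11) and its approximation R² (Eq. 13); it reuses cca_max() from Section 2, and the noise variance is set from the average energy of the mixed data, a slight simplification of the per-observation SNR definition above.

```python
import numpy as np

def simulate_rho2_R2(T, snr_db, n_mix=5, seed=0):
    """One realization of rho^2 and R^2 for the setup of Fig. 3 (sketch)."""
    rng = np.random.default_rng(seed)
    N, L = T.shape
    S = rng.standard_normal((N, n_mix))
    S[:, 0] = T[:, 0]                                         # insert target t_1
    A, _ = np.linalg.qr(rng.standard_normal((n_mix, n_mix)))  # orthonormal mixing
    X = S @ A
    sigma2 = np.mean(X ** 2) / 10 ** (snr_db / 10.0)          # noise variance from SNR
    Xn = X + np.sqrt(sigma2) * rng.standard_normal(X.shape)
    rho, _, _ = cca_max(Xn, T)                                # Eq. 11
    y = T[:, 0] + np.sqrt(sigma2) * rng.standard_normal(N)
    R2 = float(y @ T @ np.linalg.solve(T.T @ T, T.T @ y) / (y @ y))   # Eq. 13
    return rho ** 2, R2
```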

6 Simulation Results in Raman Spectroscopy

The proposed DCC and DNCC detectors are tested in application to Raman spectroscopy for the detection of surface-deposited chemical agents. As shown in Fig. 4, a Raman spectrum gives a set of peaks that correspond to the characteristic vibrational frequencies of the material, providing information on the molecular structure and chemical composition of the material. Hence, it can be used as a signature for identification of various materials.
Figure 4

Example Raman spectra of chemical agents.

We have a library of 62 Raman spectra, \(\mathcal{S}=\left\{\mathbf{s}_1,\cdots,\mathbf{s}_{62}\right\},\) where the first 50 are spectra of target chemicals of interest, and the last 12 are spectra of possible background materials, i.e., the interference components. The frequency range of the Raman spectra we use is [500, 2800] cm\(^{-1}\), represented by a vector with 903 sampling points. The data were collected and measured by the Edgewood Chemical and Biological Center of the US Army [20].

To evaluate the detection performance, we plot receiver operating characteristic (ROC) curves for all detectors. PFA is the probability of false alarm, or 1 − specificity, and PD is the probability of detection, or sensitivity. The area under the ROC curve measures discrimination, i.e., the ability of the test to make correct decisions. The discrimination values are given in each ROC plot. Each curve is drawn using 200 runs in this paper.
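For completeness, the ROC curve and its area (the discrimination value reported in the plots) can be computed from the detection statistics collected under the two hypotheses, as in the generic sketch below; the function name is ours.

```python
import numpy as np

def roc_curve_auc(scores_h1, scores_h0):
    """Empirical ROC (PD vs. PFA) and its area from detection statistics
    gathered under H1 and H0; larger scores are assumed to favor H1."""
    thresholds = np.sort(np.concatenate([scores_h1, scores_h0]))[::-1]
    pd = np.array([(scores_h1 >= t).mean() for t in thresholds])   # sensitivity
    pfa = np.array([(scores_h0 >= t).mean() for t in thresholds])  # 1 - specificity
    auc = np.trapz(pd, pfa)                                        # discrimination
    return pfa, pd, auc
```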

Simulation data are generated for both \(\mathcal{H}_0 \) and \(\mathcal{H}_1\) using the data model
$$ \begin{array}{rll} \mathcal{H}_0 : \mathbf{x}&=&\mathbf{G}^s\mathbf{a}_g+\mathbf{v},\\ \mathcal{H}_1 : \mathbf{x}&=&\mathbf{T}^s\mathbf{a}_t+\mathbf{G}^s\mathbf{a}_g +\mathbf{v}, \end{array} $$
(16)
where Ts is chosen from \(\left\{\mathbf{s}_1,\cdots,\mathbf{s}_{50}\right\}\) and Gs from \(\left\{\mathbf{s}_{51},\cdots,\mathbf{s}_{62}\right\}.\) The coefficients ag and at are randomly generated from a uniform distribution in the range [0, 1] for each detection run. The noise vector v is generated using a Gaussian distribution with zero mean, and its variance σ2 is chosen according to the desired SNR with respect to the generated data x.

We investigate the performance of the DCC and DNCC detectors along with the MSD and NNLS detectors for comparison. Both the MSD and NNLS detectors operate on a sample-by-sample basis, while DCC and DNCC operate on a block of samples. Note that the NNLS, DCC and DNCC detectors do not rely on the interference information, thus only T is used in their implementations. For MSD, we let G = Gs in Eq. 2, which provides a significant advantage for MSD over NNLS and DCC/DNCC. In practice, however, the interference is usually either unknown or its estimation is not practical, and when the actual interference is missing, the MSD detector may fail. Therefore, a practical implementation of MSD is to include all the possible interferences. In our simulation, we implement an MSD using all the interference components in the library as the interference matrix, i.e., G = [s51, ⋯ , s62] in Eq. 2, denoted by MSD-L.

Another variation of MSD is to use a block of samples, as the DCC/DNCC detectors do. It can be implemented by replacing x, T and G in Eq. 2 with 1B ⊗ x, IB ⊗ T and IB ⊗ G, respectively, where ⊗ is the Kronecker product operator, B the number of observations used, IB an identity matrix of size B, and \(\mathbf{1}_B=[1,\cdots,1]^T\) a B × 1 vector of ones. This detector is denoted by MSD-LBlk, where all possible interference components are included in G, as in the MSD-L detector.
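A literal transcription of the MSD-LBlk construction described above is sketched below; the resulting block quantities can then be passed to an implementation of Eq. 2 such as the msd_statistic() sketch in Section 1. The function name is ours.

```python
import numpy as np

def msd_lblk_inputs(x, T, G_lib, B):
    """Form 1_B kron x, I_B kron T and I_B kron G for the MSD-LBlk detector;
    G_lib stacks all library interference components, as in MSD-L."""
    x_blk = np.kron(np.ones(B), x)         # 1_B (x) x
    T_blk = np.kron(np.eye(B), T)          # I_B (x) T
    G_blk = np.kron(np.eye(B), G_lib)      # I_B (x) G
    return x_blk, T_blk, G_blk
```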

In addition, we implement a detector using single correlations (DCOR), in which the correlations between the observation and each component in the library are first calculated. The maximum correlation is then used as the detection index, and the corresponding component is selected as the component present in the mixture. Note that, as in DCC and DNCC, the interference information is not used in DCOR.

We also implemented the LS detector in Eq. 3; since it is outperformed by the NNLS detector, its results are not included in this paper.

We study the performance of all detectors with several background materials and chemical mixtures under a select set of noise levels and note close to perfect (Disc. = 1) performance with DNCC for most cases. Here we show two examples. The simulation data in Fig. 5 is generated with \(\mathbf{G}^s=[\mathbf{s}_{54}]\) and \(\mathbf{T}^s=[\mathbf{s}_{10}]\), i.e., s10 is the mixing chemical, and s54 is the background material. We use two observations for applying DCC and DNCC. The zero-mean Gaussian noise vector v is generated with a variance such that the SNR is 2 dB.
Figure 5

ROCs for all the detectors (background = s54, present chemical = s10, and SNR = 2dB).

In Fig. 5a and b, we can see a significant improvement in discrimination values for the DCC and DNCC detectors over other data-driven detectors such as NNLS and DCOR. As described in Section 3, this is because the interference information embodied in the observation sample block is exploited through the canonical correlation analysis in DCC, thus mitigating the effect of interference in detection. In NNLS and DCOR, by contrast, detection is performed on a sample-by-sample basis, so the interference from the background cannot be removed. DNCC performs better than DCC because the non-negativity constraints help eliminate the effects of interference from the rest of the target library.

It is worth noting that MSD also provides a good performance in Fig. 5b, which is not surprising because it has the particular advantage of knowing the exact mixing interference component. However, when more interference components are included, the MSD-L detector performs poorly in this experiment. MSD-LBlk gives better performance than MSD-L since it uses more samples for detection. An important issue for MSD-LBlk, compared to the other detectors, is its dimensionality. Let B be the block size; then the complexity of MSD-LBlk is \(B^3O(n^3)\), while the complexity of the MSD and DCC detectors is \(O(n^3)\). Thus, the computational cost of MSD-LBlk is significantly higher than that of the other detectors, which makes it impractical for online applications.

The simulation data in Fig. 6 are generated with \(\mathbf{G}^s=[\mathbf{s}_{52},\mathbf{s}_{54}]\) and \(\mathbf{T}^s=[\mathbf{s}_{17},\mathbf{s}_{29}],\) and the SNR is 5 dB. We use four observations for applying DCC and DNCC in this case. In this example, we use more components for both targets and interferences, and the results are similar to those in Fig. 5.
Figure 6

ROCs for all the detectors (background = [s52, s54], present chemical = [s17,s29], and SNR = 5dB).

7 Conclusion

In this paper, we propose a new detection method, DCC, for data-driven target detection with unknown interferences. Using the maximum canonical correlation and the corresponding canonical vectors between the mixtures and the target library, we can determine whether any library component is present in the mixtures and identify which components are present from the given library. We also develop a non-negative canonical correlation algorithm for applications where only non-negative contributions exist in the mixtures, such as the Raman spectroscopy application presented in this paper. The distribution of the maximum canonical correlation is derived for the case where targets are present, which can provide guidance for determining the detection threshold.

In simulations, we studied the performance of DCC and DNCC in Raman spectroscopy for the detection of surface-deposited chemical agents. The results demonstrate the effectiveness of the proposed methods. The detection performance of DCC, however, degrades if the components in the target set are highly correlated. The performance of DCC and DNCC can be further improved by partitioning the whole target set into subsets with low inter/intra correlations [21], as demonstrated in [22].

Footnotes
1

This paper is an extension of our earlier work presented at the IEEE Workshop on Machine Learning for Signal Processing [11], where the idea of detection with canonical correlation for unknown interference was first introduced.

 

Copyright information

© Springer Science+Business Media, LLC 2011