1 Introduction

Feature extraction plays an important role in pattern recognition. As a powerful supervised feature extraction method, linear discriminant analysis (LDA) [1] has been successfully applied in many problems, such as face recognition [2, 3], text mining [4, 5], image retrieval [6, 7], gait recognition [8], and microarrays [9, 10].

However, classical LDA is a vector-based (one-dimensional, 1D) method. When the input data are naturally in matrix (two-dimensional, 2D) form, such as images, two issues may arise. First, converting 2D data to 1D data produces high-dimensional vectors and hence may lead to the small sample size (SSS) problem [11]. For example, a 32 × 32 face image corresponds to a 1024-dimensional vector. Second, the transformation from 2D data to 1D data destroys the underlying spatial (structural) information, so useful discriminant information may be lost [12, 13]. To handle these problems, many image-as-matrix methods have been developed [14, 15]. In contrast to image-as-vector methods, image-as-matrix methods treat an image as a second-order tensor, and their objective functions are expressed in terms of the image matrix instead of a high-dimensional image vector. The representative image-as-matrix method is two-dimensional LDA (2DLDA) [16]. 2DLDA constructs the within-class and between-class scatter matrices from the original image samples in matrix form rather than converting them to vectors beforehand. Compared to LDA, 2DLDA alleviates the SSS problem when a mild condition is satisfied [17] and preserves the original structure of the input matrix.

Thereafter, many modifications and improvements of 2DLDA were studied. Because of its squared L2-norm formulation, 2DLDA is sensitive to noise and outliers. To improve its robustness, robust replacements of the L2-norm were investigated, including the L1-norm [18,19,20,21], the nuclear norm [22, 23], the Lp-norm [24, 25], and the Schatten Lp-norm with 0 < p < 1 [26]. Other studies focused on extracting discriminative transformations on both sides of the matrix samples. The authors of [27, 28] applied 2DLDA to the matrices sequentially or independently and then combined the left- and right-side transformations to achieve bilateral dimensionality reduction. Li et al. [25] used iterative schemes to extract transformations on both sides. Extensions to other machine learning problems and real applications were also investigated. For example, Wang et al. [29] proposed a convolutional 2DLDA for nonlinear dimensionality reduction, and Xiao et al. [30] studied a two-dimensional quaternion sparse discriminant analysis that meets the requirements of representing RGB and RGB-D images.

Although 2DLDA can ease the SSS problem, it may, like LDA, still face the singularity issue in theory, since it needs to solve a generalized eigenvalue problem. Recently, a vector-based L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) [31] was proposed. Compared to LDA, L2BLDA solves a standard eigenvalue decomposition problem rather than a generalized one, which avoids the singularity issue and improves robustness. In fact, minimizing a Bhattacharyya error [32] bound is a reasonable way to establish a classifier [33]. In this paper, inspired by L2BLDA, to cope with the SSS problem and improve the robustness of 2DLDA, we first derive a Bhattacharyya error upper bound for matrix-input classification and then propose a novel two-dimensional linear discriminant analysis, called 2DBLDA, that minimizes this upper bound. The proposed 2DBLDA has the following characteristics:

  • 2DBLDA is formulated directly for the two-dimensional matrix-input problem. The 2DBLDA criterion is derived within the theoretical framework of Bhattacharyya error bound optimality: we prove that it is an upper bound of the Bhattacharyya error and that optimizing this upper bound leads to optimal discriminant directions. Therefore, the rationality of the 2DBLDA optimization problem is guaranteed theoretically.

  • The constant that weights the between-class distance against the within-class distance in 2DBLDA is computed directly from the input data. This constant not only helps the objective of 2DBLDA achieve the minimum error bound but also makes 2DBLDA adaptive to the data without any parameter tuning. By exploiting the weighted between-class distance information, 2DBLDA achieves robustness.

  • Unlike 2DLDA, 2DBLDA is solved efficiently through a standard eigenvalue decomposition problem, which involves no matrix inverse and hence avoids the singularity caused by the SSS problem.

  • To assess the discriminant ability of our method, we report the accuracy on different databases, plot the variation of accuracy with the reduced dimension, and measure the reconstruction performance on face images. The experimental results on image recognition and face reconstruction demonstrate the effectiveness of 2DBLDA.

The paper is organized as follows. Section 2 briefly introduces LDA, L2BLDA and 2DLDA. Section 3 proposes our 2DBLDA and gives the corresponding theoretical analysis. Section 4 compares 2DBLDA with related approaches experimentally. Section 5 discusses the relationship between 2DBLDA and related methods and analyses the experimental results. Finally, concluding remarks are given in Section 6. The proof of the Bhattacharyya error upper bound of 2DBLDA is given in the Appendix.

The notations of this paper are given as follows. We consider a supervised learning problem in the d1 × d2-dimensional matrix space \(\mathbb {R}^{d_{1}\times d_{2}}\). The training dataset is given by T = {(X1,y1),...,(XN,yN)}, where \(\textbf {X}_{l}\in \mathbb {R}^{d_{1}\times d_{2}}\) is the l-th input matrix sample and yl ∈{1,...,c} is the corresponding label, l = 1,...,N. Assume that the i-th class contains Ni samples, i = 1,…,c. Then, we have \(\sum \limits _{i=1}^{c}N_{i}=N\). We further write the samples in the i-th class as {Xis}, where Xis is the s-th sample in the i-th class, i = 1,…,c, s = 1,…,Ni. Let \(\overline {\textbf {X}}=\frac {1}{N}\sum \limits _{l=1}^{N}\textbf {X}_{l}\) be the mean of all matrix samples and \({\overline {\textbf {X}}}_{i}=\frac {1}{N_{i}}\sum \limits _{s=1}^{N_{i}}\textbf {X}_{is}\) be the mean of matrix samples in the i-th class. For a matrix \(\textbf {Q}=(\textbf {q}_{1}, \textbf {q}_{2},\ldots ,\textbf {q}_{n})\in \mathbb {R}^{m\times n}\), its Frobenius norm (F-norm) ||Q||F is defined as \(||\textbf {Q}||_{F}=\sqrt {\sum \limits _{i=1}^{n}||\textbf {q}_{i}||_{2}^{2}}\). The F-norm is a natural generalization of the vector L2-norm on matrices.

2 Related work

2.1 Linear discriminant analysis

Linear discriminant analysis (LDA) finds a projection transformation matrix W such that the ratio of between-class distance to within-class distance is maximized in the projected space. For data in \(\mathbb {R}^{n}\), LDA finds an optimal \(\textbf {W}\in \mathbb {R}^{n\times r}\), r ≤ n, such that the most discriminant information of the data is retained in \(\mathbb {R}^{r}\) by solving the following problem:

$$ \underset{\textbf{W}}{\max}~~\frac{\text{tr}(\textbf{W}^{T}\textbf{S}_{b}\textbf{W})} {\text{tr}(\textbf{W}^{T}\textbf{S}_{w}\textbf{W})}, $$
(1)

where tr(⋅) is the trace operation of a matrix, and the between-class scatter matrix Sb and the within-class scatter matrix Sw are defined by

$$ \textbf{S}_{b}=\frac{1}{N}\sum\limits_{i=1}^{c}N_{i}({\overline{\textbf{x}}}_{i}-{\overline{\textbf{x}}})({\overline{\textbf{x}}}_{i}-{\overline{\textbf{x}}})^{T} $$
(2)

and

$$ \textbf{S}_{w}=\frac{1}{N}\sum\limits_{i=1}^{c}\sum\limits_{s=1}^{N_{i}}(\textbf{x}_{is}-{\overline{\textbf{x}}_{i}})(\textbf{x}_{is}-{\overline{\textbf{x}}_{i}})^{T}, $$
(3)

where \(\overline {\textbf {x}}_{i}\in \mathbb {R}^{n}\) is the mean of the samples in the i-th class, \(\overline {\textbf {x}}\in \mathbb {R}^{n}\) is the mean of the whole data, and \(\textbf {x}_{is}\in \mathbb {R}^{n}\) is the s-th sample of the i-th class. The optimization problem (1) is equivalent to the generalized eigenvalue problem Sbw = λSww, λ≠ 0, whose solution W = (w1,…,wr) consists of the eigenvectors corresponding to the first r largest eigenvalues of \((\textbf {S}_{w})^{-1}\textbf {S}_{b}\), provided that Sw is nonsingular.

2.2 L2-norm linear discriminant analysis criterion via the Bhattacharyya error bound estimation

As an improvement over LDA, the L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) [31] is a recently proposed vector-based weighted linear discriminant analysis. In the vector space \(\mathbb {R}^{n}\), by minimizing an upper bound of the Bhattacharyya error, the optimization problem of L2BLDA is formulated as

$$ \begin{array}{ll} \underset{\textbf{W}}{\min}~~&-\frac{1}{N}{\sum}_{i<j}\sqrt{N_{i}N_{j}}||\textbf{W}^{T} (\overline{\textbf{x}}_{i}-\overline{\textbf{x}}_{j})||_{2}^{2}+{\varDelta}\sum\limits_{i=1}^{c} \sum\limits_{s=1}^{N_{i}}||\textbf{W}^{T}(\textbf{x}_{is}-\overline{\textbf{x}}_{i})||_{2}^{2}\\ \text{s.t.\ }& \textbf{W}^{T}\textbf{W}=\textbf{\textbf{I}}, \end{array} $$
(4)

where \(\textbf {W}\in \mathbb {R}^{n\times r}\), r ≤ n, \(P_{i}=\frac {N_{i}}{N}\), \(P_{j}=\frac {N_{j}}{N}\), \(\overline {\textbf {x}}_{i}\in \mathbb {R}^{n}\) is the mean of the samples in the i-th class, \(\textbf {x}_{is}\in \mathbb {R}^{n}\) is the s-th sample of the i-th class, \({\varDelta }=\frac {1}{4}\sum \limits _{i<j}^{c}\sqrt {P_{i}P_{j}}||{\overline {\textbf {x}}_{i}-\overline {\textbf {x}}_{j}}||_{2}^{2}\), and \(\textbf {\textbf {I}}\in \mathbb {R}^{r\times r}\) is the identity matrix.

L2BLDA is solved through the following standard eigenvalue decomposition problem:

$$ \begin{array}{ll} \underset{\textbf{W}}{\min}&~~\text{tr}(\textbf{W}^{T}\textbf{S}\textbf{W})\\ \text{s.t.\ }& \textbf{W}^{T}\textbf{W}=\textbf{I}, \end{array} $$
(5)

where

$$ \textbf{S}=-\frac{1}{N}\sum\limits_{i<j}\sqrt{N_{i}N_{j}}(\overline{\textbf{x}}_{i}- \overline{\textbf{x}}_{j})(\overline{\textbf{x}}_{i}-\overline{\textbf{x}}_{j})^{T}+{\varDelta} \sum\limits_{i=1}^{c}\sum\limits_{s=1}^{N_{i}}(\textbf{x}_{is}-\overline{\textbf{x}}_{i})(\textbf{x}_{is}-\overline{\textbf{x}}_{i})^{T}. $$
(6)

Then, W = (w1,w2,…,wr) is formed by the r orthonormal eigenvectors corresponding to the first r smallest nonzero eigenvalues of S. After obtaining the optimal W, a new sample \(\textbf {x}\in \mathbb {R}^{n}\) is projected into \(\mathbb {R}^{r}\) by WTx.

2.3 Two-dimensional linear discriminant analysis

Different from LDA and L2BLDA, which work on vector samples, two-dimensional linear discriminant analysis (2DLDA) [16, 17] operates on matrix samples. 2DLDA defines the between-class scatter matrix and the within-class scatter matrix directly on the 2D data set T as

$$ \textbf{S}_{b}=\frac{1}{N}\sum\limits_{i=1}^{c}N_{i}({\overline{\textbf{X}}}_{i}-{\overline{\textbf{X}}})({\overline{\textbf{X}}}_{i}-{\overline{\textbf{X}}})^{T} $$
(7)

and

$$ \textbf{S}_{w}=\frac{1}{N}\sum\limits_{i=1}^{c}\sum\limits_{s=1}^{N_{i}}(\textbf{X}_{is}-{\overline{\textbf{X}}}_{i})(\textbf{X}_{is}-{\overline{\textbf{X}}}_{i})^{T}. $$
(8)

Then 2DLDA solves the following optimization problem:

$$ \underset{\textbf{W}}{\max}~~\frac{\text{tr}(\textbf{W}^{T}\textbf{S}_{b}\textbf{W})}{\text{tr} (\textbf{W}^{T}\textbf{S}_{w}\textbf{W})}=\frac{\sum\limits_{i=1}^{c}N_{i}\|\textbf{W}^{T} ({\overline{\textbf{X}}}_{i}-{\overline{\textbf{X}}})\|_{F}^{2}}{\sum\limits_{i=1}^{c} \sum\limits_{s=1}^{N_{i}}\|\textbf{W}^{T}(\textbf{X}_{is}-{\overline{\textbf{X}}_{i}})\|_{F}^{2}}, $$
(9)

where \(\textbf {W}=(\textbf {w}_{1},\ldots ,\textbf {w}_{r})\in \mathbb {R}^{d_{1}\times r}\), r ≤ d1, i = 1,…,c, s = 1,…,Ni. Problem (9) can be solved through the generalized eigenvalue problem Sbw = λSww when Sw is nonsingular, and its solution consists of the r eigenvectors corresponding to the first r largest nonzero eigenvalues. After obtaining the optimal W, a new sample \(\textbf {X}\in \mathbb {R}^{d_{1}\times d_{2}}\) is projected into \(\mathbb {R}^{r\times d_{2}}\) by WTX. Note that 2DLDA still encounters the singularity problem when Sw is not of full rank.
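For illustration, the following is a minimal NumPy sketch of the 2DLDA procedure under the assumption that Sw is nonsingular. The (N, d1, d2) data layout, the function name and the use of scipy.linalg.eigh for the generalized eigenvalue problem are choices made for this sketch, not the original implementation.

```python
import numpy as np
from scipy.linalg import eigh

def two_d_lda(X, y, r):
    """Illustrative 2DLDA. X: (N, d1, d2) array of matrix samples, y: labels.
    Returns W of shape (d1, r), assuming the within-class scatter is nonsingular."""
    y = np.asarray(y)
    N, d1, d2 = X.shape
    X_bar = X.mean(axis=0)                       # global mean matrix
    Sb = np.zeros((d1, d1))
    Sw = np.zeros((d1, d1))
    for c in np.unique(y):
        Xc = X[y == c]
        Xc_bar = Xc.mean(axis=0)                 # class mean matrix
        D = Xc_bar - X_bar
        Sb += len(Xc) * (D @ D.T)                # between-class scatter, cf. (7)
        for Xs in Xc:
            E = Xs - Xc_bar
            Sw += E @ E.T                        # within-class scatter, cf. (8)
    Sb /= N
    Sw /= N
    vals, vecs = eigh(Sb, Sw)                    # generalized problem S_b w = lambda S_w w
    idx = np.argsort(vals)[::-1][:r]             # r largest eigenvalues
    return vecs[:, idx]

# A new d1 x d2 sample X_new is then projected to the r x d2 space by W.T @ X_new.
```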

3 Two-dimensional Bhattacharyya bound linear discriminant analysis

3.1 The derivation of a Bhattacharyya error bound estimation

In this section, we derive a new two-dimensional linear discriminant analysis criterion by minimizing a Bhattacharyya error bound.

From the viewpoint of minimizing the probability of classification error, the Bayes classifier is the best classifier [1], and its error rate, known as the Bayes error, is defined as

$$ \epsilon = 1- \int\underset{i\in\{1,2,\ldots,c\}}{max}\{P_{i}p_{i}(\textbf{X})\}d\textbf{X}, $$
(10)

where X is a sample, Pi is the prior probability of the i-th class, and pi(X) is the probability density function of the i-th class. Computing the Bayes error is usually hard, and minimizing an upper bound of it is therefore often considered an effective alternative [35,36,37]. Among various bounds, the Bhattacharyya error [32] is a close upper bound of the Bayes error, given by

$$ \epsilon_{B}=\sum\limits_{i<j}^{c}\sqrt{P_{i}P_{j}}\int\sqrt{p_{i}(\textbf{X})p_{j}(\textbf{X})}d\textbf{X}.\\ $$
(11)

In the context of two-dimensional supervised dimensionality reduction, if we can derive a relatively close upper bound of 𝜖B, we can obtain a reasonable dimensionality reduction model. In fact, under some basic assumptions, such an upper bound of 𝜖B can be obtained, as shown in the following proposition.

Proposition 1

Assume Pi and pi(X) are the prior probability and the probability density function of the i-th class for the training data set T, respectively, and the data samples in each class are independent and identically normally distributed. Let p1(X),p2(X),…,pc(X) be the Gaussian functions given by \(p_{i}(\textbf {X})=\mathcal {N}(\textbf {X}|{\overline {\textbf {X}}}_{i}, \boldsymbol {{\varSigma }}_{i})\), where \({\overline {\textbf {X}}}_{i}\) and Σi are the class mean and the class covariance matrix, respectively. We further suppose Σi = Σ, i = 1,2,…,c, where Σ is the covariance matrix of the data set T, and \({\overline {\textbf {X}}}_{i}\) and Σ can be estimated accurately from T. Then for arbitrary projection vector \(\textbf {w}\in \mathbb {R}^{d_{1}}\), the Bhattacharyya error bound 𝜖B defined by (11) on the data set \(\widetilde {T}=\{\widetilde {\textbf {X}}_{i}|\widetilde {\textbf {X}}_{i}=\textbf {w}^{T}\textbf {X}_{i}\in \mathbb {R}^{1\times d_{2}}\}\) satisfies the following:

$$ \begin{array}{@{}rcl@{}} \epsilon_{B} &\leq&-\frac{a}{8}\sum\limits_{i<j}^{c}\sqrt{P_{i}P_{j}}{||\textbf{w}^{T}({\overline{\textbf{X}}_{i}-\overline{\textbf{X}}_{j}})||_{2}^{2}}+\frac{a}{8}{\varDelta}{\sum}_{i=1}^{c}{\sum}_{s=1}^{N_{i}}||\textbf{w}^{T}(\textbf{X}_{is}-\overline{\textbf{X}}_{i})||_{2}^{2}\\ &&+\sum\limits_{i<j}^{c}\sqrt{P_{i}P_{j}}, \end{array} $$
(12)

where \({\varDelta }=\frac {1}{4}\sum \limits _{i<j}^{c}\sqrt {P_{i}P_{j}}||{\overline {\textbf {X}}_{i}-\overline {\textbf {X}}_{j}}||_{F}^{2}\), and a > 0 is some constant.

Proof

See the Appendix. □

3.2 The proposed two-dimensional Bhattacharyya bound linear discriminant analysis

Proposition 1 gives a reasonable upper bound of 𝜖B. Having obtained this upper error bound, it is natural to minimize it, that is, to minimize the right-hand side of (12). By doing so, we obtain a novel two-dimensional Bhattacharyya bound linear discriminant analysis (2DBLDA) as follows:

$$ \underset{\textbf{w}^{T}\textbf{w}=1}{\min}~~-\frac{1}{N}\sum\limits_{i<j}\sqrt{N_{i}N_{j}}|| \textbf{w}^{T}(\overline{\textbf{X}}_{i}-\overline{\textbf{X}}_{j})||_{2}^{2}+{\varDelta}\sum\limits_{i=1}^{c} \sum\limits_{s=1}^{N_{i}}||\textbf{w}^{T}(\textbf{X}_{is}-\overline{\textbf{X}}_{i})||_{2}^{2} $$
(13)

where \({\varDelta }=\frac {1}{4}\sum \limits _{i<j}^{c}\sqrt {P_{i}P_{j}}||{\overline {\textbf {X}}_{i}-\overline {\textbf {X}}_{j}}||_{F}^{2}\), \(\textbf {w}\in \mathbb {R}^{d_{1}}\), \(P_{i}=\frac {N_{i}}{N}\).

By applying (13), we can project a d1 × d2 sample X to a 1 × d2 sample \(\widetilde {\textbf {X}}\) by \(\widetilde {\textbf {X}}=\textbf {w}^{T}\textbf {X}\). However, a single projection direction usually does not retain enough discriminant information in the 1 × d2 space, and we may need r ≥ 1 projection vectors w1,w2,…,wr, which constitute a projection matrix \(\textbf {W}=(\textbf {w}_{1}, \textbf {w}_{2},\ldots ,\textbf {w}_{r})\in \mathbb {R}^{d_{1}\times r}\) and project X into an r × d2 space by \(\widetilde {\textbf {X}}=\textbf {W}^{T}\textbf {X}\).

In general, we therefore consider the following 2DBLDA problem:

$$ \begin{array}{ll} \underset{\textbf{W}}{\min}~~&-\frac{1}{N}\sum\limits_{i<j}\sqrt{N_{i}N_{j}}||\textbf{W}^{T}(\overline{\textbf{X}}_{i}-\overline{\textbf{X}}_{j})||_{F}^{2}+{\varDelta}\sum\limits_{i=1}^{c}\sum\limits_{s=1}^{N_{i}}||\textbf{W}^{T}(\textbf{X}_{is}-\overline{\textbf{X}}_{i})||_{F}^{2}\\ \text{s.t.\ }& \textbf{W}^{T}\textbf{W}=\textbf{\textbf{I}}, \end{array} $$
(14)

where \(\textbf {W}\in \mathbb {R}^{d_{1}\times r}\), r ≤ d1. We now give the geometric meaning of 2DBLDA. Minimizing the first term in (14) drives the means of different classes far apart in the projected space, which guarantees between-class separability. The coefficients \(\frac {1}{N}\sqrt {N_{i}N_{j}}\) in the first term weight the distances between pairs of class means. Minimizing the second term in (14) pulls each sample toward its own class mean in the projected space. The weighting constant Δ in front of the second term balances the between-class and within-class importance while also ensuring a minimum error bound, according to the proof of Proposition 1. Note that 2DBLDA is adaptive to different data since Δ is determined by the given data set. To ensure minimum redundancy in the projected space, we also impose the orthonormality constraint WTW = I on the discriminant directions.

It is easily seen that we can solve 2DBLDA through the following standard eigenvalue decomposition problem:

$$ \begin{array}{ll} \underset{\textbf{W}}{\min}&~~\text{tr}(\textbf{W}^{T}\textbf{S}\textbf{W})\\ \text{s.t.\ }& \textbf{W}^{T}\textbf{W}=\textbf{I}, \end{array} $$
(15)

where

$$ \textbf{S}=-\frac{1}{N}\sum\limits_{i<j}\sqrt{N_{i}N_{j}}(\overline{\textbf{X}}_{i}- \overline{\textbf{X}}_{j})(\overline{\textbf{X}}_{i}-\overline{\textbf{X}}_{j})^{T}+ {\varDelta}\sum\limits_{i=1}^{c}\sum\limits_{s=1}^{N_{i}}(\textbf{X}_{is}-\overline{\textbf{X}}_{i}) (\textbf{X}_{is}-\overline{\textbf{X}}_{i})^{T}. $$
(16)

Then, the optimal solution is \(\textbf {W}=\left (\textbf {w}_{1},\textbf {w}_{2},\ldots ,\textbf {w}_{r}\right )\), where w1,w2,…,wr are the r orthonormal eigenvectors corresponding to the first r smallest nonzero eigenvalues of S.
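For concreteness, the following minimal NumPy sketch assembles the matrix S in (16) together with the data-dependent constant Δ and extracts the projection matrix by a standard eigenvalue decomposition. The (N, d1, d2) data layout and the function name are assumptions made for this sketch; it is independent of the authors' released MATLAB code.

```python
import numpy as np
from itertools import combinations

def two_d_blda(X, y, r):
    """Illustrative 2DBLDA. X: (N, d1, d2) array of matrix samples, y: labels.
    Returns W of shape (d1, r) via a standard (not generalized) eigenproblem,
    so no matrix inverse is required."""
    y = np.asarray(y)
    N, d1, d2 = X.shape
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    counts = {c: int(np.sum(y == c)) for c in classes}

    # adaptive weighting constant Delta, computed directly from the data (no tuning)
    Delta = 0.25 * sum(np.sqrt(counts[i] * counts[j]) / N
                       * np.linalg.norm(means[i] - means[j], 'fro') ** 2
                       for i, j in combinations(classes, 2))

    S = np.zeros((d1, d1))
    for i, j in combinations(classes, 2):        # weighted between-class part of (16)
        D = means[i] - means[j]
        S -= np.sqrt(counts[i] * counts[j]) / N * (D @ D.T)
    for c in classes:                            # Delta-weighted within-class part of (16)
        for Xs in X[y == c]:
            E = Xs - means[c]
            S += Delta * (E @ E.T)

    vals, vecs = np.linalg.eigh(S)               # eigenvalues in ascending order
    idx = np.argsort(vals)[:r]                   # r smallest eigenvalues
    return vecs[:, idx]                          # a careful implementation would skip zero eigenvalues
```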

4 Experiments

In this section, we compare the proposed 2DBLDA with 2DPCA [34], 2DPCA-L1 [12], 2DLDA [16] and L1-2DLDA [18, 19]. The learning parameter δ of L1-2DLDA is selected optimally from the set {0.001,0.005,0.01,0.05,0.1,0.5,1} by grid search.

We experiment on three image databases for image recognition and on one face image database for face reconstruction. In the experiments, a dimensionality reduction method is first applied to the training data to obtain a projection matrix, and the test data are then projected to the lower-dimensional space using this matrix. For image recognition, the nearest neighbour classifier is employed to obtain the classification accuracy. In addition, when the classes are unbalanced, the area under the ROC curve (AUC) and the G-mean are used as performance measures. For face reconstruction, the mean reconstruction error is used for evaluation. All methods are carried out in Matlab 2017b on a PC with a P4 2.3 GHz CPU.
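As an illustration of this evaluation protocol, the sketch below projects matrix samples with a learned W, flattens the projected features, and applies a one-nearest-neighbour rule; the flattened feature layout and the Euclidean distance are assumptions of this sketch rather than details stated in the paper.

```python
import numpy as np

def nn_accuracy(W, X_train, y_train, X_test, y_test):
    """1-NN classification accuracy in the projected space (illustrative)."""
    y_train = np.asarray(y_train)
    # project every d1 x d2 sample to r x d2 and flatten it into a feature vector
    F_train = np.array([(W.T @ X).ravel() for X in X_train])
    F_test = np.array([(W.T @ X).ravel() for X in X_test])
    correct = 0
    for f, label in zip(F_test, y_test):
        dists = np.linalg.norm(F_train - f, axis=1)   # Euclidean distances
        correct += int(y_train[np.argmin(dists)] == label)
    return correct / len(y_test)
```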

4.1 Image recognition

4.1.1 Performance on three image databases

The Yale databaseFootnote 1 is a human face database that contains 165 images of 15 individuals, with 11 images per individual. It is used to evaluate the performance of the methods under changes in facial expression and lighting conditions.

Columbia Object Image Library (Coil100)Footnote 2 is a database of colour images of 100 objects. The objects were placed on a motorized turntable against a black background. The turntable was rotated 360 degrees to vary object pose with respect to a fixed colour camera. Images of the objects were taken at pose intervals of 5 degrees. The database contains 900 images of 100 objects, with each object containing 9 images.

The COVID databaseFootnote 3 has 349 CT images containing clinical findings of COVID-19 from 216 patients and 397 non-COVID CT scans. The images are collected from COVID-19 related papers from medRxivFootnote 4, bioRxivFootnote 5, Lancet, etc. In our experiment, 195 COVID-19 images and 195 non-COVID-19 images were randomly extracted.

We resize each image to 32 × 32 for all three databases. Since the number of samples in some classes is relatively small, and random cross-validation might leave such classes unrepresented, we randomly select 60% of the samples of each class as the training set and use the rest as the test set; this strategy ensures that both the training and test sets contain samples from every class. We first obtain the projection matrices from the training data and then compute the classification accuracy on the projected test data. Since 2DPCA, 2DLDA and 2DBLDA have no parameters, the result of one run is the final result. For L1-2DLDA, which has one parameter, we adopt ten-fold cross validation on the training set to find its optimal value; with this value, L1-2DLDA is run ten times on the test set to eliminate the influence of random initialization, and the average of these ten accuracies is reported. Similarly, since the performance of 2DPCA-L1 is affected by the initial projections, we repeat it ten times and report the mean accuracy along with the standard deviation. The results on these databases are listed in Table 1, with the best accuracies shown in bold. From the table, we see that our 2DBLDA performs comparably to or better than the other methods. 2DPCA-L1 and L1-2DLDA clearly have the highest computational burden. In contrast, 2DBLDA costs the least CPU time, even less than 2DPCA and 2DLDA.

Table 1 Comparison of mean accuracy (%) and CPU time (seconds) for different methods on the original three databases

To further demonstrate the superiority of our 2DBLDA, we artificially pollute the training data by adding a rectangular block occlusion at a random location to each training sample. We set the occlusion area ratio to 10%, 20%, 30% and 40%. For convenience, we denote the resulting data sets as Yaleb0.1, Yaleb0.2, Yaleb0.3 and Yaleb0.4, where the subscript “b” denotes block occlusion and the following number is the occlusion ratio. For the Coil100 and COVID data, we add a rectangular patch of Gaussian noise with mean 0 and variance 0.2 covering 10%, 20%, 30% or 40% of each training image at a random position. We denote these eight data sets as Coilg0.1, Coilg0.2, Coilg0.3, Coilg0.4, COVIDg0.1, COVIDg0.2, COVIDg0.3 and COVIDg0.4, where the subscript “g” denotes Gaussian noise and the following number is the noise ratio. Some noisy samples are shown in Fig. 1.
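A possible way to generate such corrupted training images is sketched below. The exact occlusion shapes, positions and noise parameters used in the experiments may differ, so this is only an assumed reproduction of the described block occlusion and rectangular Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_block_occlusion(img, area_ratio):
    """Black out a random rectangle covering roughly `area_ratio` of the image."""
    d1, d2 = img.shape
    h = max(1, int(round(d1 * np.sqrt(area_ratio))))
    w = max(1, int(round(d2 * np.sqrt(area_ratio))))
    top = rng.integers(0, d1 - h + 1)
    left = rng.integers(0, d2 - w + 1)
    out = img.copy()
    out[top:top + h, left:left + w] = 0.0
    return out

def add_gaussian_patch(img, area_ratio, var=0.2):
    """Add zero-mean Gaussian noise (variance `var`) on a random rectangle."""
    d1, d2 = img.shape
    h = max(1, int(round(d1 * np.sqrt(area_ratio))))
    w = max(1, int(round(d2 * np.sqrt(area_ratio))))
    top = rng.integers(0, d1 - h + 1)
    left = rng.integers(0, d2 - w + 1)
    out = img.copy()
    out[top:top + h, left:left + w] += rng.normal(0.0, np.sqrt(var), size=(h, w))
    return out
```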

Fig. 1 Noise samples from the three databases

The classification results on the noisy datasets are listed in Table 2. From the table, we make the following observations: (i) All methods are affected by noise, and their accuracies are lower than those on the original data; in general, the larger the noise area, the lower the accuracy. (ii) The proposed 2DBLDA has the highest average accuracy on all noisy data. (iii) L1-2DLDA and 2DPCA perform better than 2DPCA-L1 and 2DLDA. (iv) L1-2DLDA achieves its optimal accuracy when δ is relatively small. (v) For CPU time, 2DPCA-L1 and L1-2DLDA are on the same level but are both slower than 2DPCA and 2DLDA, and 2DLDA and 2DBLDA run the fastest since they obtain all the discriminant vectors at once.

Table 2 Comparison of mean accuracy (%) and CPU time (seconds) for different methods on the noisy databases

4.1.2 The influence of the reduced dimension

To observe the discriminant ability of each dimensionality reduction method, we examine how the classification accuracy in the projected space varies with the number of reduced dimensions, as plotted in Figs. 2 and 3. Figure 2 depicts the variation of accuracy with the dimension on the three original databases, and Fig. 3 depicts the corresponding results on the noisy databases.

Fig. 2 Accuracies of all methods on the original three databases

Fig. 3 Accuracies of all methods on three databases with different levels of noise

The results show the following: (i) As the number of reduced dimensions increases, the accuracies of 2DPCA and our 2DBLDA first reach their highest values and then remain relatively steady, while the other methods vary greatly. (ii) On both the original data and the noisy data, the proposed 2DBLDA has the highest accuracy under its optimal reduced dimension. (iii) All methods are strongly influenced by the reduced dimension, so choosing an optimal reduced dimension is necessary. (iv) In general, the optimal reduced dimension of 2DBLDA is not too large compared to those of the other methods.

4.1.3 The influence of the unbalanced classes

In this subsection, we examine the behaviour of our algorithm on unbalanced classes. To construct unbalanced data, different numbers of images are randomly selected from each class to form the training set, and the remaining data are used as the test set. Specifically, for the COVID database, we randomly select 60% of the samples per class from the COVID-19 images and the non-COVID-19 images in a ratio of 1:1.5 as the training set; the resulting training and test sets are therefore unbalanced. To test robustness, as before, we pollute the training images with a black rectangular block covering 10%, 20%, 30% and 40% of each image at a random position. In this setting, we use the AUC and G-mean, which are both designed for unbalanced data, to measure the performance of all methods. The results on the original and noisy databases are shown in Figs. 4 and 5. From these figures, we see that the proposed 2DBLDA has the highest AUC and G-mean on all databases. Although the performance of all algorithms decreases as the noise area grows, when the block percentage increases, 2DBLDA and L1-2DLDA are less affected by noise, while the performance of the other methods decreases dramatically, and the proposed 2DBLDA remains the best. This result is consistent with the formulation of 2DBLDA, whose weighted between-class distance information and adaptive weighting constant between the between-class and within-class distances contribute to its good performance on unbalanced problems. The result also shows that, compared to the other methods, our 2DBLDA is more adaptive and robust to different data.
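For reference, the G-mean used in this unbalanced setting is the geometric mean of sensitivity and specificity; a generic sketch of this standard definition is given below (the AUC can be computed from decision scores with any standard routine). This is not code from the paper.

```python
import numpy as np

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = (y_true == positive)
    neg = ~pos
    sensitivity = float(np.mean(y_pred[pos] == positive)) if pos.any() else 0.0
    specificity = float(np.mean(y_pred[neg] != positive)) if neg.any() else 0.0
    return float(np.sqrt(sensitivity * specificity))
```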

Fig. 4 AUC of all methods on different databases

Fig. 5 G-mean of all methods on different databases

4.2 Face Reconstruction

In this part, the proposed 2DBLDA and the other methods are applied to face reconstruction on the Indian female database. This database contains 242 face images of 22 female individuals, with 11 different images per individual. The original images are resized to 32×32 pixels.

We first describe face image reconstruction. For a given image \(\textbf {X}\in \mathbb {R}^{d_{1}\times d_{2}}\), suppose we have obtained a projection matrix \(\textbf {W}=\left (\textbf {w}_{1},\textbf {w}_{2},\ldots ,\textbf {w}_{r}\right )\in \mathbb {R}^{d_{1}\times r}\), r ≤ d1. Then X is projected into the r × d2-dimensional space by \(\widetilde {\textbf {X}}=\textbf {W}^{T}\textbf {X}\). Since w1,w2,…,wr are orthonormal, the reconstructed image of X is obtained as \(\widehat {\textbf {X}}=\textbf {W}\widetilde {\textbf {X}}=\textbf {W}\textbf {W}^{T}\textbf {X}\). To measure the reconstruction performance, we use the average reconstruction error (ARE), defined as

$$ \bar{e}_{r}=\frac{1}{N}\sum\limits_{i=1}^{N}||\textbf{X}_{i}-\textbf{W}\textbf{W}^{T}\textbf{X}_{i}||_{F}, $$
(17)

where r = 1,2,…,d1.
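A compact sketch of the reconstruction and of the ARE in (17), under the same (N, d1, d2) data layout assumed in the earlier sketches, is as follows.

```python
import numpy as np

def average_reconstruction_error(W, X):
    """ARE of (17): mean Frobenius distance between each image and its
    reconstruction W W^T X, where W has r orthonormal columns."""
    P = W @ W.T                                    # d1 x d1 projector onto span(W)
    errors = [np.linalg.norm(Xi - P @ Xi, 'fro') for Xi in X]
    return float(np.mean(errors))
```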

We first experiment on the original data and compute the ARE for each method. The variation of the ARE with the dimension is shown in Fig. 6a. From the figure, we see that when the dimension is less than 15, our 2DBLDA performs best, especially when the dimension is greater than 5. When the dimension is greater than 15, 2DPCA is comparable to or slightly better than our 2DBLDA, but both methods have almost reached steady performance. This shows that 2DBLDA can achieve good performance at low dimensions. The other three methods obviously perform worse than 2DBLDA and 2DPCA at all dimensions. For r = 15, we show the reconstructed face images of 7 randomly chosen individuals in Fig. 6b. We can visually confirm that 2DBLDA and 2DPCA have the best reconstruction performance.

Fig. 6 Face reconstruction results on different databases

To further evaluate the effectiveness of the proposed 2DBLDA, we add two types of noise to the data. The first is Gaussian noise with mean 0 and variance 0.05 covering 30% of the area of each image. The ARE of each method under different dimensions is plotted in Fig. 6c. On the Gaussian noise data, our 2DBLDA outperforms the other methods at almost all reduced dimensions, and 2DPCA is comparable to 2DBLDA only when the dimension is greater than 27, indicating that the proposed 2DBLDA achieves fairly good performance with only a small number of reduced dimensions. We then add the second type of noise, dummy noise: images of the same size as the original images, generated from the discrete uniform distribution on [0,1]. An additional 100 dummy images are added to the whole database. After the projection matrix is obtained on these polluted data, it is used to reconstruct the face images. The result in Fig. 6e shows that our 2DBLDA has the lowest ARE at all dimensions, and when the dimension is greater than 20, its ARE is rather low. The reconstructed face images for r = 15 shown in Fig. 6f also support this conclusion.

5 Discussion

To further clarify the contribution of our method, we discuss the differences between the proposed 2DBLDA and its two closely related methods, RLp2DLDA and L2BLDA, and give a detailed analysis of the experimental results.

5.1 Relationship between RLp2DLDA, L2BLDA and 2DBLDA

  (i) Difference from RLp2DLDA: The formulation of 2DBLDA differs from that of any existing 2D linear discriminant analysis method, and the 2DBLDA criterion is derived, within the theoretical framework of Bhattacharyya error bound optimality, by minimizing an upper bound of the Bhattacharyya error. Although robust bilateral Lp-norm two-dimensional linear discriminant analysis (RLp2DLDA) is also derived from an upper bound of the Bhattacharyya error, the two methods have different formulations because their error bounds are different. In fact, the bound of 2DBLDA may be tighter than that of RLp2DLDA, which can be seen from two aspects. First, when deriving its bound, RLp2DLDA drops the term \(\sqrt {P_{i}P_{j}}\) and replaces it by 1, which clearly enlarges the upper bound. In contrast, our 2DBLDA keeps this term and fully exploits the weighting information it carries, which leads to one of the good properties of 2DBLDA, namely robustness. Second, RLp2DLDA further enlarges its upper bound by using the Lp-norm (0 < p < 1) rather than the L2-norm. This results in two advantages of 2DBLDA over RLp2DLDA: 2DBLDA obtains a meaningful weighting constant that needs no tuning, and 2DBLDA solves its optimization problem through a standard eigenvalue problem, while RLp2DLDA relies on an iterative technique whose convergence is not proved.

  (ii) Difference from L2BLDA: Compared with the vector-based L2BLDA, the proposed 2DBLDA is a matrix-based dimensionality reduction method. Although 2DBLDA can be viewed as a generalization of L2BLDA, the generalization is not direct from the viewpoint of deriving the upper bound: the derivation of the Bhattacharyya error bound of 2DBLDA is not exactly the same as that of L2BLDA. In addition, 2DBLDA deals with matrix inputs directly without vectorizing them first, which improves computational efficiency, especially when computing the scatter matrices.

5.2 Experimental results summary

  (i) To study the performance of 2DBLDA, we report the variation of accuracy on different databases and under different noise levels. The CPU time of 2DBLDA is also reported in Tables 1 and 2. The experimental results show that 2DBLDA runs fast and improves the robustness of 2DLDA.

  (ii) To compare the behaviour of 2DBLDA and related methods under different reduced dimensions, we plot the accuracy against the reduced dimension in Figs. 2 and 3. The results demonstrate that, compared to the other methods, the proposed 2DBLDA obtains better classification results under its optimal reduced dimension.

  (iii) To see the applicability of 2DBLDA to unbalanced classes, we experiment on the three original image databases and their versions with different levels of noise. From the results in Figs. 4 and 5, we see that the proposed 2DBLDA has the best performance among the compared methods.

  (iv) To observe the behaviour of the proposed method visually, we reconstruct face images with the obtained projection matrix. The original and the polluted Indian female databases are used for face reconstruction. By choosing an appropriate, not necessarily large, reduced dimension, the proposed 2DBLDA obtains good face reconstruction performance.

6 Conclusion

This paper proposed a novel two-dimensional linear discriminant analysis via Bhattacharyya upper bound optimality (2DBLDA). Different from the existing 2DLDA, optimizing the criterion of 2DBLDA was equivalent to optimizing the upper bound of the Bhattacharyya error, leading to maximizing a weighted between-class distance and minimizing the within-class distance, where these two distances were weighted by a meaningful adaptive constant that can be computed directly from the involved data. The 2DBLDA had no parameters to be tuned and could be effectively solved by a standard eigenvalue decomposition problem. Experimental results on image recognition and face image reconstruction demonstrated the superiority of the proposed method. Our MATLAB code can be downloaded from http://www.optimal-group.org/Resources/Code/2DBLDA.html.

However, a drawback of 2DBLDA is that its classification performance degrades when the class distributions of the samples are inconsistent. A TAISL technique could be used to handle this issue [38]. Since sparse learning can give the data better interpretability after dimensionality reduction [20], considering a sparse model is another direction for future study. Finally, applying our algorithm to track fault detection is also worth studying [39, 40].