Improved method for correcting sample Mahalanobis distance without estimating population eigenvalues or eigenvectors of covariance matrix

The recognition performance of the sample Mahalanobis distance (SMD) deteriorates as the number of learning samples decreases. It is therefore important to correct the SMD toward the population Mahalanobis distance (PMD) so that it becomes equivalent to the case of infinite learning samples. In order to reduce the computation time and cost of this correction, this paper presents a correction method that does not require estimating the population eigenvalues or eigenvectors of the covariance matrix. In short, the method requires only the sample eigenvalues of the covariance matrix, the number of learning samples, and the dimensionality to correct the SMD toward the PMD. It combines the summation of the SMD's principal components (each divided by its expectation obtained using the delta method), Lawley's bias estimation, and the variances of the sample eigenvectors. A numerical experiment demonstrates that the method works well for various numbers of learning samples, dimensionalities, population eigenvalue sequences, and non-centralities. An application also shows improved performance in estimating a Gaussian mixture model using the expectation–maximization algorithm.


Introduction
The Mahalanobis distance (MD) has been used for statistical learning as a basic discriminator under multidimensional normal distributions [1,2]. The MD can be easily defined using the covariance matrix of learning samples and can be calculated faster and more easily than neural networks (NNs) and support vector machines (SVMs). The MD is still widely used for industrial and various other purposes (see a recent example [3]), and there have been cases in which the MD was reported to have better recognition performance than NNs and SVMs when there are few learning samples [4][5][6][7]. Usually, the population covariance matrix Σ of the learning samples is unknown in advance. Therefore, Hotelling's distance T 2 (the sample MD, SMD, or T 2 ) for a p-variate test vector y is defined as

T 2 = (y − x̄)ᵀ S⁻¹ (y − x̄),

where the sample covariance matrix S and sample mean vector x̄ are calculated from the p-variate learning sample vectors x 1 , . . . , x n that independently follow a p-variate normal distribution N p (μ, Σ) with population mean μ and population covariance matrix Σ. T 2 follows a scaled F-distribution F( p, n − p) with p and n − p degrees of freedom if y also follows the same distribution N p (μ, Σ) independently [8], as in [9]:

{n(n − p)/((n + 1)(n − 1) p)} T 2 ∼ F( p, n − p).

If μ and Σ are known in advance, the population MD (PMD or D 2 ) can be defined as

D 2 = (y − μ)ᵀ Σ⁻¹ (y − μ),

and D 2 follows a chi-squared distribution χ 2 ( p) with p degrees of freedom if y and x 1 , . . . , x n follow the distribution N p (μ, Σ) independently [8]. T 2 has a broader distribution than D 2 , and the recognition performance of T 2 becomes worse than that of D 2 as n decreases toward p. This is because the sample eigenvalues of S are biased when there are insufficient learning samples.
The large sample eigenvalues tend to be larger than the corresponding population eigenvalues of Σ, whereas the small sample eigenvalues tend to be smaller than the corresponding population eigenvalues. Therefore, some correction must be applied to T 2 in order to improve the recognition performance.
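This eigenvalue bias is easy to reproduce numerically. The following small Monte Carlo sketch (illustrative parameter values, assuming NumPy; not the paper's experiment) shows the largest sample eigenvalue overshooting its population counterpart and the smallest one undershooting it:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, trials = 10, 20, 500                    # illustrative values
lam = np.linspace(1.0, 0.1, p)                # population eigenvalues, descending
Sigma = np.diag(lam)

l_mean = np.zeros(p)
for _ in range(trials):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    l = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    l_mean += l / trials

# Large sample eigenvalues overshoot, small ones undershoot
print(l_mean[0] > lam[0], l_mean[-1] < lam[-1])
```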
To date, several studies have been conducted to improve T 2 . In order to reduce the bias of the sample eigenvalues of S, S + λI is used with a regularizing constant λ to regularize T 2 [10,11]. For a regularized T 2 , the constant λ is often determined experimentally on an ad hoc basis using cross-validation. For example, another method determines λ as the average of the p − m smallest sample eigenvalues, where m is determined experimentally [12]. Recently, a kernel using a regularized MD for an SVM was proposed [13]. The distribution of the regularized T 2 is known only empirically from numerical experiments, which makes a theoretical analysis of the regularized T 2 difficult.
Another method involves estimating the unknown Σ using S. Substituting the estimated population eigenvalues λ i of Σ for the sample eigenvalues l i of S in T 2 improves the recognition performance of T 2 [14]. Furthermore, estimating λ i based on the estimated error of the sample eigenvectors of S using a Monte Carlo simulation provides better performance than estimating λ i alone [15]. However, the Monte Carlo simulation requires significant computation cost and time. In order to reduce this cost and time, the factors of the inverse sample eigenvalues in T 2 , i.e., the eigenvalues of S −1 , can be expanded in a second-order Taylor expansion with the variances of the sample eigenvalues obtained from measurement to correct T 2 [16]. However, there exists a case in which this method yields a corrected S −1 that is not positive definite [16]. This method is related to the delta method in statistics [17]. A more precise application of the delta method to S −1 yields a formula for which the corrected S −1 is always positive definite, and the formula requires only the estimation of the value of λ i and of the ratio λ i /E[l i ] to correct T 2 [18]. However, estimating the unknown λ i is difficult, and thus a new correction formula was proposed that requires neither the estimation of the value of λ i nor the ratio λ i /E[l i ] [19]. However, the recognition performance of this corrected T 2 is worse than that obtained in [18]. The methods proposed in [18,19] do not take the sample eigenvectors into consideration.
In recent years, a linear metric learning method known as Mahalanobis metric learning has been studied. Mahalanobis metric learning corresponds to a generalization of the MD and attempts to learn distances of the form (x − y)ᵀ A (x − y), where A is a positive-semidefinite matrix [20]. If A is equal to the inverse of Σ, the distance is equivalent to the MD. Mahalanobis metric learning optimizes A subject to similarity/dissimilarity constraints with regularizers of A (refer to [20]). The regularizers of A are classified into the following three types: Frobenius norm regularization ‖A‖²_F [21][22][23], linear regularization tr(AC) [24][25][26][27], and information-theoretic metric learning tr(A) − log det A [28]. All of these could potentially provide better performance than the MD; however, they require A to be optimized iteratively using a complicated technique, and their distances with the optimized A have an unknown distribution. Therefore, estimating A from only the sample eigenvalues, without an optimizing calculation, is attractive if it can reduce computation cost and time.
This paper proposes a method for correcting T 2 for D 2 using only the sample eigenvalues of S that define T 2 , the dimensionality p, and the number of learning samples n. The proposed correction method does not require the population eigenvalues or eigenvectors to be estimated. Therefore, it is straightforward to use as a discriminator while reducing computation time and cost. Section 2 discusses the theory for deriving the proposed correction method. Section 3 compares the performance of the proposed correction method with that of previous methods using Monte Carlo simulation. Further, an example application of the correction method to improve the estimation of a Gaussian mixture model (GMM) by the expectation–maximization (EM) algorithm is given. Section 4 summarizes the findings and concludes the paper. "Appendix A" lists the common notations used, and "Appendix B" describes the simulation procedures in detail.

Theory
In this section, Sect. 2.1 introduces some background theory on T 2 and the statistical approximation techniques used for correcting T 2 . Section 2.2 derives the proposed correction methods for T 2 , considering the effects of both the sample eigenvalues and the sample eigenvectors of S. Section 2.3 reviews previous, related correction methods that consider only the effect of the sample eigenvalues. In preparation for showing the superiority of the proposed correction methods by the numerical experiments in Sect. 3, Sect. 2.4 summarizes the proposed and previous correction methods.

Background
Here, let n be the number of learning vector samples x 1 , . . . , x n , and let a test vector sample y independently follow the p-variate normal distribution N p (μ, Σ) with a population mean vector μ and population covariance matrix Σ, where Σ has eigenvalues λ 1 ≥ · · · ≥ λ p and corresponding eigenvectors φ 1 , . . . , φ p . The learning samples x 1 , . . . , x n give a sample covariance matrix

S = (1/(n − 1)) Σ_{i=1}^n (x i − x̄)(x i − x̄)ᵀ,

where S has eigenvalues l 1 ≥ · · · ≥ l p and corresponding eigenvectors f 1 , . . . , f p .

Definition 1 D 2 is defined by

D 2 = (y − μ)ᵀ Σ⁻¹ (y − μ)

and decomposed with the population eigenvalues λ i and eigenvectors φ i as

D 2 = Σ_{i=1}^p {(y − μ) · φ i }² / λ i . (4)

In addition, T 2 is defined by

T 2 = (y − x̄)ᵀ S⁻¹ (y − x̄)

and decomposed with the sample eigenvalues l i and eigenvectors f i as

T 2 = Σ_{i=1}^p {(y − x̄) · f i }² / l i . (5)

T 2 differs from D 2 because the sample covariance matrix S, calculated from finite learning samples, has an estimation bias from the population covariance matrix Σ. The sample eigenvalues l i and sample eigenvectors f i are random variables corresponding to λ i and φ i (i = 1, . . . , p), respectively.
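The equivalence between the quadratic-form definition of T 2 and its principal component decomposition in (5) can be checked numerically. A NumPy sketch with illustrative parameter values (the choices of p, n, μ, and Σ are assumptions for the illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 50                                  # illustrative values
mu = np.zeros(p)
Sigma = np.diag(np.linspace(1.0, 0.2, p))     # illustrative population covariance

# Learning samples and an independent test vector from N_p(mu, Sigma)
X = rng.multivariate_normal(mu, Sigma, size=n)
y = rng.multivariate_normal(mu, Sigma)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                   # unbiased sample covariance

# T^2 from the quadratic-form definition ...
T2_def = (y - xbar) @ np.linalg.solve(S, y - xbar)

# ... and from the eigendecomposition of S as in (5)
l, F = np.linalg.eigh(S)
T2_pc = np.sum(((y - xbar) @ F) ** 2 / l)

# D^2 from the known population parameters
D2 = (y - mu) @ np.linalg.solve(Sigma, y - mu)
```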
Lemma 1 [29] The biases of the expectation and variance of l i and of the sample eigenvectors f i are approximated by

E[l i ] ≅ λ i + (λ i /(n − 1)) Σ_{j≠i} λ j /(λ i − λ j ), (6)

V[l i ] ≅ 2λ i ²/(n − 1), (7)

f i ≅ φ i + Σ_{j≠i} a i j φ j , with E[a i j ] = 0 and E[a i j ²] ≅ λ i λ j /{(n − 1)(λ i − λ j )²}, (8)

where it is assumed that λ 1 > λ 2 > · · · > λ p .
As n → ∞, l i converges to λ i in probability according to (6) and (7), and f i converges to φ i in probability according to (8). Therefore, T 2 converges to D 2 as n → ∞. However, it is difficult to calculate the exact distributions of l i and f i because their formulae contain zonal polynomials and hypergeometric functions [31]. Therefore, approximated distributions of l i and f i have been proposed [32,33]. Furthermore, each principal component term of T 2 , t i ² = {(y − x̄) · f i }²/l i in (5), also has a bias depending on the order of magnitude of the sample eigenvalue l i , so its distribution differs from a t-distribution; the distribution of t i ² is more concentrated than that of t j ² if i < j [34]. However, the exact distribution of t i ² is unknown. Therefore, an approximated E[t i ²] will be derived and applied in Sect. 2.2.
Here, some approximation methods to estimate expectations are shown for use in Sect. 2.2.
Lemma 3 [17] The expectation of a function f (X, Y ) of two random variables X and Y can be approximated using the delta method up to the second order as

E[ f (X, Y )] ≅ f (E[X], E[Y ]) + (1/2) f_XX V[X] + f_XY Cov[X, Y ] + (1/2) f_YY V[Y ],

where f_XX, f_XY, and f_YY denote the second-order partial derivatives of f evaluated at (E[X], E[Y ]). In statistics, the delta method is widely used to approximate the moments of a function of random variables; it is an intuitive technique based on Taylor expansion [17].
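As a one-variable sanity check of the delta method (an illustration, not from the paper): for f (x) = 1/x, the second-order approximation gives E[1/X] ≅ 1/E[X] + V[X]/E[X]³, which can be compared with the exact E[1/X] of a Gamma(k, 1) variable:

```python
# Delta method up to the second order: E[f(X)] ≈ f(m) + f''(m) V / 2.
# For f(x) = 1/x this gives E[1/X] ≈ 1/m + V/m**3; checked against
# X ~ Gamma(k, 1), for which E[1/X] = 1/(k - 1) holds exactly (k > 1).
k = 10.0
m, V = k, k                  # mean and variance of Gamma(k, 1)
approx = 1.0 / m + V / m**3  # delta-method approximation of E[1/X]
exact = 1.0 / (k - 1.0)
```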

Proposed models
In order to correct the bias of T 2 more precisely, each t i ² should be corrected according to its own bias.

Lemma 6
The bias of T 2 can be corrected by dividing each t i ² by E[t i ²], the expectation of t i ²:

CT 2 = Σ_{i=1}^p t i ² / E[t i ²], (14)

where the expectation of each corrected term t i ²/E[t i ²] is therefore one, by analogy with the fact that E[d i ²] = 1, where d i ² = {(y − μ) · φ i }²/λ i , each principal component term of D 2 in (4), follows χ 2 (1), a chi-squared distribution with one degree of freedom. However, it is difficult to calculate the exact distributions of l i and f i , and it is necessary to approximate E[t i ²] precisely. In order to apply the delta method to obtain E[t i ²], let us assume that y − x̄, l i , and f i are random variables, whereas φ i and λ i are not.
Theorem 1 If f i is given by a linear combination of the φ j , applying the delta method approximates t i ² as

t i ² ≅ (√λ i z i + Σ_{j≠i} √λ j a i j z j )² / (C i l i ), (15)

where a i j denotes the coefficients of the linear combination of the φ j , C i is a function of the squared a i j , and z i is a random variable that follows N (0, 1), for i = 1, . . . , p and j ≠ i.

Proof
As n is generally large, let us suppose that y − x̄ approximately follows the distribution N p (0, Σ); y − x̄ is then given by

y − x̄ ≅ Σ_{j=1}^p √λ j z j φ j , (16)

where the z j independently follow N (0, 1). On applying (16) to t i ², we obtain

t i ² = {Σ_{j=1}^p √λ j z j (φ j · f i )}² / l i . (17)

According to (8), f i can be decomposed into a linear combination of the φ j ( j = 1, . . . , p) with sufficiently small random coefficients a i j , i.e.,

f i = φ i + Σ_{j≠i} a i j φ j . (18)

In order to satisfy the condition that the sample eigenvector f i is a unit vector even if the a i j are large, the normalized f i in (18) can be written as

f i = (φ i + Σ_{j≠i} a i j φ j ) / √C i , (19)

C i = 1 + Σ_{j≠i} a i j ². (20)

As φ i · φ j = δ i j , i.e., the Kronecker delta, on applying (19) and (20) to (17), we obtain

(y − x̄) · f i = (√λ i z i + Σ_{j≠i} √λ j a i j z j ) / √C i . (22)

On applying (22) to (17), we obtain (15), where all the random variables C i , l i , z i , and a i j (i = 1, . . . , p, j ≠ i) in (15) are treated as independent; thus, the covariance of each pair of random variables is zero and does not appear in the resultant formulae of the delta method obtained from (15).
Proof Let us suppose that the delta method is applied up to the second order for l i by using V[l i ] = 2λ i ²/(n − 1) in (7) and E[l i ]; up to the zeroth order for a i j by using E[a i j ] = 0; up to the zeroth order for a i j ² by using E[a i j ²]; up to the zeroth order for z i ² by using E[z i ²] = 1; up to the zeroth order for z i z j by using E[z i z j ] = 0; and up to the zeroth order for a i j a i k by using E[a i j a i k ] = 0. The factor 1/l i in (15) is applied with the delta method up to the second order, and E[1/l i ] is given by

E[1/l i ] ≅ (1/E[l i ]) {1 + 2λ i ²/((n − 1) E[l i ]²)}.

The remaining factor of (15) is applied with the delta method up to the zeroth order, and we obtain

E[t i ²] ≅ (λ i /E[l i ]) {1 + 2λ i ²/((n − 1)E[l i ]²)} · {1 + Σ_{j≠i} (λ j /λ i ) E[a i j ²]} / {1 + Σ_{j≠i} E[a i j ²]}. (23)

Thus, T 2 is approximately corrected by using (14) with (23). In the case of (23), if n → ∞, then E[l i ] → λ i and E[a i j ²] → 0, and thus E[t i ²] → 1. Therefore, correcting T 2 by using (14) with (23) has no effect if n → ∞. The first factor in (23), i.e.,

(λ i /E[l i ]) {1 + 2λ i ²/((n − 1)E[l i ]²)}, (25)

indicates the effect of the bias and variance of l i in the denominator of (5). This shows that substituting λ i for l i in (5) is insufficient for correcting T 2 . Furthermore, the second factor in (23), i.e.,

{1 + Σ_{j≠i} (λ j /λ i ) E[a i j ²]} / {1 + Σ_{j≠i} E[a i j ²]}, (26)

indicates the effect of the variances of f i , and this variance becomes large as the neighbors of λ i approach λ i . However, if λ i ≅ λ j for only one pair of λ i and λ j , (26) approaches λ j /λ i , and E[T 2 ] from (23) is given by summing the E[t i ²] of the principal components. Therefore, (23) reflects all the effects of l i and f i in terms of all λ j ( j = 1, . . . , p). However, if there is only a restricted number of unknown learning samples, it is difficult to estimate λ i and the true E[l i ] precisely and to apply (23) to (14) for correcting T 2 . It is then necessary to estimate λ i and the true E[l i ] approximately, at as low a cost as possible. Approximating λ i /E[l i ] as in (11) or (12) satisfies this condition.

Theorem 3 The approximated E[t i ²] including the effect of the variance of the sample eigenvectors f i is given by (28) or (29), obtained by applying (11) or (12), respectively, to (23) with the sample eigenvalues substituted for the population ones. Both (28) and (29) can be calculated without estimating λ i or E[l i ].
Thus, T 2 can be corrected without estimating λ i or φ i by applying E[t 2 i ] from (28) or (29) to (14). It is necessary to compare the performance of (28) and (29) with that of models without the effect of the variance of f i .

Corollary 1 An approximated E[t i ²] not including the effect of the variance of f i is given by (30).

Proof (30) is given by omitting (26) from (29). (30) includes no correction for the effects of f i , as it is assumed that f i ≅ φ i .

Applying (28) or (29) to (14) corrects the bias of T 2 (5) arising from the randomness of both l i and f i toward D 2 (4), whereas (30) corrects only the bias arising from the randomness of l i .

Related models assuming f i ≅ φ i in previous studies
Corollary 2 [18] An approximated E[t i ²] assuming f i ≅ φ i is given by

E[t i ²] ≅ (λ i /E[l i ]) {1 + 2λ i ²/((n − 1)E[l i ]²)}, (31)

which is the first factor on the right side of (23).
However, it is unusual for the true values of λ i to be known in advance.

Corollary 3 [19]
Without estimating λ i , by applying (11) to (31), another approximated E[t i ²] assuming f i ≅ φ i is given by (32). On applying (32) to (14), T 2 is corrected such that the recognition performance of the corrected T 2 improves over that of the uncorrected T 2 . However, a more accurate correction of T 2 can be obtained using (31) than using (32) if the true values of λ i are given in advance. It has been reported that applying the delta method for t i ² up to the third or higher order does not improve the recognition performance of the corrected T 2 [19]. Table 1 summarizes the formulae of D 2 , the uncorrected T 2 , and the corrected T 2 (CT 2 ) proposed above. The abbreviations in Table 1 indicate the corresponding D 2 or T 2 in Sect. 3, and each ct i in Table 1 is the corrected studentized principal component score, i.e., the principal component score divided by √l i and corrected with the proposed E[t i ²] in the CT 2 . Table 1 indicates each method as follows:
The following methods, "fSd" to "fLd", are proposed in this paper for the first time.
- "fSd" (37) corresponds to the CT 2 using (28) in Theorem 3, which does not assume f i ≅ φ i .
- "Ld" (38) corresponds to the CT 2 using (30) in Corollary 1, assuming that f i ≅ φ i .
- "fLd" (39) corresponds to the CT 2 using (29) in Theorem 3, which does not assume f i ≅ φ i .
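The overall shape of such a correction can be sketched in code. The snippet below divides each squared studentized principal component score by a rough stand-in for E[t i ²], built from a Lawley-style estimate of λ i /E[l i ] with the sample eigenvalues substituted for the population ones; it omits the eigenvector factor (26), so it is closest in spirit to the eigenvalue-only correction, and the paper's exact formulae (28)–(30) differ in detail:

```python
import numpy as np

def corrected_T2(y, X):
    """Sketch of a corrected SMD: each squared studentized principal
    component score is divided by an estimate of its expectation.
    The estimate uses a Lawley-style guess of lambda_i/E[l_i] with the
    sample eigenvalues substituted for the population ones, and omits
    the eigenvector factor; the paper's formulae differ in detail."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    l, F = np.linalg.eigh(S)
    l, F = l[::-1], F[:, ::-1]                # descending sample eigenvalues

    t2 = ((y - xbar) @ F) ** 2 / l            # squared studentized scores

    ratio = np.empty(p)                       # estimate of lambda_i / E[l_i]
    for i in range(p):
        others = np.delete(l, i)
        ratio[i] = 1.0 / (1.0 + np.sum(others / (l[i] - others)) / (n - 1))

    Et2 = ratio * (1.0 + 2.0 * ratio**2 / (n - 1))  # crude stand-in for E[t_i^2]
    return np.sum(t2 / Et2)
```

Note that the Lawley-style ratio can misbehave when sample eigenvalues nearly coincide or when n is close to p, which is one reason the paper analyzes the eigenvalue sequences separately.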

Monte Carlo simulation of MDs
In the numerical experiments, the dependence of the corrected MDs on the non-centrality δ, the number of learning samples n, and the dimensionality p is investigated using Monte Carlo simulation. The detailed procedure is given in "Appendix B.1". Furthermore, the dependence on the population eigenvalues λ i should be considered because the correction methods fSd (37), Ld (38), and fLd (39), derived from (6) and (8), are based on the assumption that λ 1 > λ 2 > · · · > λ p . Thus, two sequence types of λ i (i = 1, . . . , p) are considered. Sequence 1 is a geometric sequence from λ 1 = 1 to λ p = 10⁻¹⁰, so that λ 1 > λ 2 > · · · > λ p . Sequence 2 is the constant sequence λ 1 = λ 2 = · · · = λ p = 1. Equation (26) is not defined for Sequence 2. However, (26) holds with the substitution of l i for λ i because the sample eigenvalues l i are biased from λ i and no two values of l i are equal; therefore, fSd (37), Ld (38), and fLd (39) remain applicable even if λ 1 = λ 2 = · · · = λ p . In order to eliminate as much of the sampling error as possible from the empirical PDCs, n Σ is fixed at 10,000 and n y at 1,000. Therefore, the number of test samples is n Σ · n y = 10,000,000. The MDs considered are listed in Table 1: D 2 (ppl.), T 2 (smd.), and five corrected MDs (pd, Sd, fSd, Ld, and fLd).

Results of Sequence 1 (λ 1 > · · · > λ p )
In order to compare the performance of correcting T 2 for D 2 , the Hellinger distance (44) of each MD from D 2 (ppl.) is one of the most accessible metrics. Figures 1, 2 and 3 show the dependence of the Hellinger distances from D 2 on the non-centrality δ, the number of learning samples n, and the dimensionality p, respectively. The conditions for δ, n, and p in each case are given in the respective figures. Figure 1 shows that, as the non-centrality δ increases, all the distances increase gradually. Figure 2 shows that, as n increases, all the distances decrease rapidly. Figure 3 shows that, as p increases, all the distances increase gradually. Overall, all the corrected MDs are closer to D 2 than T 2 . In particular, these results confirm that the model fLd (39) is the best corrected MD if λ i is unknown in advance. Figure 4 shows the PDCs of the MDs. The PDCs of fLd (39) and pd (35) are the closest to that of ppl. (D 2 ) (33). Figures 5 and 6 show the dependence of the expectation and the variance on the non-centrality δ. The expectations and variances of fLd (39) and pd (35) are the closest to those of ppl. (33). However, the variances of both fLd (39) and pd (35) are still larger than that of ppl. (33). This implies that neither the fLd (39) nor the pd (35) model can perfectly correct the right long tails of the PDCs, such as that of ppl. (33), as seen in Fig. 4. Figure 7 shows the misrecognition rates (α + β) min of the MDs. The (α + β) min of pd (35) is the second smallest of all, and that of fLd (39) is the third smallest; however, the difference between the two is small in Fig. 7. Figure 8 shows the threshold Θ at (α + β) min of the MDs. The thresholds of ppl. (33), fLd (39), and pd (35) are almost the same and the smallest in Fig. 8. Therefore, the model fLd (39) best corrects T 2 for D 2 in Sequence 1 when λ i is unknown in advance.
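A Hellinger distance between two empirical MD distributions can be estimated from shared-bin histograms. The discretization below is an assumption for illustration, since the paper's definition (44) is not reproduced here:

```python
import numpy as np

def hellinger(sample_a, sample_b, bins=50):
    """Hellinger distance between two empirical distributions,
    estimated from histograms over a shared range (a sketch; the
    paper's Eq. (44) may use a different discretization)."""
    lo = min(sample_a.min(), sample_b.min())
    hi = max(sample_a.max(), sample_b.max())
    pa, _ = np.histogram(sample_a, bins=bins, range=(lo, hi))
    pb, _ = np.histogram(sample_b, bins=bins, range=(lo, hi))
    pa = pa / pa.sum()                        # normalize to probabilities
    pb = pb / pb.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(pa) - np.sqrt(pb)) ** 2)))
```

The distance is 0 for identical samples and approaches 1 for samples with disjoint supports.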

Results of Sequence 2 (λ 1 = · · · = λ p )
As mentioned in the preface in Sect. 3.2, an investigation of the case in which the assumption that λ 1 > λ 2 > · · · > λ p does not hold is required. Figures 9, 10 and 11 show the dependence of the Hellinger distances from D 2 on the non-centrality δ, number of learning samples n, and dimensionality p, respectively. Figure 9 shows that as the non-centrality δ increases, all the distances also increase gradually. However, the Hellinger distance of pd (35) in Fig. 9 is larger than that of fLd (39) in Sequence 2, in contrast to Fig. 1 for Sequence 1. This is because the PDC of pd (35) is overbiased to the left compared to that of ppl. (33) in Fig. 12. Figure 13 shows that the misrecognition rate (α + β) min of pd (35) is closer to that of ppl. (33) than that of fLd (39).
However, Fig. 14 shows that the threshold Θ of pd (35) is smaller than that of ppl. (33) because pd (35) is overbiased in Fig. 12. The relation of the Hellinger distances between pd (35) and fLd (39) becomes reversed as n increases in Fig. 10 or p increases in Fig. 11. This is because the PDC of pd (35) becomes closer to that of ppl. (33) as n or p increases (graphs of these PDCs are not shown in this paper). However, pd (35) is difficult to calculate if λ i is unknown in advance. Therefore, the model fLd (39) can also correct T 2 for D 2 to improve the recognition performance of MDs in Sequence 2 if λ i and φ i are unknown in advance. From all the results of the numerical experiments, it is observed that the model fLd (39) best corrects T 2 for D 2 without estimating λ i or φ i with respect to the dependence on the non-centrality δ, the number of learning samples n, and the dimensionality p, for both λ 1 > · · · > λ p and λ 1 = · · · = λ p . This is because fLd (39) corrects both the sample eigenvalues l i in the denominators and the sample eigenvectors f i in the numerators of the decomposed principal component scores of T 2 . It is interesting that Lawley's approximation (12) appears to be better than Stein's approximation (11) for estimating λ i /E[l i ], even if λ 1 = · · · = λ p . However, the reason for this is still an open question.

Example of application
In order to evaluate the performance of correcting T 2 for D 2 in an application, the model fLd (39) is applied to the EM algorithm for GMMs. A p-variate GMM with k components has a density given by the sum of probability density functions g of normal distributions N p (m j , S j ):

p(x|θ) = Σ_{j=1}^k w j g(x|m j , S j ), (40)

where θ = {(w j , m j , S j )} k j=1 , and w j is the probability that the observed sample x is generated by the jth component, satisfying w j > 0 and Σ_{j=1}^k w j = 1. The EM algorithm for a GMM [36] with k components is run on n independent and identically distributed samples. After running the EM algorithm, a sample y is assigned to the component j that maximizes (41) for the GMM. The quadratic form in (42) is a type of T 2 , and n j , the effective number of learning samples of component j, satisfies n j ≤ n. We contend that the well-known EM algorithm for a GMM (see [36]) does not consider that S j differs from Σ j if n j is not sufficiently large. Therefore, g(x|m j , S j ) tends to be biased by the lack of sufficient learning samples, and the estimates of the EM algorithm deteriorate unless p ≪ n j . In order to reduce this bias, the application of fLd (39) to (41) and (42) is investigated. The dataset "Wine recognition data" in Ref. [37] was used for classification by means of the EM algorithm for a GMM. These data are the results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents ( p = 13) found in each of the three types of wines (k = 3). The numbers of instances are n 1 = 59 for class 1, n 2 = 71 for class 2, and n 3 = 48 for class 3 (total n = 178). As no class satisfies n j ≫ p, this dataset appears to suffer a bias from a lack of sufficient learning samples. After running the EM algorithm, the sample eigenvalues l i of each component range from 10⁴ to 10⁻³, which corresponds to Sequence 1 in Sect. 3.1.1. This suggests that fLd (39) is appropriate for the dataset.
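For reference, the E-step of the EM algorithm for a GMM computes posterior responsibilities from exactly the T 2 -type quadratic form discussed above. A minimal NumPy sketch of the generic E-step (not the paper's corrected variant):

```python
import numpy as np

def responsibilities(X, weights, means, covs):
    """E-step of EM for a GMM: posterior probability that each sample
    came from each component. The quadratic form `maha` is the
    T^2-type term that the paper proposes to correct."""
    n, p = X.shape
    k = len(weights)
    log_r = np.empty((n, k))
    for j in range(k):
        d = X - means[j]
        maha = np.einsum('ni,ij,nj->n', d, np.linalg.inv(covs[j]), d)
        _, logdet = np.linalg.slogdet(covs[j])
        log_r[:, j] = np.log(weights[j]) - 0.5 * (maha + logdet + p * np.log(2 * np.pi))
    log_r -= log_r.max(axis=1, keepdims=True)   # stabilize before exponentiating
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)
```

Replacing `maha` with a corrected distance such as fLd is the kind of drop-in substitution the paper investigates.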
Table 2 summarizes the results. For log-likelihood (L.-L.) of p(x|θ) (40), the L.-L. of fLd (39) is larger than that of smd. (34). This means that by using fLd (39) the GMM can be estimated more correctly than by using smd. (34), i.e., the conventional SMD used in the EM algorithm for a GMM. Thus, fLd (39) improves the classification rate of Class 1 in comparison to smd. (34). However, smd. (34) works better than fLd (39) for the classification rate of Class 2. This is because the EM algorithm refines the estimates of the whole model of a GMM to improve its likelihood, not that of each component. Another method should be considered to improve the classification rate of each component.
Therefore, correcting T 2 for D 2 with fLd (39) can improve the estimation performance for models including formulae like T 2 , such as a GMM, by simply replacing T 2 with fLd (39) without estimating the true eigenvalues or eigenvectors of the covariance matrix in T 2 in advance. The above GMM example does not have singular sample covariance matrices. Therefore, the model fLd (39) can be applied to improve estimation performance. However, the EM algorithm for GMMs sometimes has a singularity problem; S j of a fitted component j has very small positive eigenvalues when the data of S j are in a low-dimensional subspace [36,38], in which case the model fLd (39) cannot correct T 2 . One of the methods to avoid the problem is regularizing S j to be a low-rank matrix [39]. Hence, the model fLd (39) should be extended for a low-rank S j in the future.

Conclusion
In order to correct the SMD or T 2 for the PMD or D 2 , this paper presents a correction method (called fLd in this paper) that requires only the sample eigenvalues of the sample covariance matrix, the dimensionality p, and the number of learning samples n. The proposed method requires neither the estimation of the population eigenvalues or eigenvectors nor the use of optimization or time-consuming methods such as the Monte Carlo method. The proposed method eliminates the bias of each principal component score of T 2 by dividing the principal component score by its expectation. By using the delta method in statistics, Lawley's bias estimation of the sample eigenvalues, and the variances of the sample eigenvectors, the approximated expectation can be obtained from the sample eigenvalues, p, and n. A numerical experiment using the Monte Carlo method to generate test samples was conducted to investigate the correction performance of the proposed method under various conditions, such as the non-centrality δ, n, p, and two types of population eigenvalue sequences: λ 1 > λ 2 > · · · > λ p and λ 1 = λ 2 = · · · = λ p . Although Lawley's bias estimation of the sample eigenvalues is based on the assumption that λ 1 > λ 2 > · · · > λ p , the proposed method substitutes the sample eigenvalues for the population eigenvalues in the formula and corrects T 2 well even if λ 1 = λ 2 = · · · = λ p . In addition, the dependence of the corrected and uncorrected T 2 on δ shows that the recognition performance, i.e., the minimum misrecognition rate, of T 2 corrected by the proposed method is improved over that of the uncorrected T 2 . Furthermore, the application of the proposed method shows an improvement in estimating a GMM by the EM algorithm. Thus, it is expected that the proposed model can be applied to other models that include terms such as the SMD. However, the proposed method still leaves room for improving the correction of T 2 for D 2 .
The PDC of the corrected T 2 is closer to that of D 2 than any previously proposed corrected T 2 , although it does not coincide with that of D 2 . The minimum misrecognition rate of the corrected T 2 decreases but is not equal to that of D 2 . Moreover, the probability distribution of the corrected T 2 is currently theoretically unknown. These results suggest that there is room for improvement in eliminating the bias of T 2 with the proposed method. In order to improve the T 2 correction, some new ideas should be considered, such as theoretically elaborating the probability density function of each studentized principal component score of T 2 , improving Lawley's bias approximation, and modeling the sample eigenvectors.

Appendix B.1 Procedure of the Monte Carlo simulation of MDs

Input: Predefine λ 1 , . . . , λ p , the population eigenvalues of Σ. All λ i satisfy λ 1 ≥ λ 2 ≥ · · · ≥ λ p . In order to avoid numerical errors in T 2 , all λ 1 , . . . , λ p must satisfy condition (43), where ε F is the relative error of the real variables (ε F ≅ 10⁻¹⁶ for Excel VBA), ε is the relative error of the test sample vector y of T 2 , and F α ( p, n − p) is the upper 100α% point of the F distribution with p and n − p degrees of freedom [40]. (43) is a type of restriction on the condition number of Σ. In this study, it is assumed that ε = ε F and α = 0.05. Moreover, the non-centrality δ is predefined.
(3) S is decomposed into the sample eigenvalues l i and sample eigenvectors f i (i = 1, . . . , p). All l i satisfy l 1 ≥ l 2 ≥ · · · ≥ l p , and f 1 , . . . , f p are orthonormal.
(4) If δ > 0, for generating non-central test samples, Δ = (δ 1 , . . . , δ p ) is generated randomly to satisfy Σ_{i=1}^p δ i ² = δ.
(5) A population principal component score z is generated, which follows N p (0, I ), i.e., z = (z 1 , . . . , z p ) with z i ∼ N (0, 1) independently. D 2 is calculated as D 2 = ‖z + Δ‖². The total number of generated z is n Σ · n y , where n y is the number of generated y for each generated Σ. It was confirmed with Q–Q plots prior to these experiments that the simulated D 2 follows a central chi-squared distribution χ 2 ( p) with p degrees of freedom if δ = 0, and a non-central chi-squared distribution χ 2 ( p; δ) with p degrees of freedom and non-centrality δ if δ > 0.
(6) y that follows N p (Δ, Σ) is generated from z as y = Φ diag(√λ 1 , . . . , √λ p )(z + Δ). The total number of generated y is also n Σ · n y .
(7) t i = (y − x̄) · f i /√l i (i = 1, . . . , p) is calculated, where t i is the studentized principal component score, i.e., the principal component of T 2 divided by the corresponding sample standard deviation.
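Steps (4)–(6) above can be sketched as follows (illustrative λ i , p, and δ; the random orthonormal Φ is an assumption for the illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
p, delta = 4, 2.0                                   # illustrative values
lam = np.array([1.0, 0.5, 0.25, 0.125])             # population eigenvalues
Phi = np.linalg.qr(rng.standard_normal((p, p)))[0]  # random orthonormal eigenvectors (assumption)

# Step (4): Delta with the sum of delta_i^2 equal to the non-centrality delta
v = rng.standard_normal(p)
Delta = np.sqrt(delta) * v / np.linalg.norm(v)

# Step (5): population principal component score z ~ N_p(0, I) and D^2
z = rng.standard_normal(p)
D2 = np.sum((z + Delta) ** 2)

# Step (6): test vector y = Phi diag(sqrt(lam)) (z + Delta)
y = Phi @ (np.sqrt(lam) * (z + Delta))
```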