Abstract
It has been reported that using unlabeled data together with labeled data to construct a discriminant function works successfully in practice. However, theoretical studies have implied that unlabeled data can sometimes adversely affect the performance of discriminant functions. Therefore, it is important to know what situations call for the use of unlabeled data. In this paper, asymptotic relative efficiency is presented as the measure for comparing analyses with and without unlabeled data under the heteroscedastic normality assumption. The linear discriminant function maximizing the area under the receiver operating characteristic curve is considered. Asymptotic relative efficiency is evaluated to investigate when and how unlabeled data contribute to improving discriminant performance under several conditions. The results show that asymptotic relative efficiency depends mainly on the heteroscedasticity of the covariance matrices and the stochastic structure of observing the labels of the cases.
References
Airoldi J-P, Flury BD, Salvioni M (1995) Discrimination between two species of Microtus using both classified and unclassified observations. J Theor Biol 177:247–262
Anderson TW, Bahadur RR (1962) Classification into two multivariate normal distributions with different covariance matrices. Ann Math Stat 33:420–431
Boldea O, Magnus JR (2009) Maximum likelihood estimation of the multivariate normal mixture model. J Am Stat Assoc 104:1539–1549
Brefeld U, Scheffer T (2005) AUC maximizing support vector learning. In: Proceedings of ICML workshop on ROC Analysis in Machine Learning
Castelli V, Cover TM (1996) The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter. IEEE Trans Inform Theory 42:2102–2117
Chang YCI (2013) Maximizing an ROC-type measure via linear combination of markers when the gold reference is continuous. Stat Med 32:1893–1903
Chapelle O, Schölkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge
Cozman FG, Cohen I (2002) Unlabeled data can degrade classification performance of generative classifiers. In: Fifteenth International Florida Artificial Intelligence Society Conference, pp 327–331
Cozman FG, Cohen I, Cirelo MC (2003) Semi-supervised learning of mixture models. In: Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), pp 99–106
Efron B (1975) The efficiency of logistic regression compared to normal discriminant analysis. J Am Stat Assoc 70:892–898
Eguchi S, Copas J (2002) A class of logistic-type discriminant functions. Biometrika 89:1–22
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188
Fujisawa H (2006) Robust estimation in the normal mixture model. J Stat Plann Inference 136:3989–4011
Hanley JA, McNeil B (1982) The meaning and use of the area under the receiver operating characteristic (ROC) curve. Radiology 143:29–36
Hayashi K, Takai K (2015) Finite-sample analysis of impacts of unlabelled data and their labelling mechanisms in linear discriminant analysis. Commun Stat Simul Comput (in press). doi:10.1080/03610918.2014.957847
Johnson RA, Wichern DW (2007) Applied multivariate statistical analysis, 6th edn. Prentice Hall, Upper Saddle River
Kawakita M, Kanamori T (2013) Semi-supervised learning with density-ratio estimation. Mach Learn 91:189–209
Komori O (2011) A boosting method for maximization of the area under the ROC curve. Ann Inst Stat Math 63:961–979
Lehmann EL (1999) Elements of large sample theory. Springer, New York
Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, New York
Ma S, Huang J (2005) Regularized ROC method for disease classification and biomarker selection with microarray data. Bioinformatics 21:4356–4362
Magnus JR, Neudecker H (1999) Matrix differential calculus with applications in statistics and econometrics. Wiley, New York
McLachlan GJ (2004) Discriminant analysis and statistical pattern recognition, 2nd edn. Wiley, New York
McLachlan GJ, Scot D (1995) Asymptotic relative efficiency of the linear discriminant function under partial nonrandom classification of the training data. J Stat Comput Simul 52:415–426
Oba S, Ishii S (2006) Semi-supervised discovery of differential genes. BMC Bioinform 7:1–13
O’Neill TJ (1978) Normal discrimination with unclassified observations. J Am Stat Assoc 73:821–826
Pepe MS, Thompson ML (2000) Combining diagnostic test results to increase accuracy. Biostatistics 1:123–140
Rosset S, Zhu J, Zou H, Hastie T (2005) A method for inferring label sampling mechanisms in semi-supervised learning. In: Advances in Neural Information Processing Systems 17. MIT Press, Cambridge
Sokolovska N, Cappé O, Yvon F (2008) The asymptotics of semi-supervised learning in discriminative probabilistic models. In: Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML), pp 984–991
Su JQ, Liu JS (1993) Linear combinations of multiple diagnostic markers. J Am Stat Assoc 88:1350–1355
Takai K, Hayashi K (2014) Effects of unlabeled data on classification error in normal discriminant analysis. J Stat Plann Inference 147:66–83
Takai K, Kano Y (2013) Asymptotic inference with incomplete data. Commun Stat Theor Methods 42:2474–2490
Zhu X, Ghahramani Z, Lafferty J (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the Twentieth International Conference on Machine Learning (ICML), pp 912–919
Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Morgan & Claypool Press, San Rafael
Appendices
Appendix A: Asymptotic covariance matrix of \(\hat{{\varvec{\vartheta }}}_\mathrm{C,Ind}\)
In the appendices, we give the details of the asymptotic covariance matrices of estimators of \({\varvec{\vartheta }}\) based on four score functions. First, we derive the asymptotic covariance matrix of the estimator of \({\varvec{\theta }}\) based on \(S_\mathrm{C,Ind}({\varvec{\theta }})\). This is obtained by calculating \(\mathrm{E}\left[ \frac{\partial ^2}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}'}S_\mathrm{C,Ind}({\varvec{\theta }})\right] \) and taking its inverse. Denote this matrix by \(\bar{{\varvec{\Lambda }}}_\mathrm{C,Ind}\). Then, \({\varvec{\Lambda }}_\mathrm{C,Ind}\), appearing in the equation for ARE\(_\mathrm{Ind}\), is obtained by eliminating the first column and row of \(\bar{{\varvec{\Lambda }}}_\mathrm{C,Ind}\). The other asymptotic covariance matrices \({\varvec{\Lambda }}_\mathrm{P,Ind}\), \({\varvec{\Lambda }}_\mathrm{C,Dep}\) and \({\varvec{\Lambda }}_\mathrm{P,Dep}\) are obtained in the same manner.
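In symbols, the relation just described (restating the text above, with the sign convention of the source) is

```latex
\bar{{\varvec{\Lambda}}}_\mathrm{C,Ind}
  = \left\{ \mathrm{E}\!\left[
      \frac{\partial^2}{\partial{\varvec{\theta}}\,\partial{\varvec{\theta}}'}
      S_\mathrm{C,Ind}({\varvec{\theta}})
    \right] \right\}^{-1},
```

and \({\varvec{\Lambda }}_\mathrm{C,Ind}\) is the submatrix of \(\bar{{\varvec{\Lambda }}}_\mathrm{C,Ind}\) that remains after deleting its first row and column.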
Because the feature-independent labeling mechanism implies that the labeling probability \(\mathrm{P}[R=1|{\varvec{x}};{\varvec{\phi }}]=\mathrm{P}[R=1;{\varvec{\phi }}]=\gamma \) is the same for all \({\varvec{x}}\), the expectation above reduces to the standard Fisher information calculation up to the factor \(\gamma \) (see, e.g., Magnus and Neudecker 1999). Therefore, the asymptotic variance can be represented analytically as
where \(\mathbf{L}_\mathrm{Ind}\) is the matrix defined as
Clearly \(\bar{{\varvec{\Lambda }}}_\mathrm{C,Ind}\) is non-singular unless either \({\varvec{\Sigma }}_1\) or \({\varvec{\Sigma }}_0\) is singular.
Appendix B: Asymptotic covariance matrix of \(\hat{{\varvec{\vartheta }}}_\mathrm{P,Ind}\)
We derive the asymptotic covariance matrix of the estimator \(\hat{{\varvec{\vartheta }}}_\mathrm{P,Ind}\) under the feature-independent labeling mechanism. Using the result in Appendix A, the asymptotic covariance matrix, denoted by \(\bar{{\varvec{\Lambda }}}_\mathrm{P,Ind}\), is obtained by taking the expectation of the second derivative of the score function \(S_\mathrm{P,Ind}({\varvec{\theta }})\) and inverting it. Because the information corresponding to the first term of \(S_\mathrm{P,Ind}({\varvec{\theta }})\) has already been given by \(\mathbf{L}_\mathrm{Ind}\), it suffices to calculate the information of the unlabeled data. That is,
with \(\mathbf{U}_\mathrm{Ind} = (1-\gamma )\int _{\mathbf {R}^d} {{\varvec{a}}}({\varvec{x}}){{\varvec{a}}}({\varvec{x}})'f({\varvec{x}};{\varvec{\theta }})d{\varvec{x}}\), where
The form of \(\mathbf{U}_\mathrm{Ind}\) is equivalent to a result given in Boldea and Magnus (2009) up to the constant \((1-\gamma )\). Because the integral in \(\mathbf{U}_\mathrm{Ind}\) cannot be expressed in analytic form, a Monte Carlo approximation is needed for its calculation.
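The Monte Carlo approximation of \(\mathbf{U}_\mathrm{Ind}\) can be sketched as follows. This is an illustrative sketch, not the paper's code: draws from the mixture \(f({\varvec{x}};{\varvec{\theta }})\) replace the integral by a sample average of outer products. The vector `a(x)` below is a hypothetical placeholder for the true gradient of \(\log f({\varvec{x}};{\varvec{\theta }})\).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, pi1, mu1, mu0, sigma1, sigma0):
    # Draw n points from the two-component normal mixture f(x; theta)
    z = rng.random(n) < pi1
    x1 = rng.multivariate_normal(mu1, sigma1, n)
    x0 = rng.multivariate_normal(mu0, sigma0, n)
    return np.where(z[:, None], x1, x0)

def monte_carlo_information(a, x):
    # Approximate the integral of a(x) a(x)' f(x) dx by the sample mean
    # of outer products a(x_i) a(x_i)' over draws x_i ~ f
    A = np.apply_along_axis(a, 1, x)   # (n, p) matrix of score vectors
    return A.T @ A / x.shape[0]

# Toy 2-d heteroscedastic mixture; a(x) is a placeholder score vector
mu1, mu0 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
S1, S0 = np.eye(2), 2.0 * np.eye(2)
x = sample_mixture(5000, 0.5, mu1, mu0, S1, S0)
a = lambda v: np.concatenate(([1.0], v))   # hypothetical a(x)
gamma = 0.7                                # labeling probability
U_ind = (1.0 - gamma) * monte_carlo_information(a, x)
print(U_ind.shape)  # (3, 3)
```

The resulting matrix is symmetric positive semidefinite by construction, as an information matrix should be.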
Appendix C: Asymptotic covariance matrix of \(\hat{{\varvec{\vartheta }}}_\mathrm{C,Dep}\)
As with the feature-independent labeling mechanism, we derive the asymptotic covariance matrix of the estimator of \({\varvec{\theta }}\) based on \(S_\mathrm{C,Dep}({\varvec{\theta }})\), now under the feature-dependent labeling mechanism. This situation differs from the feature-independent case in that the term \(1/\mathrm{P}[R=1|{\varvec{x}};{\varvec{\phi }}]\) appears in the score functions. Because the labeling probability \(\mathrm{P}[R=1|{\varvec{x}};{\varvec{\phi }}]\) is not constant under feature-dependent labeling mechanisms, the information matrices have no explicit representation, unlike \(\mathbf{L}_\mathrm{Ind}\). Therefore, a computational method such as Monte Carlo integration is needed to calculate them.
The asymptotic variance of estimator \(\hat{{\varvec{\vartheta }}}_\mathrm{C,Dep}\) is obtained as follows:
where \(\mathbf{L}_\mathrm{Dep}= \int _{\mathbf {R}^d}\left\{ \pi _1f_1({\varvec{x}};{\varvec{\theta }})\mathbf{B}_{1}({\varvec{x}})+\pi _0f_0({\varvec{x}};{\varvec{\theta }})\mathbf{B}_{0}({\varvec{x}}) \right\} d{\varvec{x}}\),
\(\mathbf{B}_{\ell }({\varvec{x}}) = \displaystyle \frac{{{\varvec{b}}_\ell ({\varvec{x}}){\varvec{b}}_\ell ({\varvec{x}})'}}{\mathrm{P}[R=1|{\varvec{x}};{\varvec{\phi }}]}\), \(\ell =0,1\), \({\varvec{b}}_{1}({\varvec{x}}) = \begin{pmatrix} \pi _1^{-1}\\ {\varvec{d}}_1({\varvec{x}}) \\ -\frac{1}{2}{\varvec{v}}_1({\varvec{x}})'{} \mathbf{D} \\ {\varvec{0}}_{d} \\ {\varvec{0}}_{d(d+1)/2} \end{pmatrix}\), \({\varvec{b}}_{0}({\varvec{x}}) = \begin{pmatrix} -\pi _0^{-1}\\ {\varvec{0}}_{d} \\ {\varvec{0}}_{d(d+1)/2} \\ {\varvec{d}}_0({\varvec{x}}) \\ -\frac{1}{2}{\varvec{v}}_0({\varvec{x}})'{} \mathbf{D} \end{pmatrix}\) and \({\varvec{0}}_d\) is the d-dimensional zero vector.
Appendix D: Asymptotic covariance matrix of \(\hat{{\varvec{\vartheta }}}_\mathrm{P,Dep}\)
Rather than calculating the asymptotic covariance matrix of the estimator \(\hat{{\varvec{\vartheta }}}_\mathrm{P,Dep}\) directly, it suffices to calculate the expectation of the second derivative of the second term on the right-hand side of \(S_\mathrm{P,Dep}({\varvec{\theta }})\). Then, the asymptotic covariance matrix, denoted by \(\bar{{\varvec{\Lambda }}}_\mathrm{P,Dep}\), is obtained from
where \(\mathbf{U}_\mathrm{Dep} = \int _{\mathbf {R}^d} \frac{{{\varvec{a}}}({\varvec{x}}){{\varvec{a}}}({\varvec{x}})'}{\mathrm{P}[R=0|{\varvec{x}};{\varvec{\phi }}]}f({\varvec{x}};{\varvec{\theta }})d{\varvec{x}}. \)
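A Monte Carlo evaluation of \(\mathbf{U}_\mathrm{Dep}\) weights each sampled outer product by \(1/\mathrm{P}[R=0|{\varvec{x}};{\varvec{\phi }}]\). The sketch below is illustrative only: the logistic labeling probability and the score vector `A` are hypothetical stand-ins for \(\mathrm{P}[R=1|{\varvec{x}};{\varvec{\phi }}]\) and the true \({\varvec{a}}({\varvec{x}})\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-d mixture f(x; theta): equal-weight N(1, 1) and N(-1, 2)
n = 10_000
z = rng.random(n) < 0.5
x = np.where(z, rng.normal(1.0, 1.0, n), rng.normal(-1.0, np.sqrt(2.0), n))

# Hypothetical feature-dependent labeling probability P[R=1|x; phi]
p_label = 1.0 / (1.0 + np.exp(-x))   # logistic in x, for illustration

# Placeholder score vectors a(x_i); the true a(x) is the gradient of
# log f(x; theta) with respect to theta
A = np.column_stack([np.ones(n), x, x**2])

# U_Dep ~= (1/n) sum_i a(x_i) a(x_i)' / P[R=0 | x_i],  x_i ~ f
w = 1.0 / (1.0 - p_label)
U_dep = (A * w[:, None]).T @ A / n
print(U_dep.shape)  # (3, 3)
```

Because the weight \(1/\mathrm{P}[R=0|{\varvec{x}};{\varvec{\phi }}]\) can become large where labeling is nearly certain, a large sample size helps stabilize the approximation.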
Cite this article
Hayashi, K. Asymptotic comparison of semi-supervised and supervised linear discriminant functions for heteroscedastic normal populations. Adv Data Anal Classif 12, 315–339 (2018). https://doi.org/10.1007/s11634-016-0266-6
Keywords
- Area under the ROC curve
- Labeling mechanism
- Linear discriminant function
- Missing data
- Receiver operating characteristic curve
- Semi-supervised learning