The authors are to be congratulated for their solid contribution in providing a powerful method that handles classification and dimension reduction problems with functional data sets. This type of problem has drawn much attention in the literature, and is known to be difficult due to the complex structure of the corresponding data sets. In particular, many existing dimension reduction methods ignore the relationship between predictors and labels, and perform dimension reduction using the covariates alone. Such procedures can be suboptimal and may lead to unstable results, especially when the predictors are sparsely observed. The proposed PEFCS method incorporates the observed labels into the dimension reduction step by estimating class-conditional probabilities, and is shown to enjoy more competitive and robust performance in numerical examples.

This interesting paper leads to many promising research directions. For example, class-conditional probability estimation (we write \(P_{j}(\hat{X}_i) = {\text {pr}}(Y=j \mid \hat{X}_i)\)) is a crucial step in the PEFCS method. In the literature, it is known that classification methods can be grouped into two main categories: soft and hard classifiers (Wahba 2002; Liu et al. 2011). Soft classifiers directly estimate the class-conditional probabilities, which in turn yield classification rules. Typical examples of soft classification include Fisher’s LDA and logistic regression. In contrast, hard classifiers bypass direct estimation of probabilities and focus on estimating the classification boundary. Typical examples of hard classification include the support vector machine (SVM, Boser et al. 1992; Cortes and Vapnik 1995) and \(\psi \)-learning (Shen et al. 2003). Liu et al. (2011) observed that the classification performance of various classifiers depends heavily on the underlying distribution of \((X, Y)\). In the present paper, the authors use the hinge loss of the SVM. Therefore, a possible generalization of the proposed technique is to employ a more general loss function in their equation (5) for probability estimation, instead of using the weighted SVM. We briefly discuss this idea below.

Consider the optimization problem

$$\begin{aligned} \min _{g\in \mathcal {F}_K} \sum _{i=1}^n \ell \left\{ Y_i g \left( \hat{X}_i\right) \right\} + \lambda \Vert g\Vert _{\mathcal {F}_K}^2, \end{aligned}$$
(1)

where \(\ell (\cdot )\) is a differentiable loss function for a soft classifier. One can verify that the odds \(P_{+1}(\hat{X}_i) / \{1 - P_{+1}(\hat{X}_i)\}\) can be estimated by \(\ell '\{ - \hat{g} (\hat{X}_i)\} / \ell '\{ \hat{g} (\hat{X}_i)\}\), which in turn yields an estimate of \(P_{+1}(\hat{X}_i)\) (Liu et al. 2011). For standard classification where the predictors are scalars or vectors, Liu et al. (2011) pointed out that when the underlying class-conditional probability, as a function of the predictors, is relatively smooth, soft classifiers tend to perform better than hard ones. Moreover, the transition behavior from soft to hard classifiers was thoroughly investigated using the large-margin unified machine family proposed by Liu et al. (2011). For functional data classification, the comparison between soft and hard classifiers and the corresponding transition behavior are largely unknown, and further exploration in this direction can be very interesting.
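To make the probability-recovery step concrete, the following minimal sketch (our own illustration, not code from the paper or from Liu et al. 2011) uses the logistic loss \(\ell (u)=\log (1+e^{-u})\): the odds ratio \(\ell '\{-\hat{g}(x)\}/\ell '\{\hat{g}(x)\}\) then equals \(e^{\hat{g}(x)}\), so the recovered probability is simply the sigmoid of \(\hat{g}(x)\). The function names are hypothetical.

```python
import numpy as np

def logistic_loss_derivative(u):
    # Derivative of the logistic loss ell(u) = log(1 + exp(-u)).
    return -1.0 / (1.0 + np.exp(u))

def prob_from_margin(g_hat, loss_derivative=logistic_loss_derivative):
    """Recover P(Y = +1 | x) from a fitted margin g_hat = g(x):
    the odds equal ell'(-g_hat) / ell'(g_hat) for a differentiable loss."""
    odds = loss_derivative(-g_hat) / loss_derivative(g_hat)
    return odds / (1.0 + odds)

# Sanity check: with the logistic loss the recovered probability is the sigmoid of g_hat.
g = np.linspace(-3.0, 3.0, 7)
assert np.allclose(prob_from_margin(g), 1.0 / (1.0 + np.exp(-g)))
```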

Another potential research topic is to extend the PEFCS methodology to handle multicategory problems. In this case, the construction of slices in the EDR method becomes more involved. In particular, when \(Y\in \{+1,-1\}\), only one direction of the EDR space can be recovered, because a binary response induces at most two slices, so the inverse regression step can recover at most one direction. To overcome this difficulty, Shin et al. (2014) proposed to construct the slices based on \(P_{+1}(\hat{X}_i)\) instead. In multicategory classification, estimation of the class-conditional probabilities becomes more complex, as one needs to estimate a probability vector \(\{P_{1}(\hat{X}_i), P_{2}(\hat{X}_i),\ldots , P_{k}(\hat{X}_i)\}\). Furthermore, how to construct \(S_{(P_1,P_2,\ldots ,P_k) \mid \varvec{X}}\) remains unclear. Therefore, it can be interesting and challenging to develop new methodology in this direction. Below, after a brief sketch of the binary slicing idea, we provide one possible way to generalize the PEFCS methodology for multicategory problems.
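The following minimal sketch (our own illustration, not the authors' PEFCS implementation) may help fix ideas for the binary case: slices are formed from the estimated probabilities \(P_{+1}(\hat{X}_i)\) rather than from the labels, and an e.d.r. direction is recovered from the covariance of the slice means. The function name, the quantile-based slicing, and the assumption that the predictor scores have been standardized are ours.

```python
import numpy as np

def edr_direction_from_probabilities(X, p_hat, n_slices=5, n_dirs=1):
    """Schematic sliced-inverse-regression step in which the slices are
    formed from estimated probabilities P(Y = +1 | X_i) rather than the
    binary labels.  X is assumed to be standardized (whitened); a full
    implementation would solve a generalized eigenproblem with Cov(X)."""
    n, p = X.shape
    X_centered = X - X.mean(axis=0)
    # Form slices from the quantiles of the estimated probabilities.
    edges = np.quantile(p_hat, np.linspace(0.0, 1.0, n_slices + 1))
    slice_id = np.clip(np.searchsorted(edges, p_hat, side="right") - 1, 0, n_slices - 1)
    # Weighted covariance matrix of the within-slice means of X.
    M = np.zeros((p, p))
    for h in range(n_slices):
        idx = slice_id == h
        if idx.any():
            m_h = X_centered[idx].mean(axis=0)
            M += (idx.sum() / n) * np.outer(m_h, m_h)
    # Leading eigenvectors estimate the e.d.r. directions.
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, ::-1][:, :n_dirs]
```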

For margin-based classification, when the number of classes is three or larger, one classification function \(g(\cdot )\) is not enough to discriminate all classes. To overcome this difficulty, a common approach in the literature is to use k functions for k classes, and impose a sum-to-zero constraint on the k functions to reduce the parameter space and to ensure some theoretical properties such as Fisher consistency. Recently, Zhang and Liu (2014) suggested that using k functions and the sum-to-zero constraint can be inefficient and suboptimal, and proposed the angle-based large margin classifiers for multicategory classification. In particular, consider a simplex \(\varvec{W}\) with k vertices \(\{\varvec{W}_1,\ldots ,\varvec{W}_k\}\) in a \((k-1)\)-dimensional space, such that

$$\begin{aligned} \varvec{W}_j=\left\{ \begin{array}{ll} (k-1)^{-1/2}\,\varvec{1}_{k-1}, & j=1,\\ -\left(1+k^{1/2}\right)\big/\left\{(k-1)^{3/2}\right\}\,\varvec{1}_{k-1}+\left\{ k/(k-1)\right\} ^{1/2}\varvec{e}_{j-1}, & 2 \le j \le k, \end{array} \right. \end{aligned}$$

where \(\varvec{1}_{k-1}\) is a vector of 1’s with length \(k-1\), and \(\varvec{e}_j \in \mathbb {R}^{k-1}\) is a vector with the jth element 1 and 0 elsewhere. In angle-based classification, one uses a \((k-1)\)-dimensional classification function \(\varvec{f}= (f_1,\ldots ,f_{k-1})^T\), which maps the predictor vector \(\varvec{x}\) to \(\varvec{f}(\varvec{x}) \in \mathbb {R}^{k-1}\). Observe that \(\varvec{f}\) introduces k angles with respect to \(\varvec{W}_1,\ldots ,\varvec{W}_k\), namely, \(\angle (\varvec{f},\varvec{W}_j); \ j=1,\ldots ,k\). The prediction rule assigns the class whose vertex forms the smallest angle with \(\varvec{f}\); in particular, \(\hat{y}(\varvec{x}) = \mathop {\mathrm{argmin}}_{j \in \{1,\ldots ,k\}} \angle (\varvec{f},\varvec{W}_j)\), where \(\hat{y}(\varvec{x})\) is the predicted label for \(\varvec{x}\). Based on the observation that
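As a quick sanity check on this construction (our own sketch, not code from Zhang and Liu 2014), the vertices can be generated directly and verified to have unit length and constant pairwise inner product \(-1/(k-1)\); the prediction rule then reduces to an argmax of inner products. Function names are hypothetical.

```python
import numpy as np

def simplex_vertices(k):
    """Vertices W_1, ..., W_k of the regular simplex in R^{k-1} used by
    angle-based multicategory classification."""
    W = np.full((k, k - 1), -(1.0 + np.sqrt(k)) / (k - 1) ** 1.5)
    W[0] = 1.0 / np.sqrt(k - 1)
    W[np.arange(1, k), np.arange(k - 1)] += np.sqrt(k / (k - 1.0))
    return W

def predict(F, W):
    """Predicted labels in {1, ..., k}: the class whose vertex has the
    largest inner product with f(x), equivalently the smallest angle."""
    return np.argmax(F @ W.T, axis=1) + 1

k = 4
W = simplex_vertices(k)
G = W @ W.T
# Each vertex has unit length; all pairwise inner products equal -1/(k-1).
assert np.allclose(np.diag(G), 1.0)
assert np.allclose(G[~np.eye(k, dtype=bool)], -1.0 / (k - 1))
```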

$$\begin{aligned} \mathop {\mathrm{argmin}}_{j \in \{1,\ldots ,k\}} \angle (\varvec{f},\varvec{W}_j) = \mathop {\mathrm{argmax}}_{j \in \{1,\ldots ,k\}} \langle \varvec{f}, \varvec{W}_j \rangle , \end{aligned}$$

Zhang and Liu (2014) proposed the following optimization problem for the angle-based classifier

$$\begin{aligned} \min _{\varvec{f}} \sum _{i=1}^n \ell \{ \langle \varvec{W}_{y_i},\varvec{f}(\varvec{x}_i) \rangle \} + \lambda J(\varvec{f}), \end{aligned}$$
(2)

where \(\ell (\cdot )\) is a binary margin-based surrogate loss function, \(J(\varvec{f})\) is a penalty on \(\varvec{f}\) to prevent overfitting, and \(\lambda \) is a tuning parameter that balances the goodness of fit and the model complexity. One advantage of the angle-based classifier is that it is free of the commonly used sum-to-zero constraint, and hence can be more efficient for learning with big data sets. Thus, generalization of the PEFCS method in the angle-based framework should be feasible and promising.
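To illustrate how (2) might be solved in the simplest setting, here is a schematic sketch (our own, not the implementation of Zhang and Liu 2014 or of the PEFCS method) that fits a linear \(\varvec{f}(\varvec{x}) = B^{T}\varvec{x} + \varvec{b}\) with the differentiable logistic surrogate loss and a ridge penalty by plain gradient descent; the function names, the choice of loss, and the step-size defaults are assumptions for illustration only.

```python
import numpy as np

def simplex_vertices(k):
    # Same simplex construction as in the previous sketch.
    W = np.full((k, k - 1), -(1.0 + np.sqrt(k)) / (k - 1) ** 1.5)
    W[0] = 1.0 / np.sqrt(k - 1)
    W[np.arange(1, k), np.arange(k - 1)] += np.sqrt(k / (k - 1.0))
    return W

def fit_angle_based(X, y, k, lam=1e-2, lr=0.1, n_iter=2000):
    """Toy angle-based classifier: linear f(x) = B^T x + b, logistic
    surrogate loss ell(u) = log(1 + exp(-u)), ridge penalty on B,
    fitted by plain gradient descent.  Labels y take values in {1, ..., k}."""
    n, p = X.shape
    W = simplex_vertices(k)          # (k, k-1) vertex matrix
    B, b = np.zeros((p, k - 1)), np.zeros(k - 1)
    Wy = W[y - 1]                    # vertex associated with each observed label
    for _ in range(n_iter):
        F = X @ B + b                          # (n, k-1) function values
        u = np.sum(F * Wy, axis=1)             # functional margins <W_{y_i}, f(x_i)>
        dl = -1.0 / (1.0 + np.exp(u))          # ell'(u) for the logistic loss
        G = dl[:, None] * Wy                   # gradient of the loss w.r.t. F
        B -= lr * (X.T @ G / n + 2.0 * lam * B)
        b -= lr * G.mean(axis=0)
    return B, b, W

def predict_angle_based(X, B, b, W):
    # Assign the class whose simplex vertex forms the smallest angle with f(x).
    return np.argmax((X @ B + b) @ W.T, axis=1) + 1
```

A fitted classification function of this kind could then be combined with the probability-recovery formula discussed earlier to obtain the class-probability vector needed for the slicing step in a multicategory extension of PEFCS.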