The authors are to be congratulated for their solid contribution in providing a powerful method that handles classification and dimension reduction problems with functional data sets. This type of problem has drawn much attention in the literature, and is known to be difficult due to the complex structure of the corresponding data sets. In particular, many existing dimension reduction methods ignore the relationship between predictors and labels, and perform dimension reduction using the covariates alone. Such procedures can be suboptimal and may lead to unstable results, especially when the predictors are sparsely observed. The proposed PEFCS method incorporates the observed labels into the dimension reduction step by estimating class-conditional probabilities, and is shown to enjoy more competitive and robust performance in numerical examples.

This interesting paper leads to many promising research directions. For example, class-conditional probability estimation (we write \(P_{j}(\hat{X}_i) = {\text {pr}}(Y=j \mid \hat{X}_i)\)) is a crucial step in the PEFCS method. In the literature, it is known that classification methods can be grouped into two main categories: soft and hard classifiers (Wahba 2002; Liu et al. 2011). Soft classifiers directly estimate the class-conditional probabilities, which in turn yield classification rules. Typical examples of soft classification include Fisher’s LDA and logistic regression. In contrast, hard classifiers bypass direct estimation of probabilities and focus on estimating the classification boundary. Typical examples of hard classification include the support vector machine (SVM, Boser et al. 1992; Cortes and Vapnik 1995) and \(\psi \)-learning (Shen et al. 2003). Liu et al. (2011) observed that the classification performance of various classifiers depends heavily on the underlying distribution of \((X, Y)\). In the present paper, the authors use the hinge loss of the SVM. Therefore, a possible generalization of the proposed technique is to employ a more general loss function in their equation (5) for probability estimation, instead of using the weighted SVM. We briefly discuss this idea below.

Consider the optimization problem

$$\begin{aligned} \min _{g\in \mathcal {F}_K} \sum _{i=1}^n \ell \left\{ Y_i g \left( \hat{X}_i\right) \right\} + \lambda \Vert g\Vert _{\mathcal {F}_K}^2, \end{aligned}$$
(1)

where \(\ell (\cdot )\) is a differentiable loss function for a soft classifier. One can verify that the odds \(P_{+1}(\hat{X}_i) / \{1 - P_{+1}(\hat{X}_i)\}\) can be estimated by \(\ell '\{ - \hat{g} (\hat{X}_i)\} / \ell '\{ \hat{g} (\hat{X}_i)\}\), which in turn yields an estimate of \(P_{+1}(\hat{X}_i)\) (Liu et al. 2011). For standard classification where the predictors are scalars or vectors, Liu et al. (2011) pointed out that when the underlying class-conditional probability, as a function of the predictors, is relatively smooth, soft classifiers tend to perform better than hard ones. Moreover, the transition behavior from soft to hard classifiers was thoroughly investigated using the large-margin unified machine family proposed by Liu et al. (2011). For functional data classification, the comparison between soft and hard classifiers and the corresponding transition behavior are largely unknown, and further exploration in this direction can be very interesting.
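To make the probability-recovery step concrete, the following minimal sketch (our own illustration, not code from the paper or from Liu et al. 2011) uses the logistic loss \(\ell (u)=\log (1+e^{-u})\): the odds ratio \(\ell '\{-\hat{g}(x)\}/\ell '\{\hat{g}(x)\}\) then equals \(e^{\hat{g}(x)}\), so the recovered probability is simply the sigmoid of \(\hat{g}(x)\). The function names are hypothetical.

```python
import numpy as np

def logistic_loss_derivative(u):
    # Derivative of the logistic loss ell(u) = log(1 + exp(-u)).
    return -1.0 / (1.0 + np.exp(u))

def prob_from_margin(g_hat, loss_derivative=logistic_loss_derivative):
    """Recover P(Y = +1 | x) from a fitted margin g_hat = g(x):
    the odds equal ell'(-g_hat) / ell'(g_hat) for a differentiable loss."""
    odds = loss_derivative(-g_hat) / loss_derivative(g_hat)
    return odds / (1.0 + odds)

# Sanity check: with the logistic loss the recovered probability is the sigmoid of g_hat.
g = np.linspace(-3.0, 3.0, 7)
assert np.allclose(prob_from_margin(g), 1.0 / (1.0 + np.exp(-g)))
```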

Another potential research topic is to extend the PEFCS methodology to handle multicategory problems. In this case, the construction of slices in the EDR method becomes more involved. In particular, when \(Y\in \{+1,-1\}\), only one direction of the EDR space can be recovered, because a binary response induces at most two slices, so the inverse regression step can recover at most one direction. To overcome this difficulty, Shin et al. (2014) proposed to construct the slices based on \(P_{+1}(\hat{X}_i)\) instead. In multicategory classification, estimation of the class-conditional probabilities becomes more complex, as one needs to estimate a probability vector \(\{P_{1}(\hat{X}_i), P_{2}(\hat{X}_i),\ldots , P_{k}(\hat{X}_i)\}\). Furthermore, how to construct \(S_{(P_1,P_2,\ldots ,P_k) \mid \varvec{X}}\) remains unclear. Therefore, it can be interesting and challenging to develop new methodology in this direction. Below, after a brief sketch of the binary slicing idea, we provide one possible way to generalize the PEFCS methodology for multicategory problems.
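The following minimal sketch (our own illustration, not the authors' PEFCS implementation) may help fix ideas for the binary case: slices are formed from the estimated probabilities \(P_{+1}(\hat{X}_i)\) rather than from the labels, and an e.d.r. direction is recovered from the covariance of the slice means. The function name, the quantile-based slicing, and the assumption that the predictor scores have been standardized are ours.

```python
import numpy as np

def edr_direction_from_probabilities(X, p_hat, n_slices=5, n_dirs=1):
    """Schematic sliced-inverse-regression step in which the slices are
    formed from estimated probabilities P(Y = +1 | X_i) rather than the
    binary labels.  X is assumed to be standardized (whitened); a full
    implementation would solve a generalized eigenproblem with Cov(X)."""
    n, p = X.shape
    X_centered = X - X.mean(axis=0)
    # Form slices from the quantiles of the estimated probabilities.
    edges = np.quantile(p_hat, np.linspace(0.0, 1.0, n_slices + 1))
    slice_id = np.clip(np.searchsorted(edges, p_hat, side="right") - 1, 0, n_slices - 1)
    # Weighted covariance matrix of the within-slice means of X.
    M = np.zeros((p, p))
    for h in range(n_slices):
        idx = slice_id == h
        if idx.any():
            m_h = X_centered[idx].mean(axis=0)
            M += (idx.sum() / n) * np.outer(m_h, m_h)
    # Leading eigenvectors estimate the e.d.r. directions.
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, ::-1][:, :n_dirs]
```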

For margin-based classification, when the number of classes is three or larger, one classification function \(g(\cdot )\) is not enough to discriminate all classes. To overcome this difficulty, a common approach in the literature is to use k functions for k classes, and impose a sum-to-zero constraint on the k functions to reduce the parameter space and to ensure some theoretical properties such as Fisher consistency. Recently, Zhang and Liu (2014) suggested that using k functions and the sum-to-zero constraint can be inefficient and suboptimal, and proposed the angle-based large margin classifiers for multicategory classification. In particular, consider a simplex \(\varvec{W}\) with k vertices \(\{\varvec{W}_1,\ldots ,\varvec{W}_k\}\) in a \((k-1)\)-dimensional space, such that

$$\begin{aligned} \varvec{W}_j=\left\{ \begin{array}{ll} (k-1)^{-1/2}\,\varvec{1}_{k-1}, & j=1,\\ -\left(1+k^{1/2}\right)\big/\left\{(k-1)^{3/2}\right\}\,\varvec{1}_{k-1}+\left\{ k/(k-1)\right\} ^{1/2}\varvec{e}_{j-1}, & 2 \le j \le k, \end{array} \right. \end{aligned}$$

where \(\varvec{1}_{k-1}\) is a vector of 1’s with length \(k-1\), and \(\varvec{e}_j \in \mathbb {R}^{k-1}\) is a vector with the jth element 1 and 0 elsewhere. In angle-based classification, one uses a \((k-1)\)-dimensional classification function \(\varvec{f}= (f_1,\ldots ,f_{k-1})^T\), which maps the predictor vector \(\varvec{x}\) to \(\varvec{f}(\varvec{x}) \in \mathbb {R}^{k-1}\). Observe that \(\varvec{f}\) introduces k angles with respect to \(\varvec{W}_1,\ldots ,\varvec{W}_k\), namely, \(\angle (\varvec{f},\varvec{W}_j); \ j=1,\ldots ,k\). The prediction rule assigns the class whose vertex forms the smallest angle with \(\varvec{f}\); in particular, \(\hat{y}(\varvec{x}) = \mathop {\mathrm{argmin}}_{j \in \{1,\ldots ,k\}} \angle (\varvec{f},\varvec{W}_j)\), where \(\hat{y}(\varvec{x})\) is the predicted label for \(\varvec{x}\). Based on the observation that
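As a quick sanity check on this construction (our own sketch, not code from Zhang and Liu 2014), the vertices can be generated directly and verified to have unit length and constant pairwise inner product \(-1/(k-1)\); the prediction rule then reduces to an argmax of inner products. Function names are hypothetical.

```python
import numpy as np

def simplex_vertices(k):
    """Vertices W_1, ..., W_k of the regular simplex in R^{k-1} used by
    angle-based multicategory classification."""
    W = np.full((k, k - 1), -(1.0 + np.sqrt(k)) / (k - 1) ** 1.5)
    W[0] = 1.0 / np.sqrt(k - 1)
    W[np.arange(1, k), np.arange(k - 1)] += np.sqrt(k / (k - 1.0))
    return W

def predict(F, W):
    """Predicted labels in {1, ..., k}: the class whose vertex has the
    largest inner product with f(x), equivalently the smallest angle."""
    return np.argmax(F @ W.T, axis=1) + 1

k = 4
W = simplex_vertices(k)
G = W @ W.T
# Each vertex has unit length; all pairwise inner products equal -1/(k-1).
assert np.allclose(np.diag(G), 1.0)
assert np.allclose(G[~np.eye(k, dtype=bool)], -1.0 / (k - 1))
```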

$$\begin{aligned} \mathop {\mathrm{argmin}}_{j \in \{1,\ldots ,k\}} \angle (\varvec{f},\varvec{W}_j) = \mathop {\mathrm{argmax}}_{j \in \{1,\ldots ,k\}} \langle \varvec{f}, \varvec{W}_j \rangle , \end{aligned}$$

Zhang and Liu (2014) proposed the following optimization problem for the angle-based classifier

$$\begin{aligned} \min _{\varvec{f}} \sum _{i=1}^n \ell \{ \langle \varvec{W}_{y_i},\varvec{f}(\varvec{x}_i) \rangle \} + \lambda J(\varvec{f}), \end{aligned}$$
(2)

where \(\ell (\cdot )\) is a binary margin-based surrogate loss function, \(J(\varvec{f})\) is a penalty on \(\varvec{f}\) to prevent overfitting, and \(\lambda \) is a tuning parameter that balances the goodness of fit and the model complexity. One advantage of the angle-based classifier is that it is free of the commonly used sum-to-zero constraint, and hence can be more efficient for learning with big data sets. Thus, generalization of the PEFCS method in the angle-based framework should be feasible and promising.
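To illustrate how (2) might be solved in the simplest setting, here is a schematic sketch (our own, not the implementation of Zhang and Liu 2014 or of the PEFCS method) that fits a linear \(\varvec{f}(\varvec{x}) = B^{T}\varvec{x} + \varvec{b}\) with the differentiable logistic surrogate loss and a ridge penalty by plain gradient descent; the function names, the choice of loss, and the step-size defaults are assumptions for illustration only.

```python
import numpy as np

def simplex_vertices(k):
    # Same simplex construction as in the previous sketch.
    W = np.full((k, k - 1), -(1.0 + np.sqrt(k)) / (k - 1) ** 1.5)
    W[0] = 1.0 / np.sqrt(k - 1)
    W[np.arange(1, k), np.arange(k - 1)] += np.sqrt(k / (k - 1.0))
    return W

def fit_angle_based(X, y, k, lam=1e-2, lr=0.1, n_iter=2000):
    """Toy angle-based classifier: linear f(x) = B^T x + b, logistic
    surrogate loss ell(u) = log(1 + exp(-u)), ridge penalty on B,
    fitted by plain gradient descent.  Labels y take values in {1, ..., k}."""
    n, p = X.shape
    W = simplex_vertices(k)          # (k, k-1) vertex matrix
    B, b = np.zeros((p, k - 1)), np.zeros(k - 1)
    Wy = W[y - 1]                    # vertex associated with each observed label
    for _ in range(n_iter):
        F = X @ B + b                          # (n, k-1) function values
        u = np.sum(F * Wy, axis=1)             # functional margins <W_{y_i}, f(x_i)>
        dl = -1.0 / (1.0 + np.exp(u))          # ell'(u) for the logistic loss
        G = dl[:, None] * Wy                   # gradient of the loss w.r.t. F
        B -= lr * (X.T @ G / n + 2.0 * lam * B)
        b -= lr * G.mean(axis=0)
    return B, b, W

def predict_angle_based(X, B, b, W):
    # Assign the class whose simplex vertex forms the smallest angle with f(x).
    return np.argmax((X @ B + b) @ W.T, axis=1) + 1
```

A fitted classification function of this kind could then be combined with the probability-recovery formula discussed earlier to obtain the class-probability vector needed for the slicing step in a multicategory extension of PEFCS.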