When only a small number of labeled samples are available, supervised dimensionality reduction methods tend to perform poorly because of overfitting. In such cases, unlabeled samples could be useful in improving the performance. In this paper, we propose a semi-supervised dimensionality reduction method which preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The proposed method, which we call SEmi-supervised Local Fisher discriminant analysis (SELF), has an analytic form of the globally optimal solution and it can be computed based on eigen-decomposition. We show the usefulness of SELF through experiments with benchmark and real-world document classification datasets.
Albert, A. (1972). Regression and the Moore-Penrose pseudoinverse. San Diego: Academic Press.
Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68, 337–404.
Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., & van der Vorst, H. (Eds.) (2000). Templates for the solution of eigenvalue problems: a practical guide. Philadelphia: Society for Industrial and Applied Mathematics.
Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15, 1373–1396.
Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7, 2399–2434.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.
Cai, D., He, X., & Han, J. (2007). Semi-supervised discriminant analysis. In Proceedings of the IEEE international conference on computer vision (pp. 1–7), Rio de Janeiro, Brazil.
Chapelle, O., Schölkopf, B., & Zien, A. (Eds.) (2006). Semi-supervised learning. Cambridge: MIT Press.
Chung, F. R. K. (1997). Spectral graph theory. Providence: American Mathematical Society.
Davidov, D., Gabrilovich, E., & Markovitch, S. (2004). Parameterized generation of labeled datasets for text categorization based on a hierarchical directory. In The 27th annual international ACM SIGIR conference (pp. 250–257), Sheffield, UK.
Duffy, N., & Collins, M. (2002). Convolution kernels for natural language. In Advances in neural information processing systems (Vol. 14, pp. 625–632). Cambridge: MIT Press.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.
Friedman, J. H. (1989). Regularized discriminant analysis. Journal of the American Statistical Association, 84, 165–175.
Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd ed.). San Diego: Academic Press.
Gärtner, T. (2003). A survey of kernels for structured data. SIGKDD Explorations, 5, S268–S275.
Gärtner, T., Flach, P., & Wrobel, S. (2003). On graph kernels: hardness results and efficient alternatives. In Proceedings of the sixteenth annual conference on computational learning theory (pp. 129–143).
Globerson, A., & Roweis, S. (2006). Metric learning by collapsing classes. In Advances in neural information processing systems (Vol. 18, pp. 451–458). Cambridge: MIT Press.
Goldberger, J., Roweis, S., Hinton, G., & Salakhutdinov, R. (2005). Neighbourhood components analysis. In Advances in neural information processing systems (Vol. 17, pp. 513–520). Cambridge: MIT Press.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.
He, X., & Niyogi, P. (2004). Locality preserving projections. In Advances in neural information processing systems (Vol. 16, pp. 153–160). Cambridge: MIT Press.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313, 504–507.
Joachims, T. (2002). Learning to classify text using support vector machines: methods, theory and algorithms. Dordrecht: Kluwer Academic.
Jolliffe, I. T. (1986). Principal component analysis. New York: Springer.
Kashima, H., & Koyanagi, T. (2002). Kernels for semi-structured data. In Proceedings of the nineteenth international conference on machine learning (pp. 291–298). San Mateo: Morgan Kaufmann.
Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In Proceedings of the twentieth international conference on machine learning (pp. 321–328). San Mateo: Morgan Kaufmann.
Kohavi, R., & John, G. (1997). Wrappers for feature selection. Artificial Intelligence, 97, 273–324.
Kondor, R. I., & Lafferty, J. (2002). Diffusion kernels on graphs and other discrete input spaces. In Proceedings of the nineteenth international conference on machine learning (pp. 315–322).
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., & Watkins, C. (2002). Text classification using string kernels. Journal of Machine Learning Research, 2, 419–444.
Loog, M. (2007). A complete characterization of a family of solutions to a generalized Fisher criterion. Journal of Machine Learning Research, 8, 2121–2123.
Loog, M. (2008). On the equivalence of linear dimensionality-reducing transformations. Journal of Machine Learning Research, 9, 2489–2490.
Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Smola, A., & Müller, K.-R. (2003). Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 623–628.
Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., & Lawrence, N. (Eds.) (2009). Dataset shift in machine learning. Cambridge: MIT Press.
Rätsch, G., Onoda, T., & Müller, K.-R. (2001). Soft margins for adaboost. Machine Learning, 42, 287–320.
Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.
Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299–1319.
Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90, 227–244.
Song, Y., Nie, F., Zhang, C., & Xiang, S. (2008). A unified framework for semi-supervised dimensionality reduction. Pattern Recognition, 41, 2789–2799.
Sugiyama, M. (2007). Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. Journal of Machine Learning Research, 8, 1027–1061.
Sugiyama, M., Krauledat, M., & Müller, K.-R. (2007). Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research, 8, 985–1005.
Sugiyama, M., Ide, T., Nakajima, S., & Sese, J. (2008). Semi-supervised local Fisher discriminant analysis for dimensionality reduction. In Advances in knowledge discovery and data mining (pp. 333–344). Berlin: Springer.
Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319–2323.
Weinberger, K., Blitzer, J., & Saul, L. (2006). Distance metric learning for large margin nearest neighbor classification. In Advances in neural information processing systems (Vol. 18, pp. 1473–1480). Cambridge: MIT Press.
Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning with application to clustering with side-information. In Advances in neural information processing systems (Vol. 15, pp. 505–512). Cambridge: MIT Press.
Ye, J. (2005). Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. Journal of Machine Learning Research, 6, 483–502.
Ye, J. (2008). Comments on the complete characterization of a family of solutions to a generalized Fisher criterion. Journal of Machine Learning Research, 9, 517–519.
Zadrozny, B. (2004). Learning and evaluating classifiers under sample selection bias. In Proceedings of the twenty-first international conference on machine learning (pp. 903–910). New York: ACM.
Zelnik-Manor, L., & Perona, P. (2005). Self-tuning spectral clustering. In Advances in neural information processing systems (Vol. 17, pp. 1601–1608). Cambridge: MIT Press.
Zhang, D., Zhou, Z.-H., & Chen, S. (2007). Semi-supervised dimensionality reduction. In Proceedings of the 7th SIAM international conference on data mining (pp. 629–634), Minneapolis, MN, USA.
A preliminary version of this paper was previously published in Sugiyama et al. (2008). A MATLAB implementation of the proposed dimensionality reduction method SELF is available from http://sugiyama-www.cs.titech.ac.jp/~sugi/software/SELF.
Editor: Roni Khardon.
About this article
Cite this article
Sugiyama, M., Idé, T., Nakajima, S. et al. Semi-supervised local Fisher discriminant analysis for dimensionality reduction. Mach Learn 78, 35 (2010). https://doi.org/10.1007/s10994-009-5125-7
- Semi-supervised learning
- Dimensionality reduction
- Cluster assumption
- Local Fisher discriminant analysis
- Principal component analysis