Semi-supervised local Fisher discriminant analysis for dimensionality reduction

Abstract

When only a small number of labeled samples are available, supervised dimensionality reduction methods tend to perform poorly due to overfitting. In such cases, unlabeled samples can help improve performance. In this paper, we propose a semi-supervised dimensionality reduction method that preserves the global structure of unlabeled samples while separating labeled samples of different classes from each other. The proposed method, which we call SEmi-supervised Local Fisher discriminant analysis (SELF), admits an analytic, globally optimal solution that can be computed by eigen-decomposition. We demonstrate the usefulness of SELF through experiments with benchmark and real-world document classification datasets.
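
To make the eigen-decomposition claim concrete, below is a minimal Python sketch of the idea, offered as an illustration rather than the authors' implementation (their released code is in MATLAB). For brevity it uses plain between- and within-class scatter for the supervised part, whereas SELF itself uses locality-weighted (local Fisher) scatter matrices; the trade-off parameter `beta`, the function name `self_embed`, and all variable names are assumptions of this sketch.

```python
import numpy as np
from scipy.linalg import eigh

def self_embed(X_labeled, y, X_unlabeled, dim=2, beta=0.5):
    """Sketch of a SELF-style embedding.

    Combines Fisher-type scatter on the labeled samples with the total
    scatter of all samples; beta=0 is purely supervised, beta=1 is PCA.
    NOTE: plain scatter is used here; SELF proper uses locality-weighted
    (local Fisher) scatter matrices.
    """
    d = X_labeled.shape[1]
    X_all = np.vstack([X_labeled, X_unlabeled])

    # Total scatter of all samples (the PCA-like, structure-preserving term).
    Xc = X_all - X_all.mean(axis=0)
    S_t = Xc.T @ Xc

    # Between- and within-class scatter of the labeled samples (Fisher term).
    mu = X_labeled.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        X_c = X_labeled[y == c]
        mu_c = X_c.mean(axis=0)
        S_b += len(X_c) * np.outer(mu_c - mu, mu_c - mu)
        D = X_c - mu_c
        S_w += D.T @ D

    # Convex combination of the supervised and unsupervised criteria.
    S_rb = (1.0 - beta) * S_b + beta * S_t
    S_rw = (1.0 - beta) * S_w + beta * np.eye(d)

    # Analytic, globally optimal solution via a generalized eigenproblem
    # S_rb v = lambda * S_rw v; keep eigenvectors of the largest eigenvalues.
    eigvals, eigvecs = eigh(S_rb, S_rw)
    T = eigvecs[:, np.argsort(eigvals)[::-1][:dim]]
    return X_all @ T, T

# Toy usage: 20 labeled and 100 unlabeled 5-dimensional samples.
rng = np.random.default_rng(0)
X_l = rng.normal(size=(20, 5))
y = rng.integers(0, 2, size=20)
X_u = rng.normal(size=(100, 5))
Z, T = self_embed(X_l, y, X_u, dim=2, beta=0.5)
print(Z.shape)  # (120, 2)
```

With beta = 0 the criterion reduces to Fisher discriminant analysis on the labeled samples alone, and with beta = 1 it reduces to PCA on all samples; intermediate values trade off class separation against preservation of the global structure of the unlabeled data.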

References

  1. Albert, A. (1972). Regression and the Moore-Penrose pseudoinverse. San Diego: Academic Press.

  2. Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68, 337–404.

  3. Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., & van der Vorst, H. (Eds.) (2000). Templates for the solution of eigenvalue problems: a practical guide. Philadelphia: Society for Industrial and Applied Mathematics.

  4. Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15, 1373–1396.

  5. Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7, 2399–2434.

  6. Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.

  7. Cai, D., He, X., & Han, J. (2007). Semi-supervised discriminant analysis. In Proceedings of the IEEE international conference on computer vision (pp. 1–7), Rio de Janeiro, Brazil.

  8. Chapelle, O., Schölkopf, B., & Zien, A. (Eds.) (2006). Semi-supervised learning. Cambridge: MIT Press.

  9. Chung, F. R. K. (1997). Spectral graph theory. Providence: American Mathematical Society.

  10. Davidov, D., Gabrilovich, E., & Markovitch, S. (2004). Parameterized generation of labeled datasets for text categorization based on a hierarchical directory. In The 27th annual international ACM SIGIR conference (pp. 250–257), Sheffield, UK.

  11. Duffy, N., & Collins, M. (2002). Convolution kernels for natural language. In Advances in neural information processing systems (Vol. 14, pp. 625–632). Cambridge: MIT Press.

  12. Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.

  13. Friedman, J. H. (1989). Regularized discriminant analysis. Journal of the American Statistical Association, 84, 165–175.

  14. Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd ed.). San Diego: Academic Press.

  15. Gärtner, T. (2003). A survey of kernels for structured data. SIGKDD Explorations, 5(1), 49–58.

  16. Gärtner, T., Flach, P., & Wrobel, S. (2003). On graph kernels: hardness results and efficient alternatives. In Proceedings of the sixteenth annual conference on computational learning theory (pp. 129–143).

  17. Globerson, A., & Roweis, S. (2006). Metric learning by collapsing classes. In Advances in neural information processing systems (Vol. 18, pp. 451–458). Cambridge: MIT Press.

  18. Goldberger, J., Roweis, S., Hinton, G., & Salakhutdinov, R. (2005). Neighbourhood components analysis. In Advances in neural information processing systems (Vol. 17, pp. 513–520). Cambridge: MIT Press.

  19. Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.

  20. He, X., & Niyogi, P. (2004). Locality preserving projections. In Advances in neural information processing systems (Vol. 16, pp. 153–160). Cambridge: MIT Press.

  21. Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313, 504–507.

  22. Joachims, T. (2002). Learning to classify text using support vector machines: methods, theory and algorithms. Dordrecht: Kluwer Academic.

  23. Jolliffe, I. T. (1986). Principal component analysis. New York: Springer.

  24. Kashima, H., & Koyanagi, T. (2002). Kernels for semi-structured data. In Proceedings of the nineteenth international conference on machine learning (pp. 291–298). San Mateo: Morgan Kaufmann.

  25. Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In Proceedings of the twentieth international conference on machine learning (pp. 321–328). San Mateo: Morgan Kaufmann.

  26. Kohavi, R., & John, G. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97, 273–324.

  27. Kondor, R. I., & Lafferty, J. (2002). Diffusion kernels on graphs and other discrete input spaces. In Proceedings of the nineteenth international conference on machine learning (pp. 315–322).

  28. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., & Watkins, C. (2002). Text classification using string kernels. Journal of Machine Learning Research, 2, 419–444.

  29. Loog, M. (2007). A complete characterization of a family of solutions to a generalized Fisher criterion. Journal of Machine Learning Research, 8, 2121–2123.

  30. Loog, M. (2008). On the equivalence of linear dimensionality-reducing transformations. Journal of Machine Learning Research, 9, 2489–2490.

  31. Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Smola, A., & Müller, K.-R. (2003). Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 623–628.

  32. Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., & Lawrence, N. (Eds.) (2009). Dataset shift in machine learning. Cambridge: MIT Press.

  33. Rätsch, G., Onoda, T., & Müller, K.-R. (2001). Soft margins for AdaBoost. Machine Learning, 42, 287–320.

  34. Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.

  35. Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299–1319.

  36. Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90, 227–244.

  37. Song, Y., Nie, F., Zhang, C., & Xiang, S. (2008). A unified framework for semi-supervised dimensionality reduction. Pattern Recognition, 41, 2789–2799.

  38. Sugiyama, M. (2007). Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. Journal of Machine Learning Research, 8, 1027–1061.

  39. Sugiyama, M., Krauledat, M., & Müller, K.-R. (2007). Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research, 8, 985–1005.

  40. Sugiyama, M., Ide, T., Nakajima, S., & Sese, J. (2008). Semi-supervised local Fisher discriminant analysis for dimensionality reduction. In Advances in knowledge discovery and data mining (pp. 333–344). Berlin: Springer.

  41. Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319–2323.

  42. Weinberger, K., Blitzer, J., & Saul, L. (2006). Distance metric learning for large margin nearest neighbor classification. In Advances in neural information processing systems (Vol. 18, pp. 1473–1480). Cambridge: MIT Press.

  43. Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning with application to clustering with side-information. In Advances in neural information processing systems (Vol. 15, pp. 505–512). Cambridge: MIT Press.

  44. Ye, J. (2005). Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. Journal of Machine Learning Research, 6, 483–502.

  45. Ye, J. (2008). Comments on the complete characterization of a family of solutions to a generalized Fisher criterion. Journal of Machine Learning Research, 9, 517–519.

  46. Zadrozny, B. (2004). Learning and evaluating classifiers under sample selection bias. In Proceedings of the twenty-first international conference on machine learning (pp. 903–910). New York: ACM.

  47. Zelnik-Manor, L., & Perona, P. (2005). Self-tuning spectral clustering. In Advances in neural information processing systems (Vol. 17, pp. 1601–1608). Cambridge: MIT Press.

  48. Zhang, D., Zhou, Z.-H., & Chen, S. (2007). Semi-supervised dimensionality reduction. In Proceedings of the 7th SIAM international conference on data mining (pp. 629–634), Minneapolis, MN, USA.

Author information

Correspondence to Masashi Sugiyama.

Additional information

A preliminary version of this paper was published as Sugiyama et al. (2008). A MATLAB implementation of the proposed dimensionality reduction method, SELF, is available at http://sugiyama-www.cs.titech.ac.jp/~sugi/software/SELF.

Editor: Roni Khardon.

About this article

Cite this article

Sugiyama, M., Idé, T., Nakajima, S. et al. Semi-supervised local Fisher discriminant analysis for dimensionality reduction. Mach Learn 78, 35 (2010). https://doi.org/10.1007/s10994-009-5125-7

Keywords

  • Semi-supervised learning
  • Dimensionality reduction
  • Cluster assumption
  • Local Fisher discriminant analysis
  • Principal component analysis