Machine Learning, 78:35

Semi-supervised local Fisher discriminant analysis for dimensionality reduction

  • Masashi Sugiyama
  • Tsuyoshi Idé
  • Shinichi Nakajima
  • Jun Sese

Abstract

When only a small number of labeled samples are available, supervised dimensionality reduction methods tend to perform poorly because of overfitting. In such cases, unlabeled samples can help improve performance. In this paper, we propose a semi-supervised dimensionality reduction method that preserves the global structure of unlabeled samples while separating labeled samples in different classes from each other. The proposed method, which we call SEmi-supervised Local Fisher discriminant analysis (SELF), has an analytic form of the globally optimal solution, which can be computed via eigendecomposition. We show the usefulness of SELF through experiments with benchmark and real-world document classification datasets.
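As a rough illustration of the eigendecomposition-based computation mentioned above, the sketch below blends a supervised Fisher-type scatter computed on the labeled samples with a PCA-type total scatter computed on all samples, and obtains the embedding from a generalized eigenvalue problem. This is a minimal sketch, not the paper's exact SELF formulation: it uses ordinary (non-local) between/within-class scatter matrices, and the function name `semi_supervised_embedding` and the trade-off parameter `beta` are illustrative assumptions.

```python
import numpy as np

def semi_supervised_embedding(X, y, labeled_mask, beta=0.5, dim=2):
    """X: (n, d) array of samples; y: labels, used only where labeled_mask is True.

    Illustrative blend of a Fisher-style supervised criterion (labeled data)
    with a PCA-style unsupervised criterion (all data).
    """
    d = X.shape[1]
    Xl, yl = X[labeled_mask], y[labeled_mask]

    # Supervised part: between/within-class scatter on the labeled samples.
    mean_l = Xl.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(yl):
        Xc = Xl[yl == c]
        diff = Xc.mean(axis=0) - mean_l
        S_b += len(Xc) * np.outer(diff, diff)
        S_w += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))

    # Unsupervised part: total scatter of all (labeled + unlabeled) samples, as in PCA.
    X_centered = X - X.mean(axis=0)
    S_t = X_centered.T @ X_centered

    # Blend the two criteria; beta=0 is purely supervised, beta=1 is PCA-like.
    A = (1 - beta) * S_b + beta * S_t
    B = (1 - beta) * S_w + beta * np.eye(d)

    # Solve the generalized eigenvalue problem A v = lambda B v and keep the
    # leading eigenvectors as columns of the projection matrix.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(-eigvals.real)
    T = eigvecs[:, order[:dim]].real
    return X @ T, T
```

For example, `semi_supervised_embedding(X, y, labeled_mask, beta=0.5, dim=2)` returns a two-dimensional embedding of all samples together with the projection matrix; setting `beta=0` reduces to the purely supervised Fisher criterion, while `beta=1` gives PCA-like behavior.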

Keywords

Semi-supervised learning · Dimensionality reduction · Cluster assumption · Local Fisher discriminant analysis · Principal component analysis


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Masashi Sugiyama (1)
  • Tsuyoshi Idé (2)
  • Shinichi Nakajima (3)
  • Jun Sese (4)

  1. Department of Computer Science, Tokyo Institute of Technology, Meguro-ku, Japan
  2. IBM Research, Tokyo Research Laboratory, Yamato-shi, Japan
  3. Nikon Corporation, Kumagaya-shi, Japan
  4. Department of Information Science, Ochanomizu University, Bunkyo, Japan
