Semi-supervised local Fisher discriminant analysis for dimensionality reduction
Abstract
When only a small number of labeled samples are available, supervised dimensionality reduction methods tend to perform poorly because of overfitting. In such cases, unlabeled samples can help improve performance. In this paper, we propose a semi-supervised dimensionality reduction method that preserves the global structure of unlabeled samples while separating labeled samples in different classes from each other. The proposed method, which we call SEmi-supervised Local Fisher discriminant analysis (SELF), admits an analytic form of the globally optimal solution, which can be computed by eigendecomposition. We show the usefulness of SELF through experiments with benchmark and real-world document classification datasets.
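To illustrate the kind of computation the abstract describes, the following is a minimal sketch of a semi-supervised Fisher-style embedding. It blends the between/within-class scatter of the labeled samples with the total scatter of all samples via a trade-off parameter, then solves a generalized eigenproblem. For brevity it uses plain Fisher scatter rather than the paper's *local* (affinity-weighted) variant; the function name, parameter names, and the specific blending form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.linalg import eigh

def semi_supervised_fisher(X_labeled, y, X_unlabeled, r=1, beta=0.5):
    """Sketch of a semi-supervised Fisher-style embedding (simplified:
    plain Fisher scatter instead of the local, affinity-weighted one).

    beta=1 recovers a purely supervised Fisher criterion;
    beta=0 recovers a PCA-like criterion on all samples.
    Returns a d x r transformation matrix.
    """
    X_all = np.vstack([X_labeled, X_unlabeled])
    X_all = X_all - X_all.mean(axis=0)            # center all samples
    d = X_all.shape[1]

    # Total scatter of labeled + unlabeled samples (PCA-like term).
    S_t = X_all.T @ X_all

    # Plain Fisher between/within-class scatter from labeled samples.
    mu = X_labeled.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X_labeled[y == c]
        mc = Xc.mean(axis=0)
        S_b += len(Xc) * np.outer(mc - mu, mc - mu)
        S_w += (Xc - mc).T @ (Xc - mc)

    # Regularized scatter matrices blending the two criteria.
    S_rb = beta * S_b + (1 - beta) * S_t
    S_rw = beta * S_w + (1 - beta) * np.eye(d)

    # Generalized eigenproblem S_rb v = lambda S_rw v;
    # keep the eigenvectors of the r largest eigenvalues.
    vals, vecs = eigh(S_rb, S_rw)
    return vecs[:, np.argsort(vals)[::-1][:r]]
```

Because both regularized scatter matrices are built in closed form, the globally optimal transform comes directly from one eigendecomposition, with no iterative optimization, matching the computational structure the abstract describes.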
Keywords
Semi-supervised learning · Dimensionality reduction · Cluster assumption · Local Fisher discriminant analysis · Principal component analysis