Geometric Methods for Feature Extraction and Dimensional Reduction

Chapter in the Data Mining and Knowledge Discovery Handbook

Abstract

We give a tutorial overview of several geometric methods for feature extraction and dimensional reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component analysis (PCA), kernel PCA, probabilistic PCA, and oriented PCA; and for the manifold methods, we review multidimensional scaling (MDS), landmark MDS, Isomap, locally linear embedding, Laplacian eigenmaps and spectral clustering. The Nyström method, which links several of the algorithms, is also reviewed. The goal is to provide a self-contained review of the concepts and mathematics underlying these algorithms.
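
To make the flavor of these methods concrete, here is a minimal numpy sketch (ours, not the chapter's own code or notation) of three of the ideas reviewed: PCA, kernel PCA with an RBF kernel, and the Nyström approximation to a Gram matrix's eigendecomposition. The function names, the kernel choice, and the toy data are illustrative assumptions.

import numpy as np

def pca(X, k):
    """Project rows of X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center the data
    C = Xc.T @ Xc / (len(X) - 1)                 # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)               # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k]                     # top-k eigenvectors
    return Xc @ W

def kernel_pca(X, k, gamma=1.0):
    """Kernel PCA: eigendecompose the double-centered RBF Gram matrix."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    vals, vecs = np.linalg.eigh(J @ K @ J)
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))   # embedding coordinates

def nystrom_eig(K, m, rng):
    """Nyström approximation: estimate eigenvalues and eigenvectors of the
    n x n Gram matrix K from an m-point subsample (m << n)."""
    n = len(K)
    idx = rng.choice(n, size=m, replace=False)
    vals, vecs = np.linalg.eigh(K[np.ix_(idx, idx)])
    vals, vecs = vals[::-1], vecs[:, ::-1]
    keep = vals > 1e-10                          # drop numerically zero modes
    u = K[:, idx] @ vecs[:, keep] / vals[keep]   # extend eigenvectors to all n points
    return vals[keep] * (n / m), u               # rescaled eigenvalue estimates

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    print(pca(X, 2).shape)           # (200, 2)
    print(kernel_pca(X, 2).shape)    # (200, 2)
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T))
    evals, evecs = nystrom_eig(K, 20, rng)
    print(evecs.shape)               # (200, r) for the r retained modes

The common thread, which the chapter makes precise, is that each method reduces to an eigendecomposition: PCA diagonalizes a covariance matrix, kernel PCA and MDS diagonalize a (centered) Gram matrix, and the Nyström method approximates that decomposition from a subsample.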

References

  • M.A. Aizerman, E.M. Braverman, and L.I. Rozoner. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.

  • P.F. Baldi and K. Hornik. Learning in linear neural networks: A survey. IEEE Transactions on Neural Networks, 6(4):837–858, July 1995.

  • A. Basilevsky. Statistical Factor Analysis and Related Methods. Wiley, New York, 1994.

  • M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

  • Y. Bengio, J. Paiement, and P. Vincent. Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps and spectral clustering. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.

  • C. Berg, J.P.R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer-Verlag, 1984.

  • C. M. Bishop. Bayesian PCA. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems, volume 11, pages 382–388, Cambridge, MA, 1999. The MIT Press.

  • I. Borg and P. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, 1997.

  • B. E. Boser, I. M. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Fifth Annual Workshop on Computational Learning Theory, pages 144–152, Pittsburgh, 1992. ACM.

  • C.J.C. Burges. Some Notes on Applied Mathematics for Machine Learning. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures on Machine Learning, pages 21–40. Springer Lecture Notes in Artificial Intelligence, 2004.

  • C.J.C. Burges, J.C. Platt, and S. Jana. Extracting noise-robust features from audio. In Proc. IEEE Conference on Acoustics, Speech and Signal Processing, pages 1021–1024. IEEE Signal Processing Society, 2002.

  • C.J.C. Burges, J.C. Platt, and S. Jana. Distortion discriminant analysis for audio fingerprinting. IEEE Transactions on Speech and Audio Processing, 11(3):165–174, 2003.

  • F.R.K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.

  • T.F. Cox and M.A.A. Cox. Multidimensional Scaling. Chapman and Hall, 2001.

  • R.B. Darlington. Factor analysis. Technical report, Cornell University, http://comp9.psych.cornell.edu/Darlington/factor.htm.

  • V. de Silva and J.B. Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 705–712. MIT Press, 2002.

  • P. Diaconis and D. Freedman. Asymptotics of graphical projection pursuit. Annals of Statistics, 12:793–815, 1984.

  • K.I. Diamantaras and S.Y. Kung. Principal Component Neural Networks. John Wiley, 1996.

  • R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. John Wiley, 1973.

  • C. Fowlkes, S. Belongie, F. Chung, and J. Malik. Spectral grouping using the Nyström method. IEEE Trans. Pattern Analysis and Machine Intelligence, 26(2), 2004.

  • J.H. Friedman and W. Stuetzle. Projection pursuit regression. Journal of the American Statistical Association, 76(376):817–823, 1981.

  • J.H. Friedman, W. Stuetzle, and A. Schroeder. Projection pursuit density estimation. Journal of the American Statistical Association, 79:599–608, 1984.

  • J.H. Friedman and J.W. Tukey. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, C-23(9):881–890, 1974.

  • G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins, third edition, 1996.

  • M. Gondran and M. Minoux. Graphs and Algorithms. John Wiley and Sons, 1984.

  • I. Guyon. NIPS 2003 workshop on feature extraction: http://clopinet.com/isabelle/Projects/NIPS2003/.

  • J. Ham, D.D. Lee, S. Mika, and B. Schölkopf. A kernel view of dimensionality reduction of manifolds. In Proceedings of the International Conference on Machine Learning, 2004.

  • T.J. Hastie and W. Stuetzle. Principal curves. Journal of the American Statistical Association, 84(406):502–516, 1989.

  • R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

  • P.J. Huber. Projection pursuit. Annals of Statistics, 13(2):435–475, 1985.

  • A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, 2001.

  • Y. LeCun and Y. Bengio. Convolutional networks for images, speech and time-series. In M. Arbib, editor, The Handbook of Brain Theory and Neural Networks. MIT Press, 1995.

  • M. Meila and J. Shi. Learning segmentation by random walks. In Advances in Neural Information Processing Systems, pages 873–879, 2000.

  • S. Mika, B. Schölkopf, A. J. Smola, K.-R. Müller, M. Scholz, and G. Rätsch. Kernel PCA and de-noising in feature spaces. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11. MIT Press, 1999.

  • A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems 14. MIT Press, 2002.

  • J. Platt. Private Communication.

  • J. Platt. Fastmap, MetricMap, and Landmark MDS are all Nyström algorithms. In Z. Ghahramani and R. Cowell, editors, Proc. 10th International Conference on Artificial Intelligence and Statistics, 2005.

  • W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2nd edition, 1992.

  • S.T. Roweis and L.K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

  • I.J. Schoenberg. Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espaces distanciés vectoriellement applicables sur l'espace de Hilbert". Annals of Mathematics, 36:724–732, 1935.

  • B. Schölkopf. The kernel trick for distances. In T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 301–307. MIT Press, 2001.

  • B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, 2002.

  • B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.

  • J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

  • C.E. Spearman. 'General intelligence,' objectively determined and measured. American Journal of Psychology, 15:201–293, 1904.

  • C.J. Stone. Optimal global rates of convergence for nonparametric regression. Annals of Statistics, 10(4):1040–1053, 1982.

  • J.B. Tenenbaum. Mapping a manifold of perceptual observations. In Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, editors, Advances in Neural Information Processing Systems, volume 10. The MIT Press, 1998.

  • M.E. Tipping and C.M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61(3):611–622, 1999a.

  • M.E. Tipping and C.M. Bishop. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443–482, 1999b.

  • P. Viola and M. Jones. Robust real-time object detection. In Second international workshop on statistical and computational theories of vision-modeling, learning, computing, and sampling, 2001.

  • S. Wilks. Mathematical Statistics. John Wiley, 1962.

  • C.K.I. Williams. On a Connection between Kernel PCA and Metric Multidimensional Scaling. In T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 675–681. MIT Press, 2001.

  • C.K.I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 682–688. MIT Press, 2001.

Copyright information

© 2005 Springer Science+Business Media, Inc.

About this chapter

Cite this chapter

Burges, C.J.C. (2005). Geometric Methods for Feature Extraction and Dimensional Reduction. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_4

  • DOI: https://doi.org/10.1007/0-387-25465-X_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-24435-8

  • Online ISBN: 978-0-387-25465-4
