Abstract
This chapter gives an overview of the book. Section 1 briefly introduces high-dimensional data and the need for dimensionality reduction. Section 2 discusses how high-dimensional data are acquired. When the dimension of the data is very high, we encounter the so-called curse of dimensionality, which is discussed in Section 3. Section 4 discusses the extrinsic and intrinsic dimensions of data and points out that most high-dimensional data have low intrinsic dimension; it is this observation that makes dimensionality reduction possible. Finally, Section 5 gives an outline of the book.
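The contrast the abstract draws between the curse of dimensionality (Section 3) and low intrinsic dimension (Section 4) can be seen numerically in a few lines. Below is a minimal NumPy sketch, not taken from the chapter: it first shows pairwise distances concentrating as the ambient dimension grows, then applies a Levina-Bickel-style nearest-neighbor maximum-likelihood estimator to points on a curve embedded in R^100. The sample sizes and the choice k = 10 are illustrative assumptions, not values from the book.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix, computed without an O(n^2 * D) broadcast blow-up."""
    sq = np.sum(X * X, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.sqrt(d2)

rng = np.random.default_rng(0)

# 1. Distance concentration: as the ambient dimension D grows, the ratio of the
#    farthest to the nearest pairwise distance among random points approaches 1,
#    so "nearest neighbor" loses its discriminating power.
for D in (2, 10, 100, 1000):
    d = pairwise_dist(rng.standard_normal((200, D)))
    d = d[np.triu_indices_from(d, k=1)]            # off-diagonal distances only
    print(f"D = {D:4d}: max/min distance ratio = {d.max() / d.min():6.2f}")

# 2. Low intrinsic dimension: a closed curve embedded in R^100 is extrinsically
#    100-dimensional but intrinsically 1-dimensional, and a Levina-Bickel-style
#    nearest-neighbor MLE recovers that from the sample alone.
t = rng.uniform(0.0, 2.0 * np.pi, 500)
B = rng.standard_normal((2, 100))                  # random 2-plane in R^100
X = np.cos(t)[:, None] * B[0] + np.sin(t)[:, None] * B[1]

k = 10
T = np.sort(pairwise_dist(X), axis=1)[:, 1:k + 1]  # k nearest-neighbor distances
inv_m = np.mean(np.log(T[:, -1:] / T[:, :-1]), axis=1)   # 1/m_hat at each point
print("estimated intrinsic dimension:", 1.0 / inv_m.mean())  # close to 1, far from 100
```

The first loop makes the Section 3 phenomenon concrete, and the estimate in the second part is the kind of evidence behind the Section 4 claim that dimensionality reduction is possible for most high-dimensional data.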
Copyright information
© 2012 Higher Education Press, Beijing and Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wang, J. (2012). Introduction. In: Geometric Structure of High-Dimensional Data and Dimensionality Reduction. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27497-8_1
DOI: https://doi.org/10.1007/978-3-642-27497-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27496-1
Online ISBN: 978-3-642-27497-8