Geometric Structure of High-Dimensional Data
In applications, high-dimensional data are given as a discrete set of points in a Euclidean space. If the points are well sampled from a manifold, the data inherit their geometry from that manifold. Because the underlying manifold is hidden, its geometry is hard to determine by classical manifold calculus. The data graph is a useful tool for revealing the data geometry. To construct a data graph, we first find a neighborhood system on the data, which is determined by the similarity (or dissimilarity) among the data points. The similarity information is usually driven by the application in which the data are used. In this chapter, we introduce methods for defining data similarity (or dissimilarity), together with preliminary spectral graph theory for analyzing the data geometry. Section 1 discusses the construction of a neighborhood system on data. The neighborhood system on a data set defines a data graph, which can be regarded as a discrete form of a manifold. Section 2 introduces the basic concepts of graphs. Section 3 introduces spectral graph analysis as a tool for analyzing the data geometry; in particular, the Laplacian on a graph is briefly discussed there. Most of the material in Sections 2 and 3 can be found in [1–3].
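As a rough illustration of the pipeline described above (neighborhood system, then data graph, then graph Laplacian), the following minimal sketch builds a symmetrized k-nearest-neighbor graph from Euclidean distances and forms the unnormalized Laplacian L = D - A. The function name `knn_graph_laplacian`, the choice of a k-nearest-neighbor neighborhood, the 0/1 edge weights, and the unnormalized form of the Laplacian are all assumptions made for this example; the chapter itself may use other neighborhoods (for instance, ε-balls), weighted edges, or a normalized Laplacian.

```python
import math


def knn_graph_laplacian(points, k):
    """Return (A, L) for a symmetrized k-nearest-neighbor graph.

    A is the 0/1 adjacency matrix and L = D - A is the unnormalized
    graph Laplacian, where D is the diagonal degree matrix.
    (Hypothetical helper for illustration only.)
    """
    n = len(points)
    # Pairwise Euclidean distances serve as the dissimilarity measure.
    dist = [[math.dist(p, q) for q in points] for p in points]

    A = [[0] * n for _ in range(n)]
    for i in range(n):
        # Indices of all other points, ordered by distance from point i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        for j in order[:k]:
            A[i][j] = 1
            A[j][i] = 1  # symmetrize so the graph is undirected

    degree = [sum(row) for row in A]
    L = [[(degree[i] if i == j else 0) - A[i][j] for j in range(n)]
         for i in range(n)]
    return A, L


# Five points on a line forming two well-separated clusters.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
A, L = knn_graph_laplacian(points, k=1)
```

With these sample points and k = 1, the graph splits into two connected components, one per cluster. Every row of L sums to zero, a basic property of the unnormalized Laplacian that spectral graph analysis builds on.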
Keywords: Fast Fourier Transform, Geometric Structure, Adjacency Matrix, Undirected Graph, Weighted Graph
- Bondy, J., Murty, U.: Graph Theory. Springer (2008).
- Chartrand, G.: Introductory Graph Theory. Dover (1985).
- Chung, F.R.: Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, No. 9. AMS (1996).
- Shakhnarovich, G., Darrell, T., Indyk, P. (eds.): Nearest-Neighbor Methods in Learning and Vision, Theory and Practice. MIT (2006).
- Bozkaya, T., Ozsoyoglu, M.: Distance-based indexing for high-dimensional metric spaces. In: Proc. ACM SIGMOD, pp. 357–368 (1997).
- Katayama, N., Satoh, S.: The SR-tree: An index structure for high-dimensional nearest neighbor queries. In: Proc. ACM SIGMOD, pp. 369–380 (1997).
- Lubiarz, S., Lockwood, P.: Evaluation of fast algorithms for finding the nearest neighbor. Proc. IEEE Int. Conf. Acoust., Speech and Signal Process. 2, 1491–1494 (1997).
- Yianilos, P.N.: Data structure and algorithms for nearest neighbor search in general metric spaces. In: Proc. ACM-SIAM Symp. Discr. Algorithms, pp. 311–321 (1993).
- Chui, C.K.: An Introduction to Wavelets, Wavelet Analysis and its Applications, vol. 1. Academic Press, Inc. (1992).
- Chui, C.K.: Wavelets: A Mathematical Tool for Signal Analysis. SIAM Monographs on Mathematical Modeling and Computation. Society for Industrial and Applied Mathematics, Philadelphia (1997).