Abstract
Indexing high-dimensional data for efficient nearest-neighbor search is challenging. It is well known that when the data dimension is very high, the search time of an index can exceed the time required for a linear scan of the entire dataset. To alleviate this curse of dimensionality, indexing schemes such as locality-sensitive hashing (LSH) and M-trees were proposed to perform approximate searches. In this chapter,\(^\dagger\) we present a hypersphere indexer, named \({\sf SphereDex}\), to perform such searches. \({\sf SphereDex}\) partitions the data space using concentric hyperspheres. By exploiting geometric properties, \({\sf SphereDex}\) can perform effective pruning. Our empirical study shows that \({\sf SphereDex}\) enjoys three advantages over competing schemes at the same level of search accuracy. First, it requires fewer disk-seek operations. Second, it keeps disk accesses sequential most of the time. Third, it requires fewer distance computations. More importantly, \({\sf SphereDex}\) can be extended to support hyperplane queries for Support Vector Machines (SVMs) and other kernel methods. In classification problems using SVMs, the data instances closest to the hyperplane are considered the most ambiguous, and the ones farthest from the hyperplane the most certain (or most confident), regarding their class membership. Hyperplane queries, rather than point queries, are essential to supporting fast retrieval in applications that use SVMs. At the end of this chapter, we illustrate how \({\sf SphereDex}\) can be extended to support both nearest-neighbor and farthest-neighbor hyperplane query processing.
†© ACM, 2006. This chapter is based on my work with Navneet Panda [1], published in MULTIMEDIA’06. Permission to publish this chapter is granted under copyright license #2587680080590.
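To make the idea of partitioning by concentric hyperspheres and pruning by geometry more concrete, the following is a minimal sketch and not the \({\sf SphereDex}\) algorithm itself. It assumes Euclidean distance, a single shared center, equal-width shells, and an exact (not approximate) in-memory search. The names `build_shells`, `shell_lower_bound`, and `nearest_neighbor` are hypothetical and do not come from the chapter. The pruning rule relies only on the triangle inequality: a point at radius \(r\) from the center cannot be closer to the query than \(|r_q - r|\), where \(r_q\) is the query's radius.

```python
import math


def build_shells(points, center, num_shells):
    """Partition a non-empty list of points into concentric shells around `center`.

    Returns a list of (r_min, r_max, members) tuples, one per non-empty shell,
    where r_min and r_max are the smallest and largest member radii.
    """
    radii = [math.dist(p, center) for p in points]
    # Equal-width shells (an assumption); fall back to 1.0 if every point is the center.
    width = (max(radii) / num_shells) or 1.0
    buckets = {}
    for p, r in zip(points, radii):
        idx = min(int(r / width), num_shells - 1)
        buckets.setdefault(idx, []).append((r, p))
    shells = []
    for idx in sorted(buckets):
        members = buckets[idx]
        rs = [r for r, _ in members]
        shells.append((min(rs), max(rs), [p for _, p in members]))
    return shells


def shell_lower_bound(query_radius, r_min, r_max):
    """Smallest possible distance from the query to any point in the shell.

    Triangle inequality: | ||q - c|| - ||p - c|| | <= ||q - p||.
    """
    if query_radius < r_min:
        return r_min - query_radius
    if query_radius > r_max:
        return query_radius - r_max
    return 0.0


def nearest_neighbor(query, center, shells):
    """Exact nearest neighbor; returns (point, distance, shells_scanned)."""
    q_r = math.dist(query, center)
    # Visit shells in increasing order of their lower bound.
    ordered = sorted(shells, key=lambda s: shell_lower_bound(q_r, s[0], s[1]))
    best, best_d, scanned = None, math.inf, 0
    for r_min, r_max, members in ordered:
        if shell_lower_bound(q_r, r_min, r_max) >= best_d:
            break  # no remaining shell can contain a closer point
        scanned += 1
        for p in members:
            d = math.dist(query, p)
            if d < best_d:
                best, best_d = p, d
    return best, best_d, scanned
```

Calling `nearest_neighbor(query, center, build_shells(points, center, 10))` returns the closest point together with the number of shells actually scanned, which shows how many shells the bound skipped. How much is pruned depends heavily on the data distribution and the dimension, so this sketch says nothing about the disk-access or accuracy results reported in the chapter.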
Notes
- 1.
Each dimension of a rectangular MBR has two neighboring rectangular MBRs. To ensure retrieval of the exact set of top-k nearest neighbors, a search needs to examine all \(3^d\) neighboring MBRs [11]. For tree structures using spherical MBRs, [5] reports that their performance decays almost as rapidly as that of tree structures using rectangular MBRs.
- 2.
References
N. Panda, E.Y. Chang, Efficient top-k hyperplane query processing for multimedia information retrieval, in Proceedings of ACM Multimedia, 2006, pp. 317–326
S. Arya, D. Mount, N. Netanyahu, R. Silverman, A. Wu, An optimal algorithm for approximate nearest neighbor searching in fixed dimensions, in Proceedings of ACM SODA, 1994, pp. 573–582
P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of ACM STOC, 1998, pp. 604–613
J.M. Kleinberg, Two algorithms for nearest-neighbor search in high dimensions, in Proceedings of ACM STOC, 1997, pp. 599–608
R. Weber, H.J. Schek, S. Blott, A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, in Proceedings of the 24th International Conference on Very Large Data Bases VLDB, 1998, pp. 194–205
J. Bentley, Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
N. Katayama, S. Satoh, The SR-tree: an index structure for high-dimensional nearest neighbor queries, in Proceedings of ACM SIGMOD International Conference on Management of Data, 1997, pp. 369–380
D.A. White, R. Jain, Similarity indexing with the SS-Tree, in Proceedings of IEEE ICDE, 1996, pp. 516–523
E. Kushilevitz, R. Ostrovsky, Y. Rabani, Efficient search for approximate nearest neighbor in high dimensional spaces, in Proceedings of the 30th STOC, 1998, pp. 614–623
K. Clarkson, An algorithm for approximate closest-point queries, in Proceedings of SCG, 1994, pp. 160–164
C. Li, E. Chang, H. Garcia-Molina, G. Wiederhold, Clindex: approximate similarity queries in high-dimensional spaces. IEEE Trans. Knowl. Data Eng. (TKDE) 14(4), 792–808 (2002)
A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in Proceedings of VLDB, 1999, pp. 301–312
P. Ciaccia, M. Patella, PAC nearest neighbor queries: approximate and controlled search in high-dimensional and metric spaces, in Proceedings of IEEE ICDE, 2000, pp. 244–255
A. Qamra, Y. Meng, E.Y. Chang, Enhanced perceptual distance functions and indexing for image replica recognition. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 27(3), 379–391 (2005)
J. Buhler, Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics 17, 419–428 (2001)
D.A. Keim, Tutorial on high-dimensional index structures: database support for next decade’s applications, in Proceedings of the ACM SIGMOD, 1998, p. 501
M.E. Houle, J. Sakuma, Fast approximate similarity search in extremely high-dimensional data sets, in Proceedings of IEEE ICDE, 2005, pp. 619–630
S. Berchtold, D. Keim, H. Kriegel, The X-tree: an index structure for high-dimensional data, in Proceedings of the 22nd Conference on Very Large Databases VLDB, 1996, pp. 28–39
N. Beckmann, H. Kriegel, R. Schneider, B. Seeger, The \({R}^{*}\) tree: an efficient and robust access method for points and rectangles, in Proceedings of ACM SIGMOD International Conference on Management of Data, 1990, pp. 322–331
K.I. Lin, H.V. Jagadish, C. Faloutsos, The TV-tree: an index structure for high-dimensional data. VLDB J. 3(4), 517–542 (1994)
P. Ciaccia, M. Patella, P. Zezula, M-tree: an efficient access method for similarity search in metric spaces, in Proceedings of 23rd International Conference on Very Large Databases VLDB, 1997, pp. 426–435
T. Bozkaya, M. Ozsoyoglu, Indexing large metric spaces for similarity search queries. ACM Trans. Database Syst. 24(3), 361–404 (1999)
S. Brin, Near neighbor search in large metric spaces, in Proceedings of VLDB, 1995, pp. 574–584
G. Navarro, Searching in metric spaces by spatial approximation, in SPIRE/CRIWG, 1999, pp. 141–148
H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, A.E. Abbadi, Approximate nearest neighbor searching in multimedia databases, in Proceedings of IEEE ICDE, 2001, pp. 503–511
M. Patella, http://www-db.deis.unibo.it/mtree/download.html
S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, A.Y. Wu, An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45(6), 891–923 (1998)
B.S. Manjunath, Airphoto dataset, http://vision.ece.ucsb.edu/download.html
B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18(8), 837–842 (1996)
E.B. Goldstein, Sensation and Perception, 4th edn. (Brooks/Cole, Pacific Grove, 1999)
J.G. Leu, Computing a shape’s moments from its boundary. Pattern Recognit. 24, 949–957 (1991)
J.R. Smith, S.-F. Chang, VisualSEEk: a fully automated content-based image query system, in Proceedings of ACM Multimedia, 1996, pp. 87–98
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg and Tsinghua University Press
About this chapter
Cite this chapter
Chang, E.Y. (2011). Approximate High-Dimensional Indexing with Kernel. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_11
DOI: https://doi.org/10.1007/978-3-642-20429-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20428-9
Online ISBN: 978-3-642-20429-6
eBook Packages: Computer Science, Computer Science (R0)