Approximate High-Dimensional Indexing with Kernel

Chang, Edward Y.

doi:10.1007/978-3-642-20429-6_11


Abstract

Indexing high-dimensional data for efficient nearest-neighbor search is challenging. It is well known that when the data dimension is very high, the search time of an index can exceed the time required for a linear scan of the entire dataset. To alleviate this curse of dimensionality, indexing schemes such as locality-sensitive hashing (LSH) and M-trees were proposed to perform approximate searches. In this chapter,\(^\dagger\) we present a hypersphere indexer, named \({\sf SphereDex}\), to perform such searches. \({\sf SphereDex}\) partitions the data space using concentric hyperspheres. By exploiting geometric properties, \({\sf SphereDex}\) can perform effective pruning. Our empirical study shows that, at the same level of search accuracy, \({\sf SphereDex}\) enjoys three advantages over competing schemes. First, it requires fewer disk-seek operations. Second, it keeps disk accesses sequential most of the time. Third, it requires fewer distance computations. More importantly, \({\sf SphereDex}\) can be extended to support hyperplane queries for Support Vector Machines (SVMs) and other kernel methods. In classification problems using SVMs, the data instances closest to the hyperplane are considered the most ambiguous, and the ones farthest from the hyperplane the most certain (or most confident) regarding their class membership. Hyperplane queries, rather than point queries, are essential to fast retrieval in applications that use SVMs. At the end of this chapter, we illustrate how \({\sf SphereDex}\) can be extended to support both nearest-neighbor and farthest-neighbor hyperplane query processing.

† © ACM, 2006. This chapter is based on my work with Navneet Panda [1], published in MULTIMEDIA’06. Permission to publish this chapter is granted under copyright license #2587680080590.
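
The abstract names two ideas that a small sketch can make concrete: pruning with concentric hyperspheres, and ranking data by distance to a hyperplane. The Python below is only an illustration under stated assumptions, not the SphereDex algorithm itself, whose construction is given in the chapter body and not on this page. The function names, the fixed shell `width`, and the caller-chosen `center` are all hypothetical. The pruning bound is the reverse triangle inequality, and the hyperplane ranking is a plain linear scan over the quantity |w·x + b| / ||w||.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_shells(points, center, width):
    """Group point indices by concentric shell (assumed layout): shell k
    holds points whose distance from `center` lies in [k*width, (k+1)*width)."""
    shells = {}
    for i, p in enumerate(points):
        shells.setdefault(int(dist(p, center) // width), []).append(i)
    return shells

def shell_lower_bound(k, width, r_q):
    """Smallest possible distance from the query to any point in shell k,
    from the reverse triangle inequality |d(x,c) - d(q,c)| <= d(x,q)."""
    lo, hi = k * width, (k + 1) * width
    return max(0.0, lo - r_q, r_q - hi)

def nearest(points, shells, center, width, q):
    """Exact 1-NN: visit shells in order of lower bound and stop once no
    unvisited shell can beat the best distance found so far."""
    r_q = dist(q, center)
    order = sorted(shells, key=lambda k: shell_lower_bound(k, width, r_q))
    best_i, best_d, visited = None, math.inf, 0
    for k in order:
        if shell_lower_bound(k, width, r_q) >= best_d:
            break  # every remaining shell has an even larger bound
        visited += 1
        for i in shells[k]:
            d = dist(points[i], q)
            if d < best_d:
                best_i, best_d = i, d
    return best_i, best_d, visited

def hyperplane_rank(points, w, b):
    """Indices sorted by distance |w.x + b| / ||w|| to the hyperplane,
    nearest (most ambiguous) first. This is a linear-scan baseline."""
    norm = math.sqrt(sum(v * v for v in w))

    def margin(p):
        return abs(sum(wi * xi for wi, xi in zip(w, p)) + b) / norm

    return sorted(range(len(points)), key=lambda i: margin(points[i]))
```

Reading `hyperplane_rank` from the front gives the instances nearest the hyperplane and reading it from the back gives the farthest ones, which are the two query types the abstract mentions. An index would aim to return the same top-k answers without scanning every point, in the way `nearest` skips shells whose lower bound already exceeds the best distance found.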


Notes

1. Each dimension of a rectangular MBR has two neighboring rectangular MBRs. To ensure retrieval of the exact set of top-k nearest neighbors, a search needs to examine all \(3^d\) neighboring MBRs [11]. For tree structures using spherical MBRs, [5] reports that their performance decays almost as rapidly as that of tree structures using rectangular MBRs.
2. The M-tree also provides an approximate version of the algorithm [13]. For a fair comparison, we use the approximate version of the code provided by M. Patella to conduct experiments. The code is available on request, but is not part of the basic download available at [26].
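
The \(3^d\) count in Note 1 can be checked with a short enumeration. The sketch below assumes, purely for illustration, that the MBRs tile the space like a regular grid, so that each dimension contributes three choices: the lower neighbor, the cell itself, or the upper neighbor. The function name `neighbor_offsets` is hypothetical.

```python
from itertools import product

def neighbor_offsets(d):
    """All offset vectors in {-1, 0, +1}^d: the cell itself plus every
    cell touching it along a face, edge, or corner (regular grid assumed)."""
    return list(product((-1, 0, 1), repeat=d))

# Enumeration is practical only for small d; the closed form 3**d shows
# how quickly the number of cells to examine grows with dimension.
for d in (2, 3, 10, 20):
    print(d, 3 ** d)
```

The count grows exponentially with the dimension, which is why an exact search that must visit every neighboring MBR loses its advantage over a linear scan.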

References

1. N. Panda, E.Y. Chang, Efficient top-k hyperplane query processing for multimedia information retrieval, in Proceedings of ACM Multimedia, 2006, pp. 317–326
2. S. Arya, D. Mount, N. Netanyahu, R. Silverman, A. Wu, An optimal algorithm for approximate nearest neighbor searching in fixed dimensions, in Proceedings of ACM SODA, 1994, pp. 573–582
3. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of ACM STOC, 1998, pp. 604–613
4. J.M. Kleinberg, Two algorithms for nearest-neighbor search in high dimensions, in Proceedings of ACM STOC, 1997, pp. 599–608
5. R. Weber, H.J. Schek, S. Blott, A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, in Proceedings of the 24th International Conference on Very Large Data Bases VLDB, 1998, pp. 194–205
6. J. Bentley, Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
7. N. Katayama, S. Satoh, The SR-tree: an index structure for high-dimensional nearest neighbor queries, in Proceedings of ACM SIGMOD International Conference on Management of Data, 1997, pp. 369–380
8. D.A. White, R. Jain, Similarity indexing with the SS-tree, in Proceedings of IEEE ICDE, 1996, pp. 516–523
9. E. Kushilevitz, R. Ostrovsky, Y. Rabani, Efficient search for approximate nearest neighbor in high dimensional spaces, in Proceedings of the 30th STOC, 1998, pp. 614–623
10. K. Clarkson, An algorithm for approximate closest-point queries, in Proceedings of SCG, 1994, pp. 160–164
11. C. Li, E. Chang, H. Garcia-Molina, G. Wiederhold, Clindex: approximate similarity queries in high-dimensional spaces. IEEE Trans. Knowl. Data Eng. (TKDE) 14(4), 792–808 (2002)
12. A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in Proceedings of VLDB, 1999, pp. 301–312
13. P. Ciaccia, M. Patella, PAC nearest neighbor queries: approximate and controlled search in high-dimensional and metric spaces, in Proceedings of IEEE ICDE, 2000, pp. 244–255
14. A. Qamra, Y. Meng, E.Y. Chang, Enhanced perceptual distance functions and indexing for image replica recognition. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 27(3), 379–391 (2005)
15. J. Buhler, Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics 17, 419–428 (2001)
16. D.A. Keim, Tutorial on high-dimensional index structures: database support for next decade’s applications, in Proceedings of the ACM SIGMOD, 1998, p. 501
17. M.E. Houle, J. Sakuma, Fast approximate similarity search in extremely high-dimensional data sets, in Proceedings of IEEE ICDE, 2005, pp. 619–630
18. S. Berchtold, D. Keim, H. Kriegel, The X-tree: an index structure for high-dimensional data, in Proceedings of the 22nd Conference on Very Large Databases VLDB, 1996, pp. 28–39
19. N. Beckmann, H. Kriegel, R. Schneider, B. Seeger, The \({R}^{*}\)-tree: an efficient and robust access method for points and rectangles, in Proceedings of ACM SIGMOD International Conference on Management of Data, 1990, pp. 322–331
20. K.I. Lin, H.V. Jagadish, C. Faloutsos, The TV-tree: an index structure for high-dimensional data. VLDB J. 3(4), 517–542 (1994)
21. P. Ciaccia, M. Patella, P. Zezula, M-tree: an efficient access method for similarity search in metric spaces, in Proceedings of 23rd International Conference on Very Large Databases VLDB, 1997, pp. 426–435
22. T. Bozkaya, M. Ozsoyoglu, Indexing large metric spaces for similarity search queries. ACM Trans. Database Syst. 24(3), 361–404 (1999)
23. S. Brin, Near neighbor search in large metric spaces, in Proceedings of VLDB, 1995, pp. 574–584
24. G. Navarro, Searching in metric spaces by spatial approximation, in SPIRE/CRIWG, 1999, pp. 141–148
25. H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, A.E. Abbadi, Approximate nearest neighbor searching in multimedia databases, in Proceedings of IEEE ICDE, 2001, pp. 503–511
26. M. Patella, http://www-db.deis.unibo.it/mtree/download.html
27. S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, A.Y. Wu, An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45(6), 891–923 (1998)
28. B.S. Manjunath, Airphoto dataset, http://vision.ece.ucsb.edu/download.html
29. B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18(8), 837–842 (1996)
30. E.B. Goldstein, Sensation and Perception, 4th edn. (Brooks/Cole, Pacific Grove, 1999)
31. J.G. Leu, Computing a shape’s moments from its boundary. Pattern Recognit. 24, 949–957 (1991)
32. J.R. Smith, S.-F. Chang, VisualSEEk: a fully automated content-based image query system, in Proceedings of ACM Multimedia, 1996, pp. 87–98


Author information

Authors and Affiliations

Google Inc., Mountain View, CA, 94306, USA
Edward Y. Chang


About this chapter

Cite this chapter

Chang, E.Y. (2011). Approximate High-Dimensional Indexing with Kernel. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_11


DOI: https://doi.org/10.1007/978-3-642-20429-6_11
Published: 26 August 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20428-9
Online ISBN: 978-3-642-20429-6
eBook Packages: Computer Science, Computer Science (R0)
