Approximate High-Dimensional Indexing with Kernel

Foundations of Large-Scale Multimedia Information Management and Retrieval

Abstract

Indexing high-dimensional data for efficient nearest-neighbor search is challenging. It is well known that when the data dimension is very high, the search time of an index can exceed the time required for a linear scan of the entire dataset. To alleviate this curse of dimensionality, indexing schemes such as locality-sensitive hashing (LSH) and M-trees have been proposed to perform approximate searches. In this chapter,\(^\dagger\) we present a hypersphere indexer, named \({\sf SphereDex},\) to perform such searches. \({\sf SphereDex}\) partitions the data space using concentric hyperspheres. By exploiting geometric properties, \({\sf SphereDex}\) can perform effective pruning. Our empirical study shows that \({\sf SphereDex}\) enjoys three advantages over competing schemes at the same level of search accuracy. First, \({\sf SphereDex}\) requires fewer disk-seek operations. Second, it keeps disk accesses sequential most of the time. Third, it requires fewer distance computations. More importantly, \({\sf SphereDex}\) can be extended to support hyperplane queries for Support Vector Machines (SVMs) and other kernel methods. In classification problems using SVMs, the data instances closest to the hyperplane are considered the most ambiguous, and those farthest from the hyperplane the most certain (or most confident) regarding their class membership. Hyperplane queries, rather than point queries, are essential to fast retrieval in applications using SVMs. At the end of this chapter, we illustrate how \({\sf SphereDex}\) can be extended to support both nearest- and farthest-neighbor hyperplane query processing.

†© ACM, 2006. This chapter is based on my work with Navneet Panda [1], published in MULTIMEDIA’06. Permission to publish this chapter is granted under copyright license #2587680080590.
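
To make the idea of pruning with concentric hyperspheres concrete, here is a minimal sketch. It is not the \({\sf SphereDex}\) algorithm itself, whose details are in the chapter body; it only illustrates the geometric property that such an index can exploit. The assumptions are ours: points are grouped into rings by their distance from the dataset centroid, the search is an exact 1-nearest-neighbor search in memory (the chapter targets approximate, disk-based search), and the names `build_rings` and `nearest` are hypothetical. The pruning rule is the triangle inequality: for a ring whose points lie at radii between `r_min` and `r_max` from the center, no point in the ring can be closer to the query than the gap between the query's own radius and that interval.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_rings(points, num_rings):
    """Group points into concentric rings around the centroid (illustrative only)."""
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    # Sort point ids by distance from the center, then cut into equal-size rings.
    by_radius = sorted((dist(p, center), idx) for idx, p in enumerate(points))
    size = math.ceil(len(points) / num_rings)
    rings = []
    for start in range(0, len(by_radius), size):
        chunk = by_radius[start:start + size]
        rings.append({
            "r_min": chunk[0][0],
            "r_max": chunk[-1][0],
            "ids": [idx for _, idx in chunk],
        })
    return center, rings


def nearest(points, center, rings, query):
    """Exact 1-NN search that skips rings ruled out by the triangle inequality."""
    rq = dist(query, center)

    def lower_bound(ring):
        # Any x in the ring satisfies dist(query, x) >= |rq - dist(x, center)|,
        # which is at least the gap between rq and [r_min, r_max].
        return max(ring["r_min"] - rq, rq - ring["r_max"], 0.0)

    best_id, best_d, scanned = None, math.inf, 0
    # Visit the most promising rings first; stop once no ring can improve.
    for ring in sorted(rings, key=lower_bound):
        if lower_bound(ring) >= best_d:
            break
        for idx in ring["ids"]:
            scanned += 1
            d = dist(points[idx], query)
            if d < best_d:
                best_id, best_d = idx, d
    return best_id, best_d, scanned
```

Calling `center, rings = build_rings(points, 10)` and then `nearest(points, center, rings, query)` returns the index of the nearest point, its distance, and the number of points actually scanned. How much pruning this achieves depends on the data distribution; in very high dimensions the radii tend to concentrate, which is one reason practical schemes trade exactness for approximate answers.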

Notes

  1. Each dimension of a rectangular MBR has two neighboring rectangular MBRs. To ensure retrieval of the exact set of top-k nearest neighbors, a search needs to examine all \(3^d\) neighboring MBRs [11]. For tree structures using spherical MBRs, [5] reports that their performance decays almost as rapidly as that of tree structures using rectangular MBRs.

  2. The M-tree also provides an approximate version of the algorithm [13]. For a fair comparison, we use the approximate version of the code provided by M. Patella to conduct experiments. The code is available on request, but is not part of the basic download available at [26].
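
The \(3^d\) count in Note 1 can be checked with a small sketch. The assumption here is ours: the MBRs form a regular grid, so along each dimension a neighboring cell sits at offset -1, 0, or +1 from the query's cell, giving \(3^d\) cells once the query's own cell is included. The function name `neighbor_offsets` is hypothetical.

```python
from itertools import product


def neighbor_offsets(d):
    """All per-dimension offsets (-1, 0, +1) around a grid cell in d dimensions.

    Assumes a regular grid of cells; the all-zero offset is the cell itself.
    """
    return list(product((-1, 0, 1), repeat=d))


def cells_to_examine(d):
    """Number of cells in the 3 x 3 x ... x 3 block around a cell."""
    return 3 ** d


if __name__ == "__main__":
    # Enumeration is only feasible for small d; the count grows exponentially.
    for d in (1, 2, 3, 10, 20):
        print(d, cells_to_examine(d))
```

Already at \(d = 20\) the block contains more than three billion cells, so examining every neighbor costs more than scanning most datasets directly.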

References

  1. N. Panda, E.Y. Chang, Efficient top-k hyperplane query processing for multimedia information retrieval, in Proceedings of ACM Multimedia, 2006, pp. 317–326

  2. S. Arya, D. Mount, N. Netanyahu, R. Silverman, A. Wu, An optimal algorithm for approximate nearest neighbor searching in fixed dimensions, in Proceedings of ACM SODA, 1994, pp. 573–582

  3. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of ACM STOC, 1998, pp. 604–613

  4. J.M. Kleinberg, Two algorithms for nearest-neighbor search in high dimensions, in Proceedings of ACM STOC, 1997, pp. 599–608

  5. R. Weber, H.J. Schek, S. Blott, A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, in Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), 1998, pp. 194–205

  6. J. Bentley, Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)

  7. N. Katayama, S. Satoh, The SR-tree: an index structure for high-dimensional nearest neighbor queries, in Proceedings of ACM SIGMOD International Conference on Management of Data, 1997, pp. 369–380

  8. D.A. White, R. Jain, Similarity indexing with the SS-tree, in Proceedings of IEEE ICDE, 1996, pp. 516–523

  9. E. Kushilevitz, R. Ostrovsky, Y. Rabani, Efficient search for approximate nearest neighbor in high dimensional spaces, in Proceedings of the 30th ACM STOC, 1998, pp. 614–623

  10. K. Clarkson, An algorithm for approximate closest-point queries, in Proceedings of SCG, 1994, pp. 160–164

  11. C. Li, E. Chang, H. Garcia-Molina, G. Wiederhold, Clindex: approximate similarity queries in high-dimensional spaces. IEEE Trans. Knowl. Data Eng. (TKDE) 14(4), 792–808 (2002)

  12. A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in Proceedings of VLDB, 1999, pp. 301–312

  13. P. Ciaccia, M. Patella, PAC nearest neighbor queries: approximate and controlled search in high-dimensional and metric spaces, in Proceedings of IEEE ICDE, 2000, pp. 244–255

  14. A. Qamra, Y. Meng, E.Y. Chang, Enhanced perceptual distance functions and indexing for image replica recognition. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 27(3), 379–391 (2005)

  15. J. Buhler, Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics 17, 419–428 (2001)

  16. D.A. Keim, Tutorial on high-dimensional index structures: database support for next decade’s applications, in Proceedings of the ACM SIGMOD, 1998, p. 501

  17. M.E. Houle, J. Sakuma, Fast approximate similarity search in extremely high-dimensional data sets, in Proceedings of IEEE ICDE, 2005, pp. 619–630

  18. S. Berchtold, D. Keim, H. Kriegel, The X-tree: an index structure for high-dimensional data, in Proceedings of the 22nd Conference on Very Large Databases (VLDB), 1996, pp. 28–39

  19. N. Beckmann, H. Kriegel, R. Schneider, B. Seeger, The \({R}^{*}\)-tree: an efficient and robust access method for points and rectangles, in Proceedings of ACM SIGMOD International Conference on Management of Data, 1990, pp. 322–331

  20. K.I. Lin, H.V. Jagadish, C. Faloutsos, The TV-tree: an index structure for high-dimensional data. VLDB J. 3(4), 517–542 (1994)

  21. P. Ciaccia, M. Patella, P. Zezula, M-tree: an efficient access method for similarity search in metric spaces, in Proceedings of the 23rd International Conference on Very Large Databases (VLDB), 1997, pp. 426–435

  22. T. Bozkaya, M. Ozsoyoglu, Indexing large metric spaces for similarity search queries. ACM Trans. Database Syst. 24(3), 361–404 (1999)

  23. S. Brin, Near neighbor search in large metric spaces, in Proceedings of VLDB, 1995, pp. 574–584

  24. G. Navarro, Searching in metric spaces by spatial approximation, in SPIRE/CRIWG, 1999, pp. 141–148

  25. H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, A.E. Abbadi, Approximate nearest neighbor searching in multimedia databases, in Proceedings of IEEE ICDE, 2001, pp. 503–511

  26. M. Patella, http://www-db.deis.unibo.it/mtree/download.html

  27. S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, A.Y. Wu, An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. J. ACM 45(6), 891–923 (1998)

  28. B.S. Manjunath, Airphoto dataset, http://vision.ece.ucsb.edu/download.html

  29. B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18(8), 837–842 (1996)

  30. E.B. Goldstein, Sensation and Perception, 4th edn. (Brooks/Cole, Pacific Grove, 1999)

  31. J.G. Leu, Computing a shape’s moments from its boundary. Pattern Recognit. 24, 949–957 (1991)

  32. J.R. Smith, S.-F. Chang, VisualSEEk: a fully automated content-based image query system, in Proceedings of ACM Multimedia, 1996, pp. 87–98

Author information

Correspondence to Edward Y. Chang.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg and Tsinghua University Press

About this chapter

Cite this chapter

Chang, E.Y. (2011). Approximate High-Dimensional Indexing with Kernel. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_11

  • DOI: https://doi.org/10.1007/978-3-642-20429-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20428-9

  • Online ISBN: 978-3-642-20429-6

  • eBook Packages: Computer Science, Computer Science (R0)
