Skip to main content

LuSH: A Generic High-Dimensional Index Framework

  • Conference paper
Web-Age Information Management (WAIM 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7419))

Included in the following conference series:

  • 788 Accesses

Abstract

Fast similarity retrieval for high-dimensional unstructured data is becoming significantly important. In high-dimensional space, traditional tree-based index is incompetent comparing with hashing methods. As a state-of-the-art hashing approach, Spectral Hashing (SH) aims at designing compact binary codes for high-dimensional vectors so that the similarity structure of original vector space can be preserved in the code space. We propose a generic high-dimensional index framework named LuSH in this paper, which means Lucenebased SH. It uses SH as high-dimensional index and Lucene, the well-known open source inverted index, as underlying index file. To speedup retrieval efficiency, two improvement strategies are proposed. Experiments on large scale datasets containing up to 10 million data show significant performance of our LuSH framework.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Tamura, H., Mori, S., Yamawaki, T.: Textural features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics 8(6), 460–472 (1978)

    Article  Google Scholar 

  2. Chatzichristofis, S.A., Boutalis, Y.S.: CEDD: Color and Edge Directivity Descriptor: A Compact Descriptor for Image Indexing and Retrieval. In: Gasteratos, A., Vincze, M., Tsotsos, J.K. (eds.) ICVS 2008. LNCS, vol. 5008, pp. 312–322. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  3. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV) 60(2), 91–110 (2004)

    Article  Google Scholar 

  4. Salakhutdinov, R., Hinton, G.: Semantic hashing. International Journal of Approximate Reasoning 50(7), 969–978 (2009)

    Article  Google Scholar 

  5. Apache Lucene Project, http://lucene.apache.org/

  6. Guattman, A.: R-Tree: A Dynamic Index Structure for Spatial Searching. In: ACM SIGMOD Int. Conf. on Management of Data, Boston, pp. 47–57 (1984)

    Google Scholar 

  7. Bentley, J.L.: Multidimensional binary search Trees used for associative searching. Communications of the ACM 18(9), 509–517 (1975)

    Article  MathSciNet  MATH  Google Scholar 

  8. Bellman, R.: Adaptive control processes: a guided tour. Princeton University Press (1961)

    Google Scholar 

  9. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 604–613. ACM, New York (1998)

    Chapter  Google Scholar 

  10. Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: FOCS, pp. 459–468 (2006)

    Google Scholar 

  11. Salakhutdinov, R., Hinton, G.: Learning a nonlinear embedding by preserving class neighbourhood structure. In: AI and Statistics, p. 5 (2007)

    Google Scholar 

  12. Shakhnarovich, G., Viola, P., Darrell, T.: Fast pose estimation with parameter sensitive hashing. In: ICCV, pp. 750–757 (2003)

    Google Scholar 

  13. Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: Advances in Neural Information Processing Systems, pp. 1753–1760 (2009)

    Google Scholar 

  14. Bengio, Y., Delalleau, O., Roux, N., et al.: Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation 16(10), 2197–2219 (2004)

    Article  MATH  Google Scholar 

  15. Coifman, R., Lafon, S., Lee, A., et al.: Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. of the National Academy of Sciences of the United States of America 102(21), 7426 (2005)

    Article  Google Scholar 

  16. Belkin, M., Niyogi, P.: Towards a theoretical foundation for Laplacian-based manifold methods. Journal of Computer and System Sciences 74(8), 1289–1308 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  17. Comer, D.: Ubiquitous B-tree. ACM Computing Surveys (CSUR) 11(2), 121–137 (1979)

    Article  MATH  Google Scholar 

  18. SogouP2.0, http://www.sogou.com/labs/dl/p2.html

  19. Ng, A.Y., Jordan, M.I., Weiss, Y.: On Spectral Clustering: Analysis and an algorithm. Journal of Advances in Neural Information Processing Systems 2, 849–856 (2002)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, Z., Shao, J., Wu, F. (2012). LuSH: A Generic High-Dimensional Index Framework. In: Bao, Z., et al. Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33050-6_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-33050-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33049-0

  • Online ISBN: 978-3-642-33050-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics