International Conference on Multimedia Modeling

MultiMedia Modeling pp 104-115

SELSH: A Hashing Scheme for Approximate Similarity Search with Early Stop Condition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9517)

Abstract

Similarity search is a fundamental problem in many multimedia database applications. Because of the “curse of dimensionality”, the performance of many access methods degrades significantly as dimensionality increases. Approximate similarity search is an alternative, and Locality Sensitive Hashing (LSH) is the most popular method for it. Nevertheless, LSH must verify a large number of points to obtain good-enough results, which incurs a high I/O cost. In this paper, we propose a new scheme called SortedKey and Early stop LSH (SELSH), which extends the previous SortingKeys-LSH (SK-LSH). SELSH uses a linear order to sort all the compound hash keys. Moreover, during query processing, an early stop condition and a limit on the number of pages are used to determine whether a page needs to be accessed. Our experiments demonstrate the superiority of the proposed method over two state-of-the-art methods, C2LSH and SK-LSH.
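The general idea described in the abstract (compound hash keys sorted in a linear order, with page accesses bounded by a stop condition and a page limit) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`make_hash_functions`, `compound_key`, `build_index`, `search`), the parameters (`w`, `page_size`, `max_pages`, `stop_radius`), the use of plain lexicographic tuple order as the linear order, and the stop rule itself (stop once the k-th best candidate is within `stop_radius`). The actual SELSH ordering and early stop condition may differ.

```python
import bisect
import math
import random


def make_hash_functions(dim, m, w, seed=0):
    """Draw m Gaussian (2-stable) LSH functions h(v) = floor((a.v + b) / w)."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
            for _ in range(m)]


def compound_key(point, funcs, w):
    """Concatenate the m hash values into one compound key (a tuple)."""
    return tuple(
        math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / w)
        for a, b in funcs
    )


def build_index(points, funcs, w, page_size):
    """Sort point ids by compound key and cut the sorted list into pages.

    Assumption: lexicographic tuple order stands in for the linear order.
    """
    entries = sorted((compound_key(p, funcs, w), i) for i, p in enumerate(points))
    return [entries[i:i + page_size] for i in range(0, len(entries), page_size)]


def search(q, points, pages, funcs, w, k, max_pages, stop_radius):
    """Return (top-k list of (distance, id), number of pages read)."""
    if not pages:
        return [], 0
    qkey = compound_key(q, funcs, w)
    first_keys = [page[0][0] for page in pages]
    # Page whose key range is expected to contain the query key.
    start = max(bisect.bisect_right(first_keys, qkey) - 1, 0)
    # Visit pages outward from the start page: start, start+1, start-1, ...
    order = [start]
    for step in range(1, len(pages)):
        for idx in (start + step, start - step):
            if 0 <= idx < len(pages):
                order.append(idx)
    best, pages_read = [], 0
    for idx in order:
        if pages_read >= max_pages:
            break  # page limit reached
        if len(best) >= k and best[k - 1][0] <= stop_radius:
            break  # assumed early stop rule: k-th candidate is close enough
        pages_read += 1
        for _, pid in pages[idx]:
            best.append((math.dist(q, points[pid]), pid))
        best = sorted(best)[:k]
    return best, pages_read


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random() for _ in range(8)] for _ in range(200)]
    funcs = make_hash_functions(dim=8, m=4, w=0.5)
    pages = build_index(data, funcs, w=0.5, page_size=10)
    result, n_pages = search(data[0], data, pages, funcs, 0.5,
                             k=5, max_pages=4, stop_radius=0.3)
    print(n_pages, [pid for _, pid in result])
```

In this sketch a smaller `max_pages` or a larger `stop_radius` trades result quality for fewer page reads, which mirrors the I/O motivation stated in the abstract.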

Keywords

Approximate similarity search; LSH; Sorted keys; Early stop condition

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
