Skip to main content

Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data

  • Conference paper
Similarity Search and Applications (SISAP 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8821))

Included in the following conference series:

Abstract

Locality-Sensitive Hashing (LSH) is extremely competitive for similarity search, but works under the assumption of uniform access cost to the data, and for just a handful of dissimilarities for which locality-sensitive families are available. In this work we propose Parallel Voronoi LSH, an approach that addresses those two limitations of LSH: it makes LSH efficient for distributed-memory architectures, and it works for very general dissimilarities (in particular, it works for all metric dissimilarities). Each hash table of Voronoi LSH works by selecting a sample of the dataset to be used as seeds of a Voronoi diagram. The Voronoi cells are then used to hash the data. Because Voronoi diagrams depend only on the distance, the technique is very general. Implementing LSH in distributed-memory systems is very challenging because it lacks referential locality in its access to the data: if care is not taken, excessive message-passing ruins the index performance. Therefore, another important contribution of this work is the parallel design needed to allow the scalability of the index, which we evaluate in a dataset of a thousand million multimedia features.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces 33(3), 273–321 (September 2001)

    Google Scholar 

  2. Akune, F., Valle, E., Torres, R.: MONORAIL: A Disk-Friendly Index for Huge Descriptor Databases. In: 20th Int. Conf. on Pattern Recognition, pp. 4145–4148. IEEE (August 2010)

    Google Scholar 

  3. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proc. of 13th Ann. ACM Symp. on Theory of Comp., pp. 604–613 (1998)

    Google Scholar 

  4. Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: Proc. of the 25th Int. Conf. on Very Large Data Bases, pp. 518–529 (1999)

    Google Scholar 

  5. Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proc. of the 20th Ann. Symp. on Computational Geometry, p. 253 (2004)

    Google Scholar 

  6. Paulevé, L., Jégou, H., Amsaleg, L.: Locality sensitive hashing: A comparison of hash function types and querying mechanisms 31(11), 1348–1358 (August 2010)

    Google Scholar 

  7. Kang, B., Jung, K.: Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets. In: NIPS Workshop on Big Learning (BigLearn), Lake Tahoe, Nevada, pp. 1–8 (2012)

    Google Scholar 

  8. Tellez, E.S., Chavez, E.: On locality sensitive hashing in metric spaces. In: Proc. of the Third Int. Conf. on Similarity Search and Applications, SISAP 2010, pp. 67–74. ACM, New York (2010)

    Chapter  Google Scholar 

  9. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach. Advances in Database Systems, vol. 32. Springer (2006)

    Google Scholar 

  10. Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: Proc. of the 33rd Int. Conf. on Very large data bases. VLDB 2007, pp. 950–961. VLDB Endowment (2007)

    Google Scholar 

  11. Joly, A., Buisson, O.: A posteriori multi-probe locality sensitive hashing. In: Proc. of the 16th ACM Int. Conf. on Multimedia, MM 2008, pp. 209–218. ACM, New York (2008)

    Google Scholar 

  12. Novak, D., Batko, M.: Metric Index: An Efficient and Scalable Solution for Similarity Search. In: 2009 Second Int. Workshop on Similarity Search and Applications, pp. 65–73. IEEE Computer Society (August 2009)

    Google Scholar 

  13. Novak, D., Kyselak, M., Zezula, P.: On locality-sensitive indexing in generic metric spaces. In: Proc. of the Third Int. Conf. on Similarity Search and Applications, SISAP 2010, pp. 59–66. ACM Press, New York (2010)

    Chapter  Google Scholar 

  14. Ostrovsky, R., Rabani, Y., Schulman, L., Swamy, C.: The Effectiveness of Lloyd-Type Methods for the k-Means Problem. In: Focs, pp. 165–176. IEEE (December 2006)

    Google Scholar 

  15. Arthur, D., Vassilvitskii, S.: K-means++: the advantages of careful seeding. In: Proc. of the 18th Annual ACM-SIAM Symp. on Discrete Algorithms, SODA 2007, Philadelphia, PA, USA, pp. 1027–1035 (2007)

    Google Scholar 

  16. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis, 9th edn. Wiley-Interscience, New York (1990)

    Book  Google Scholar 

  17. Paterlini, A.A., Nascimento, M.A., Junior, C.T.: Using Pivots to Speed-Up k-Medoids Clustering 2(2), 221–236 (June 2011)

    Google Scholar 

  18. Park, H.S., Jun, C.H.: A simple and fast algorithm for K-medoids clustering 36(2), 3336–3341 (2009)

    Google Scholar 

  19. Figueroa, K., Navarro, G., Chávez, E.: Metric spaces library (2007), http://www.sisap.org/Metric_Space_Library.html

  20. Jegou, H., Tavenard, R., Douze, M., Amsaleg, L.: Searching in one billion vectors: Re-rank with source coding. In: ICASSP, pp. 861–864. IEEE (2011)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Silva, E., Teixeira, T., Teodoro, G., Valle, E. (2014). Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data. In: Traina, A.J.M., Traina, C., Cordeiro, R.L.F. (eds) Similarity Search and Applications. SISAP 2014. Lecture Notes in Computer Science, vol 8821. Springer, Cham. https://doi.org/10.1007/978-3-319-11988-5_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-11988-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11987-8

  • Online ISBN: 978-3-319-11988-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics