Exploring the Hamming Distance in Distributed Infrastructures for Similarity Search

da Silva Villaça, Rodolfo; Pasquini, Rafael; de Paula, Luciano Bernardes; Magalhães, Maurício Ferreira

doi:10.1007/978-3-319-09177-8_1

Part of the book series: Modeling and Optimization in Science and Technologies (MOST, volume 4)

Abstract

The amount of data available on the Internet now exceeds the zettabyte (ZB) scale, a scenario known in the literature as Big Data. Although traditional databases are very efficient at finding and retrieving specific content, they are inefficient in Big Data scenarios, since the great majority of such data is unstructured and scattered across the Internet. New databases are therefore required to support similarity search. To handle this challenging scenario, this chapter proposes exploring the Hamming similarity that exists between content identifiers generated with the Random Hyperplane Hashing function. Such identifiers provide the basis for building distributed infrastructures that facilitate similarity search. We present two approaches: a P2P solution (Hamming DHT) and a data center solution (HCube). The evaluations presented indicate that both are capable of improving recall in a similarity search.
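The idea behind the identifiers can be illustrated with a minimal sketch. It assumes content is represented as real-valued feature vectors and that each identifier bit records on which side of a random hyperplane the vector falls, which is the usual formulation of Random Hyperplane Hashing. For a single random hyperplane, two vectors separated by angle θ receive the same bit with probability 1 − θ/π, so vectors with high cosine similarity tend to get identifiers at a small Hamming distance. The function names, the 128-bit identifier length, and the sample vectors are hypothetical choices for illustration and are not taken from the chapter.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def rhh_identifier(vector, hyperplanes):
    """Build an integer identifier with one bit per hyperplane.

    A bit is 1 when the vector lies on the non-negative side of the
    hyperplane (dot product with its normal is >= 0), otherwise 0.
    """
    ident = 0
    for normal in hyperplanes:
        dot = sum(x * w for x, w in zip(vector, normal))
        ident = (ident << 1) | (1 if dot >= 0 else 0)
    return ident


def hamming_distance(a, b):
    """Count the bit positions in which two identifiers differ."""
    return bin(a ^ b).count("1")


# Hypothetical example data: 'near' points in almost the same direction
# as 'query', while 'far' points in a very different direction.
planes = make_hyperplanes(num_bits=128, dim=4, seed=42)
query = [1.0, 2.0, 3.0, 4.0]
near = [1.1, 2.0, 2.9, 4.2]
far = [-4.0, 1.0, -2.0, 0.5]

id_query = rhh_identifier(query, planes)
id_near = rhh_identifier(near, planes)
id_far = rhh_identifier(far, planes)

d_near = hamming_distance(id_query, id_near)
d_far = hamming_distance(id_query, id_far)
print("Hamming distance to similar vector:   ", d_near)
print("Hamming distance to dissimilar vector:", d_far)
```

In a distributed setting, such an identifier could serve as the key under which content is stored, so that similar content lands on keys that are close in Hamming distance. How Hamming DHT and HCube map these keys onto peers and servers is described in the chapter itself and is not reproduced here.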

Author information

Authors and Affiliations

Department of Computing and Electronics (DCEL), Federal University of Espírito Santo (UFES), São Mateus/ES, Brazil
Rodolfo da Silva Villaça
Faculty of Computing (FACOM), Federal University of Uberlândia (UFU), Uberlândia/MG, Brazil
Rafael Pasquini
Federal Institute of Education, Science and Technology of São Paulo (IFSP), Bragança Paulista/SP, Brazil
Luciano Bernardes de Paula
School of Computing and Electrical Engineering (FEEC), State University of Campinas (UNICAMP), Campinas/SP, Brazil
Maurício Ferreira Magalhães

Corresponding author

Correspondence to Rodolfo da Silva Villaça.

Editor information

Editors and Affiliations

Universitat Politècnica de Catalunya, Barcelona, Spain
Fatos Xhafa
Fukuoka Institute of Technology (FIT), Fukuoka, Fukuoka, Japan
Leonard Barolli
University of Salerno, Salerno, Italy
Admir Barolli
Canadian Institute of Technology, Tirana, Albania
Petraq Papajorgji

About this chapter

Cite this chapter

da Silva Villaça, R., Pasquini, R., de Paula, L.B., Magalhães, M.F. (2015). Exploring the Hamming Distance in Distributed Infrastructures for Similarity Search. In: Xhafa, F., Barolli, L., Barolli, A., Papajorgji, P. (eds) Modeling and Processing for Next-Generation Big-Data Technologies. Modeling and Optimization in Science and Technologies, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-319-09177-8_1

DOI: https://doi.org/10.1007/978-3-319-09177-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09176-1
Online ISBN: 978-3-319-09177-8
eBook Packages: Engineering, Engineering (R0)
