Metric-Based Similarity Search in Unstructured Peer-to-Peer Systems
Abstract
Peer-to-peer systems constitute a promising solution for deploying novel applications, such as distributed image retrieval. Efficient search over widely distributed multimedia content requires techniques for distributed retrieval based on generic metric distance functions. In this paper, we propose a framework for distributed metric-based similarity search, where each participating peer stores its own data autonomously. In order to establish a scalable and efficient search mechanism, we adopt a super-peer architecture, where super-peers are responsible for query routing. We propose the construction of metric routing indices suitable for distributed similarity search in metric spaces. Furthermore, we present a query routing algorithm that exploits pruning techniques to selectively direct queries to super-peers and peers with relevant data. We study the performance of the proposed framework using both synthetic and real data, and we demonstrate its scalability over a wide range of experimental setups.
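The abstract does not spell out how the routing indices prune, so the following is only a minimal sketch of one common way such pruning can work in a metric space, not the paper's actual algorithm. It assumes each peer is summarized by a single ball (a center object and a covering radius) and that a range query with radius r is forwarded only to peers whose ball can intersect the query ball, which follows from the triangle inequality. All names here (`Ball`, `summarize`, `may_contain`, `route_range_query`) are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """One example of a metric; any function obeying the triangle inequality works."""
    return math.dist(a, b)


@dataclass
class Ball:
    """Hypothetical routing-index entry: a center object plus a covering radius."""
    center: Point
    radius: float


def summarize(objects: List[Point], dist: Metric) -> Ball:
    """Summarize a peer's data as a ball around its first object (an arbitrary choice)."""
    center = objects[0]
    radius = max(dist(center, o) for o in objects)
    return Ball(center, radius)


def may_contain(ball: Ball, query: Point, r: float, dist: Metric) -> bool:
    """Triangle-inequality test: every object o in the ball satisfies
    d(query, o) >= d(query, center) - radius, so the peer can be skipped
    whenever d(query, center) > radius + r."""
    return dist(query, ball.center) <= ball.radius + r


def route_range_query(
    peers: Dict[str, List[Point]],
    summaries: Dict[str, Ball],
    query: Point,
    r: float,
    dist: Metric,
) -> Tuple[List[Point], List[str]]:
    """Forward the range query only to peers that survive pruning.

    Returns the matching objects and the ids of the peers actually contacted.
    """
    results: List[Point] = []
    contacted: List[str] = []
    for peer_id, ball in summaries.items():
        if not may_contain(ball, query, r, dist):
            continue  # pruned: no object of this peer can lie within r of the query
        contacted.append(peer_id)
        results.extend(o for o in peers[peer_id] if dist(query, o) <= r)
    return results, contacted
```

In a super-peer setting, the same test could in principle be applied one level up, with a super-peer holding summaries of its own peers and of neighboring super-peers; that layering is an assumption here and is not detailed in the abstract.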
Keywords
Query Processing; Similarity Search; Range Query; Distributed Hash Table; Query Object