Similarity Grid for Searching in Metric Spaces

  • Michal Batko
  • Claudio Gennaro
  • Pavel Zezula
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3664)

Abstract

Similarity search in metric spaces is an important paradigm for content-based retrieval in many applications. Existing centralized search structures can speed up retrieval, but they do not scale to large volumes of data because their response time increases linearly with the size of the searched file. The proposed GHT* index is a scalable, distributed structure. By exploiting parallelism in a dynamic network of computers, the GHT* achieves practically constant search time for similarity range queries in data sets of arbitrary size. The structure also scales well as the volume of retrieved data grows. Moreover, the small amount of routing information replicated on each server grows only logarithmically. At the same time, the potential for interquery parallelism increases as the data sets grow, because the relative number of servers used by an individual query decreases. All these properties are verified by experiments on a prototype system using real-life data sets.
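
To make the notion of a similarity range query concrete, the following is a minimal single-machine sketch, not the authors' distributed GHT*. It assumes a generalized-hyperplane partitioning in which each inner node holds two pivots and an object goes to the side of the nearer pivot. All names (`Node`, `build`, `range_search`, `range_scan`, `capacity`) and the choice of the first two objects as pivots are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function satisfying the metric axioms works."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def range_scan(data: Sequence[Point], q: Point, r: float,
               d: Metric = euclidean) -> List[Point]:
    """Baseline: sequential scan, cost linear in the number of objects."""
    return [o for o in data if d(q, o) <= r]


@dataclass
class Node:
    bucket: Optional[List[Point]] = None   # set for leaf nodes only
    p1: Optional[Point] = None             # pivots of an inner node
    p2: Optional[Point] = None
    left: Optional["Node"] = None          # objects with d(o,p1) <= d(o,p2)
    right: Optional["Node"] = None         # objects with d(o,p1) >  d(o,p2)


def build(data: Sequence[Point], d: Metric = euclidean,
          capacity: int = 4) -> Node:
    """Recursively split on two pivots (hypothetical choice: first two objects)."""
    if len(data) <= capacity:
        return Node(bucket=list(data))
    p1, p2 = data[0], data[1]
    left = [o for o in data if d(o, p1) <= d(o, p2)]
    right = [o for o in data if d(o, p1) > d(o, p2)]
    if not left or not right:              # degenerate split, keep as a leaf
        return Node(bucket=list(data))
    return Node(p1=p1, p2=p2,
                left=build(left, d, capacity),
                right=build(right, d, capacity))


def range_search(node: Node, q: Point, r: float,
                 d: Metric = euclidean) -> List[Point]:
    """Return all objects within distance r of q, pruning by the triangle inequality."""
    if node.bucket is not None:
        return [o for o in node.bucket if d(q, o) <= r]
    d1, d2 = d(q, node.p1), d(q, node.p2)
    found: List[Point] = []
    if d1 - r <= d2 + r:                   # left side may hold qualifying objects
        found += range_search(node.left, q, r, d)
    if d1 + r >= d2 - r:                   # right side may hold qualifying objects
        found += range_search(node.right, q, r, d)
    return found
```

A query whose ball straddles the hyperplane visits both subtrees; otherwise one side is skipped. In a distributed setting such as the one the abstract describes, the subtrees would reside on different servers, which this sketch does not attempt to model.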

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Michal Batko (1)
  • Claudio Gennaro (2)
  • Pavel Zezula (1)

  1. Masaryk University, Brno, Czech Republic
  2. ISTI-CNR, Pisa, Italy
