Rank Aggregation of Candidate Sets for Efficient Similarity Search

  • David Novak
  • Pavel Zezula
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8645)

Abstract

Many current applications need to organize data by the mutual similarity of data objects. Generic similarity retrieval in large data collections is a hard problem that has drawn researchers’ attention for two decades. A typical strategy for retrieving the objects most similar to a given example is to access a candidate set of objects and then refine it; the overall search cost (and search time) then typically correlates with the candidate set size. We propose a generic approach that combines several independent indexes by aggregating their candidate sets in such a way that the resulting candidate set can be one or two orders of magnitude smaller while preserving answer quality. This reduction comes at the expense of a higher computational cost of the ranking algorithm, but experiments on two real-life datasets and one artificial dataset indicate that the overall gain can be significant.
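The abstract does not spell out the aggregation algorithm, so the following is only a minimal sketch of the general idea, assuming that each independent index returns its candidates as a best-first ranked list and that the lists are merged by median rank. The function names, the penalty rank for objects missing from a list, and the toy data are all hypothetical and are not taken from the paper.

```python
from statistics import median


def aggregate_candidates(rankings, candidate_size):
    """Merge best-first candidate lists from several indexes by median rank.

    Assumption of this sketch: an object absent from one index's list is
    given a penalty rank equal to the length of the longest list.
    """
    missing_rank = max((len(r) for r in rankings), default=0)
    positions = {}
    for i, ranking in enumerate(rankings):
        for rank, obj in enumerate(ranking):
            # One slot per index, pre-filled with the penalty rank.
            slots = positions.setdefault(obj, [missing_rank] * len(rankings))
            slots[i] = rank
    # Lower median rank is better; ties are broken by object id.
    ordered = sorted(positions.items(), key=lambda kv: (median(kv[1]), kv[0]))
    return [obj for obj, _ in ordered[:candidate_size]]


def refine(query, candidates, data, distance, k):
    """Refinement step: evaluate the true distance only on the candidates."""
    return sorted(candidates, key=lambda obj: distance(query, data[obj]))[:k]


# Toy example: three hypothetical indexes, each returning four candidates.
rankings = [
    ["a", "b", "c", "d"],
    ["b", "a", "e", "c"],
    ["b", "c", "a", "f"],
]
candidates = aggregate_candidates(rankings, candidate_size=3)

# Toy one-dimensional data with absolute difference as the distance.
data = {"a": 1.0, "b": 5.0, "c": 2.0, "d": 9.0, "e": 7.0, "f": 8.0}
answer = refine(1.2, candidates, data, lambda x, y: abs(x - y), k=2)
```

In this sketch the refinement step evaluates the distance function on three objects instead of the six distinct objects returned by the indexes. The extra work is the ranking itself, which mirrors the trade-off described in the abstract.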

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • David Novak (1)
  • Pavel Zezula (1)
  1. Masaryk University, Brno, Czech Republic
