Fast Similarity Search with the Earth Mover’s Distance via Feasible Initialization and Pruning

  • Merih Seran Uysal
  • Kai Driessen
  • Tobias Brockhoff
  • Thomas Seidl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10609)

Abstract

The Earth Mover’s Distance (EMD) is a similarity measure that has been applied successfully to multidimensional distributions in numerous domains. Although the EMD yields very effective results, its high computational cost remains a bottleneck. Existing approaches within a filter-and-refine framework aim to reduce the number of exact distance computations and thereby lower query time. However, the refinement phase, in which the exact EMD is computed, still dominates the overall query processing time. We therefore propose to speed up the refinement phase with a novel feasible initialization technique (INIT) for the EMD computation, which reuses the state-of-the-art lower bound IM-Sig. Our experimental evaluation on three real-world datasets demonstrates the efficiency of our approach. This work is partially based on [12].
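The filter-and-refine framework mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce IM-Sig or INIT, which are not specified here. As stand-ins it assumes unit-mass one-dimensional histograms with ground distance |i - j|, for which the exact EMD has a closed form, and it uses the centroid distance as the lower-bounding filter. All function names are hypothetical.

```python
import heapq
from itertools import accumulate


def emd_1d(p, q):
    """Exact EMD of two unit-mass histograms on bins 0..n-1 (ground distance |i - j|).

    Stand-in for the general EMD: in this 1-D case it equals the sum of
    absolute differences of the cumulative distributions.
    """
    diffs = (a - b for a, b in zip(p, q))
    return sum(abs(c) for c in accumulate(diffs))


def centroid_lower_bound(p, q):
    """Distance between histogram means; never exceeds emd_1d(p, q).

    Stand-in filter, NOT the IM-Sig bound named in the abstract.
    """
    mean_p = sum(i * w for i, w in enumerate(p))
    mean_q = sum(i * w for i, w in enumerate(q))
    return abs(mean_p - mean_q)


def knn_filter_refine(query, database, k):
    """Return (neighbours, n_exact): k nearest (distance, index) pairs and
    the number of exact EMD computations that were needed."""
    # Filter step: rank every object by its cheap lower bound.
    candidates = sorted(
        (centroid_lower_bound(query, h), idx) for idx, h in enumerate(database)
    )
    best = []  # max-heap (negated distances) holding the current k best
    n_exact = 0
    for lb, idx in candidates:
        # Once the lower bound exceeds the k-th best exact distance,
        # no remaining candidate can enter the result.
        if len(best) == k and lb > -best[0][0]:
            break
        # Refinement step: compute the exact distance.
        d = emd_1d(query, database[idx])
        n_exact += 1
        if len(best) < k:
            heapq.heappush(best, (-d, idx))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, idx))
    return sorted((-nd, idx) for nd, idx in best), n_exact


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0, 0.0]
    database = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.5, 0.5, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
    print(knn_filter_refine(query, database, k=2))
```

In this toy database only two of the five objects reach the refinement step. The filter reduces how many exact computations are performed, but each remaining one still pays the full cost, which is the part of the query that the abstract identifies as dominant.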

Keywords

Earth Mover’s Distance · Similarity search · Lower bound · Filter distance · Initialization · Refinement phase

References

  1. Assent, I., Wenning, A., Seidl, T.: Approximation techniques for indexing the earth mover’s distance in multimedia databases. In: ICDE, p. 11 (2006)
  2. Cohen, S.D., Guibas, L.J.: The earth mover’s distance: lower bounds and invariance under translation. Technical report, Stanford University (1997)
  3. Gondzio, J.: Interior point methods 25 years later. EJOR 218(3), 587–601 (2012)
  4. Hillier, F., Lieberman, G.: Introduction to Linear Programming. McGraw-Hill, New York (1990)
  5. Hinneburg, A., Lehner, W.: Database support for 3D-protein data set analysis. In: SSDBM, pp. 161–170 (2003)
  6. Kusner, M.J., Sun, Y., Kolkin, N.I., Weinberger, K.Q.: From word embeddings to document distances. In: ICML, pp. 957–966 (2015)
  7. Lehmann, T., et al.: Content-based image retrieval in medical applications. Methods Inf. Med. 43(4), 354–361 (2004)
  8. Pele, O., Werman, M.: A linear time histogram metric for improved SIFT matching. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5304, pp. 495–508. Springer, Heidelberg (2008). doi: 10.1007/978-3-540-88690-7_37
  9. Rubner, Y., Tomasi, C., Guibas, L.: A metric for distributions with applications to image databases. In: ICCV, pp. 59–66 (1998)
  10. Ruttenberg, B.E., Singh, A.K.: Indexing the earth mover’s distance using normal distributions. PVLDB 5(3), 205–216 (2011)
  11. Seidl, T., Kriegel, H.: Optimal multi-step k-nearest neighbor search. In: SIGMOD, pp. 154–165 (1998)
  12. Uysal, M.S.: Efficient Similarity Search in Large Multimedia Databases. Apprimus Verlag (2017)
  13. Uysal, M.S., et al.: Efficient filter approximation using the EMD in very large multimedia databases with feature signatures. In: CIKM, pp. 979–988 (2014)
  14. Vanderbei, R.J.: Linear Programming: Foundations and Extensions. Springer, US (2014)
  15. Vandersmissen, B., et al.: The rise of mobile and social short-form video: an in-depth measurement study of vine. In: SoMuS, vol. 1198, pp. 1–10 (2014)
  16. Wichterich, M., et al.: Efficient EMD-based similarity search in multimedia databases via flexible dimensionality reduction. In: SIGMOD, pp. 199–212 (2008)
  17. Xu, J., Zhang, Z., et al.: Efficient and effective similarity search over probabilistic data based on earth mover’s distance. PVLDB 3(1), 758–769 (2010)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Merih Seran Uysal (1)
  • Kai Driessen (1)
  • Tobias Brockhoff (1)
  • Thomas Seidl (2)
  1. Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany
  2. Database Systems Group, LMU Munich, Munich, Germany
