Approximation-Based Efficient Query Processing with the Earth Mover’s Distance

  • Merih Seran Uysal (email author)
  • Daniel Sabinasz
  • Thomas Seidl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9643)

Abstract

The Earth Mover’s Distance (EMD) is an effective distance-based similarity measure that quantifies the dissimilarity between two data objects as the minimum amount of work required to transform one signature into the other. Although the EMD has been shown to reflect human perceptual similarity very well in many applications and domains, its high computational time complexity hinders its application to large-scale datasets, where users are often more interested in receiving an answer within a short period of time than in obtaining an exact and complete query result set. To this end, we propose to improve the efficiency of query processing with the EMD on signature databases by utilizing signature compression approximations. We introduce an efficient signature compression algorithm to reduce query computation cost. Furthermore, we theoretically analyze the approximation-based EMD and its relationship to the original EMD. Finally, our extensive experiments on four real-world datasets demonstrate the accuracy and efficiency of our approach.
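
To make the notion of "minimum amount of work" concrete, the following is a minimal sketch of the EMD for a deliberately restricted special case: one-dimensional feature positions, ground distance |x - y|, and two signatures of equal total weight. In that case the optimal transport cost can be read off a single left-to-right sweep. The function name `emd_1d` and the representation of a signature as a list of `(position, weight)` pairs are assumptions made for illustration only. This is not the signature compression algorithm proposed in the paper, and the general multi-dimensional EMD requires solving a transportation problem instead.

```python
def emd_1d(sig_p, sig_q, tol=1e-9):
    """EMD between two 1-D signatures (hypothetical helper, illustration only).

    Each signature is a list of (position, weight) pairs with positive weights.
    Assumes equal total weight and ground distance |x - y|.
    """
    if not sig_p or not sig_q:
        raise ValueError("signatures must be non-empty")
    total_p = sum(w for _, w in sig_p)
    total_q = sum(w for _, w in sig_q)
    if total_p <= 0 or abs(total_p - total_q) > tol:
        raise ValueError("signatures must have equal, positive total weight")

    # Mass of sig_p counts as positive, mass of sig_q as negative.
    events = sorted([(x, w) for x, w in sig_p] + [(x, -w) for x, w in sig_q])

    work = 0.0
    surplus = 0.0          # mass of p minus mass of q seen so far
    prev_x = events[0][0]
    for x, delta in events:
        # Any surplus must be carried across the gap between positions.
        work += abs(surplus) * (x - prev_x)
        surplus += delta
        prev_x = x

    # Normalize by the total flow, which equals the common total weight.
    return work / total_p


p = [(0.0, 0.5), (1.0, 0.5)]
q = [(2.0, 1.0)]
print(emd_1d(p, q))
```

Here half of the mass moves a distance of 2 and the other half a distance of 1, so the work is 0.5 * 2 + 0.5 * 1 = 1.5. The sweep runs in O(n log n) because of the sort, which is only possible in this one-dimensional setting; the cost of the general case is what motivates approximation techniques such as the one the abstract describes.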

Keywords

Earth mover’s distance; Similarity search; Approximate query processing; Signature databases

Notes

Acknowledgments

This work is funded by DFG grant SE 1039/7-1.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Merih Seran Uysal (1), email author
  • Daniel Sabinasz (1)
  • Thomas Seidl (2)

  1. Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany
  2. Database Systems Group, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
