Approximation-Based Efficient Query Processing with the Earth Mover’s Distance

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9643)

Abstract

The Earth Mover’s Distance (EMD) is an effective distance-based similarity measure that quantifies the dissimilarity between data objects as the minimum amount of work required to transform one signature into another. Although the EMD has been shown to reflect human perceptual similarity very well across prevalent applications and domains, its high computational time complexity hinders its application to large-scale datasets, where users are often more interested in receiving an answer within a short period of time than in obtaining an exact and complete query result set. To this end, we propose to improve the efficiency of query processing with the EMD on signature databases by utilizing signature compression approximations. We introduce an efficient signature compression algorithm to reduce query computation cost. Furthermore, we theoretically analyze the approximation-based EMD and its relationship to the original EMD. Moreover, our extensive experiments on four real-world datasets demonstrate the accuracy and efficiency of our approach.
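
To make the notion of "minimum amount of work" concrete, the following is a minimal sketch for a deliberately restricted case, not the algorithm proposed in the paper. It assumes one-dimensional signatures given as lists of (position, weight) pairs, equal total weight on both sides, and the ground distance |x - y|. Under those assumptions the minimum work can be obtained with a single sweep over the sorted positions, since any surplus mass to the left of a gap has to be carried across it. The function name `emd_1d` and the input layout are hypothetical and chosen only for illustration. General multi-dimensional signatures require solving a transportation problem instead, which is the source of the high computational cost mentioned in the abstract.

```python
def emd_1d(sig_a, sig_b, tol=1e-9):
    """Minimum work to turn sig_a into sig_b (1-D special case only).

    Each signature is a list of (position, weight) pairs. Assumptions of
    this sketch: positions are real numbers, both signatures carry the
    same total weight, and the ground distance is |x - y|.
    """
    total_a = sum(w for _, w in sig_a)
    total_b = sum(w for _, w in sig_b)
    if abs(total_a - total_b) > tol:
        raise ValueError("this sketch assumes equal total weight")

    # Mass offered by sig_a counts as positive, mass required by sig_b
    # as negative; sweep over all positions from left to right.
    events = sorted([(x, w) for x, w in sig_a] + [(x, -w) for x, w in sig_b])
    if not events:
        return 0.0

    work = 0.0
    carried = 0.0            # surplus (or deficit) left of the current position
    prev_x = events[0][0]
    for x, w in events:
        # Whatever is still unmatched must cross the gap from prev_x to x.
        work += abs(carried) * (x - prev_x)
        carried += w
        prev_x = x
    return work


if __name__ == "__main__":
    # Two half-unit piles at 0 and 2 are moved onto one unit pile at 1.
    print(emd_1d([(0.0, 0.5), (2.0, 0.5)], [(1.0, 1.0)]))  # 1.0
```

With weights normalized to sum to 1, as in the usage example, the returned work coincides with the distance value. The cost of the sweep is dominated by sorting the positions, which illustrates why signatures with fewer entries are cheaper to compare.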


Acknowledgments

This work is funded by DFG grant SE 1039/7-1.

Author information

Corresponding author

Correspondence to Merih Seran Uysal.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Uysal, M.S., Sabinasz, D., Seidl, T. (2016). Approximation-Based Efficient Query Processing with the Earth Mover’s Distance. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, S., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9643. Springer, Cham. https://doi.org/10.1007/978-3-319-32049-6_11

  • DOI: https://doi.org/10.1007/978-3-319-32049-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32048-9

  • Online ISBN: 978-3-319-32049-6

  • eBook Packages: Computer Science, Computer Science (R0)
