Abstract
The Earth Mover’s Distance (EMD) is an effective distance-based similarity measure that quantifies the dissimilarity between data objects as the minimum amount of work required to transform one signature into another. Although the EMD has been shown to reflect human perceptual similarity very well in many applications and domains, its high computational time complexity hinders its application to large-scale datasets, where users are often more interested in receiving an answer within a short period of time than in obtaining an exact and complete query result set. To this end, we propose to improve the efficiency of EMD-based query processing on signature databases by utilizing signature compression approximations. We introduce an efficient signature compression algorithm to reduce query computation cost. Furthermore, we theoretically analyze the approximation-based EMD and its relationship to the original EMD. Our extensive experiments on four real-world datasets demonstrate the accuracy and efficiency of our approach.
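To make the notion of "minimum amount of work" concrete, the following is a minimal sketch of the EMD for a deliberately simple special case, not the method proposed in this paper. It assumes one-dimensional signatures with equal total weight and the ground distance |x - y|, in which case the optimal transport cost reduces to the area between the two cumulative weight functions. The names `Signature` and `emd_1d` are hypothetical and chosen for illustration only; general signatures in multi-dimensional feature spaces require solving a transportation problem instead.

```python
from typing import List, Tuple

# A signature is modelled here as a list of (position, weight) pairs.
# Assumption: positions are scalars; real feature signatures are usually
# multi-dimensional and need a transportation-problem solver.
Signature = List[Tuple[float, float]]


def emd_1d(q: Signature, p: Signature) -> float:
    """EMD between two 1-D signatures of equal total weight, ground distance |x - y|."""
    total_q = sum(w for _, w in q)
    total_p = sum(w for _, w in p)
    if total_q <= 0 or abs(total_q - total_p) > 1e-9:
        raise ValueError("signatures must have equal, positive total weight")

    # Signed mass events: weight of q counts positive, weight of p negative.
    events = sorted([(x, w) for x, w in q] + [(x, -w) for x, w in p])

    work = 0.0      # accumulated transport cost
    surplus = 0.0   # mass that still has to be carried to the right (or left)
    prev = events[0][0]
    for pos, mass in events:
        # Carrying |surplus| units of mass across the gap costs |surplus| * gap.
        work += abs(surplus) * (pos - prev)
        surplus += mass
        prev = pos

    # Normalize by the total flow, as in the usual EMD definition.
    return work / total_q


if __name__ == "__main__":
    q = [(0.0, 0.5), (1.0, 0.5)]
    p = [(0.0, 0.5), (3.0, 0.5)]
    print(emd_1d(q, p))  # half of the mass is moved by a distance of 2
```

In this example only the mass at position 1.0 has to move, to position 3.0, so the work is 0.5 × 2 divided by a total flow of 1. Even in this simple setting the cost grows with the number of representatives per signature, which is the quantity that signature compression aims to reduce.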
Acknowledgments
This work is funded by DFG grant SE 1039/7-1.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Uysal, M.S., Sabinasz, D., Seidl, T. (2016). Approximation-Based Efficient Query Processing with the Earth Mover’s Distance. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, S., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9643. Springer, Cham. https://doi.org/10.1007/978-3-319-32049-6_11
DOI: https://doi.org/10.1007/978-3-319-32049-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32048-9
Online ISBN: 978-3-319-32049-6
eBook Packages: Computer Science, Computer Science (R0)