Similarity Joins and Beyond: An Extended Set of Binary Operators with Order
Similarity joins are troublesome database operators that often produce results much larger than the user really needs or expects. To return the similar elements, similarity joins also require sorting during retrieval, even though order is a concept the relational model does not support. This paper addresses both issues by extending the similarity join concept to a broader set of binary operators that retrieve the most similar pairs and use sorting only as an internal processing step, so as to comply with relational theory. Additionally, our extension makes it possible to explore a useful condition not previously considered in similarity retrieval: the negation of predicates. Experiments on real and synthetic data show that our operators are fast enough for real applications and scale well for both multidimensional and non-dimensional metric data.
Keywords: Similarity search; Similarity joins; Query operators
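The contrast the abstract draws can be illustrated with a small, naive nested-loop sketch. This is not the paper's implementation. The names `range_join`, `k_closest_pairs` and `negated_range_join`, the use of Euclidean distance (`math.dist`), and the reading of "negation" as the complement of a range predicate are all assumptions made for illustration. A range join's result size depends on how dense the data is, not on what the user asked for. A k-closest-pairs join bounds the result to k pairs. It also keeps sorting internal because it returns an unordered set.

```python
import heapq
import math
from typing import Callable, Sequence

# Assumption: objects are vectors compared with Euclidean distance (math.dist);
# any metric with the same (p, q) -> float signature could be passed instead.
Point = Sequence[float]
Metric = Callable[[Point, Point], float]
Pair = tuple[int, int]  # (index into R, index into S)


def range_join(R: Sequence[Point], S: Sequence[Point], xi: float,
               dist: Metric = math.dist) -> set[Pair]:
    """Range similarity join: every pair with dist(r, s) <= xi.
    How many pairs come back depends on data density, not on the user."""
    return {(i, j)
            for i, r in enumerate(R)
            for j, s in enumerate(S)
            if dist(r, s) <= xi}


def k_closest_pairs(R: Sequence[Point], S: Sequence[Point], k: int,
                    dist: Metric = math.dist) -> set[Pair]:
    """k-closest-pairs join: the k pairs with the smallest distances.
    Ordering is used only inside heapq.nsmallest (ties broken by index);
    the caller gets an unordered set, so no order leaks into the result."""
    candidates = ((dist(r, s), i, j)
                  for i, r in enumerate(R)
                  for j, s in enumerate(S))
    return {(i, j) for _, i, j in heapq.nsmallest(k, candidates)}


def negated_range_join(R: Sequence[Point], S: Sequence[Point], xi: float,
                       dist: Metric = math.dist) -> set[Pair]:
    """Hypothetical negated predicate: every pair with NOT dist(r, s) <= xi,
    i.e. the complement of range_join over R x S."""
    return {(i, j)
            for i, r in enumerate(R)
            for j, s in enumerate(S)
            if not dist(r, s) <= xi}


if __name__ == "__main__":
    R = [(0.0, 0.0), (5.0, 5.0)]
    S = [(1.0, 0.0), (5.0, 6.0), (10.0, 10.0)]
    print(sorted(range_join(R, S, 1.5)))          # [(0, 0), (1, 1)]
    print(sorted(k_closest_pairs(R, S, 3)))       # [(0, 0), (1, 0), (1, 1)]
    print(sorted(negated_range_join(R, S, 1.5)))  # [(0, 1), (0, 2), (1, 0), (1, 2)]
```

Returning a Python `set` is the design choice that mirrors the abstract's point: the heap imposes an order while the pairs are selected, but the result the caller sees is order-free. A real operator would replace the quadratic nested loops with a metric index.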