Similarity Joins and Beyond: An Extended Set of Binary Operators with Order

  • Luiz Olmes Carvalho
  • Lucio F. D. Santos
  • Willian D. Oliveira
  • Agma Juci Machado Traina
  • Caetano Traina Jr.
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9371)

Abstract

Similarity joins are troublesome database operators that often produce results much larger than the user really needs or expects. To return similar elements, similarity joins also require sorting during the retrieval process, although order is a concept not supported by the relational model. This paper addresses both issues by extending the similarity join concept to a broader set of binary operators that retrieve the most similar pairs and embed the sorting operation only as an internal processing step, so as to comply with relational theory. Additionally, our extension allows exploring another useful condition not previously considered in similarity retrieval: the negation of predicates. Experiments performed on real and synthetic data show that our operators are fast enough to be used in real applications and scale well for both multidimensional and non-dimensional metric data.
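
To make the abstract's terms concrete, here is a minimal brute-force sketch of three binary operators over two relations R and S: a range join, a k-closest-pairs join that orders candidates only internally and returns an unordered set, and a negation. This is not the algorithm evaluated in the paper. The function names (range_join, k_closest_pairs_join, negate), the Euclidean default distance, the index-based tie-breaking, and the reading of "negation" as the set complement within R × S are all assumptions made for illustration.

```python
import heapq
import math
from itertools import product
from typing import Any, Callable, FrozenSet, Sequence, Tuple

Pair = Tuple[int, int]  # (index into R, index into S)
Metric = Callable[[Any, Any], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Default metric (an assumption); any metric distance can be passed instead."""
    return math.dist(a, b)


def range_join(R: Sequence[Any], S: Sequence[Any], radius: float,
               dist: Metric = euclidean) -> FrozenSet[Pair]:
    """All index pairs (i, j) with dist(R[i], S[j]) <= radius."""
    return frozenset(
        (i, j)
        for (i, r), (j, s) in product(enumerate(R), enumerate(S))
        if dist(r, s) <= radius
    )


def k_closest_pairs_join(R: Sequence[Any], S: Sequence[Any], k: int,
                         dist: Metric = euclidean) -> FrozenSet[Pair]:
    """The k index pairs of R x S with the smallest distances.

    Ordering is used only internally (heap-based selection); the caller
    receives an unordered set. Ties at the k-th distance are broken by
    index order, an arbitrary choice made for this sketch.
    """
    scored = (
        (dist(r, s), i, j)
        for (i, r), (j, s) in product(enumerate(R), enumerate(S))
    )
    return frozenset((i, j) for _, i, j in heapq.nsmallest(k, scored))


def negate(R: Sequence[Any], S: Sequence[Any],
           selected: FrozenSet[Pair]) -> FrozenSet[Pair]:
    """Negated predicate, read here as the complement of a join result in R x S."""
    return frozenset(product(range(len(R)), range(len(S)))) - selected


if __name__ == "__main__":
    R = [(0.0, 0.0), (5.0, 5.0)]
    S = [(0.0, 1.0), (4.0, 5.0), (10.0, 10.0)]
    top2 = k_closest_pairs_join(R, S, 2)
    print(sorted(top2))                # [(0, 0), (1, 1)]
    print(sorted(negate(R, S, top2)))  # [(0, 1), (0, 2), (1, 0), (1, 2)]
```

Returning a frozenset rather than a list mirrors the abstract's point: the heap selection orders candidates internally, but no order leaks into the result. Passing a different `dist` (for example, an edit distance on strings) would cover non-dimensional metric data, still at the quadratic cost of this brute-force version.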

Keywords

Similarity search · Similarity joins · Query operators

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • All five authors: Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos - SP, Brazil

Personalised recommendations