Similarity Joins and Beyond: An Extended Set of Binary Operators with Order

  • Conference paper
  • In: Similarity Search and Applications (SISAP 2015)

Abstract

Similarity joins are troublesome database operators that often produce results much larger than the user really needs or expects. To return the similar elements, similarity joins also require sorting during retrieval, even though order is a concept that the relational model does not support. This paper addresses both issues by extending the similarity join concept to a broader set of binary operators, which retrieve the most similar pairs and embed the sorting operation only as an internal processing step, so as to comply with relational theory. Additionally, our extension makes it possible to explore another useful condition not previously considered in similarity retrieval: the negation of predicates. Experiments performed on real and synthetic data show that our operators are fast enough to be used in real applications and scale well for both multidimensional and non-dimensional metric data.
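
As a rough illustration of the two issues named in the abstract, the sketch below contrasts a plain range similarity join with a variant that keeps only the k closest qualifying pairs. It is a brute-force toy under stated assumptions, not the operators proposed in the paper: the function names `range_join` and `k_most_similar_pairs`, the use of Euclidean distance, and the sample points are all hypothetical.

```python
import heapq
import math
from itertools import product


def range_join(r, s, radius):
    """Plain range similarity join: every pair (a, b) with d(a, b) <= radius.

    Assumption: elements are numeric tuples compared by Euclidean distance.
    """
    return {(a, b) for a, b in product(r, s) if math.dist(a, b) <= radius}


def k_most_similar_pairs(r, s, radius, k):
    """Keep only the k closest pairs among those within `radius`.

    The ordering by distance happens internally (via a heap); the result
    is returned as an unordered set, so no order is exposed to the caller.
    """
    candidates = ((math.dist(a, b), a, b) for a, b in product(r, s))
    within = (c for c in candidates if c[0] <= radius)
    return {(a, b) for _, a, b in heapq.nsmallest(k, within)}


# Hypothetical sample data.
r = [(0.0, 0.0), (5.0, 5.0)]
s = [(0.0, 1.0), (0.0, 2.0), (5.0, 5.5)]

all_pairs = range_join(r, s, radius=2.0)
top_pairs = k_most_similar_pairs(r, s, radius=2.0, k=2)
```

With this sample data the range join returns three pairs, while the restricted variant returns only the two closest ones. Returning a set rather than a sorted list mirrors the idea of keeping sorting as an internal step.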

The authors are grateful to FAPESP, CNPQ, CAPES and Rescuer (EU Commission Grant 614154 and CNPQ/MCTI Grant 490084/2013-3) for their financial support.

Author information

Corresponding author

Correspondence to Luiz Olmes Carvalho.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Carvalho, L.O., Santos, L.F.D., Oliveira, W.D., Traina, A.J.M., Traina, C. (2015). Similarity Joins and Beyond: An Extended Set of Binary Operators with Order. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds) Similarity Search and Applications. SISAP 2015. Lecture Notes in Computer Science, vol 9371. Springer, Cham. https://doi.org/10.1007/978-3-319-25087-8_3

  • DOI: https://doi.org/10.1007/978-3-319-25087-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25086-1

  • Online ISBN: 978-3-319-25087-8

  • eBook Packages: Computer Science, Computer Science (R0)
