Diversity in Similarity Joins

  • Lucio F. D. Santos
  • Luiz Olmes Carvalho
  • Willian D. Oliveira
  • Agma J. M. Traina
  • Caetano Traina Jr.
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9371)


As applications produce and consume increasingly complex data, such as images and geographic information, the similarity join has attracted considerable attention. However, this operator does not consider the relationships among the elements in the answer, so it generates results with many pairs that are similar among themselves and add little value to the final answer. Result diversification methods retrieve elements that are similar enough to satisfy the similarity conditions while also considering the diversity among the elements in the answer, producing a more heterogeneous result with smaller cardinality, which improves the meaning of the answer. Still, diversity has been studied only in unary operations. In this paper, we introduce the concept of diverse similarity joins: a similarity join operator that ensures smaller, more diversified and more useful answers. Experiments performed on real and synthetic datasets show that our proposal allows exploiting diversity in similarity joins without diminishing their performance, while providing elements that cover the same data space distribution as the non-diverse answers.
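
The idea of a diverse similarity join can be illustrated with a small sketch. This is not the operator defined in the paper; it is a naive illustration under stated assumptions: elements are points compared with the Euclidean distance, the join is a range join with a hypothetical threshold `eps`, and two result pairs count as "similar among themselves" when both their left and right elements lie within a hypothetical separation `min_sep`. The function names are also illustrative.

```python
from math import dist  # Euclidean distance, Python 3.8+


def similarity_join(R, S, eps):
    """Range similarity join: every pair (r, s) with dist(r, s) <= eps."""
    return [(r, s) for r in R for s in S if dist(r, s) <= eps]


def pair_distance(p, q):
    # Assumption (not from the paper): two pairs are close to each other
    # when both their left elements and their right elements are close.
    return max(dist(p[0], q[0]), dist(p[1], q[1]))


def diverse_similarity_join(R, S, eps, min_sep):
    """Greedy filter: keep a joined pair only if it is farther than
    min_sep from every pair already kept (illustrative heuristic)."""
    # Visit the most similar pairs first so they are preferred.
    pairs = sorted(similarity_join(R, S, eps), key=lambda p: dist(p[0], p[1]))
    kept = []
    for p in pairs:
        if all(pair_distance(p, q) > min_sep for q in kept):
            kept.append(p)
    return kept
```

On a dataset with two tight clusters, the plain join returns every near-duplicate pair in each cluster, whereas the filtered version returns one representative pair per cluster, which is the kind of smaller, more heterogeneous answer the abstract describes.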


Similarity joins, Result diversification, Query processing





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Lucio F. D. Santos (1, corresponding author)
  • Luiz Olmes Carvalho (1)
  • Willian D. Oliveira (1)
  • Agma J. M. Traina (1)
  • Caetano Traina Jr. (1)

  1. Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
