World Wide Web, Volume 17, Issue 3, pp 285–309

Efficient top-k similarity join processing over multi-valued objects

  • Wenjie Zhang (email author)
  • Liming Zhan
  • Ying Zhang
  • Muhammad Aamir Cheema
  • Xuemin Lin


Top-k similarity joins have been extensively studied and are used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis and data mining. Given two sets of objects \(\mathcal U\) and \(\mathcal V\), a top-k similarity join returns the k most similar pairs of objects from \(\mathcal U \times \mathcal V\). In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space, and similarity is measured by a simple distance metric such as Euclidean distance. However, in many applications an object may be described by multiple values (instances), and the conventional model is not applicable because it does not account for the distribution of object instances. In this paper, we study top-k similarity joins over multi-valued objects. We apply two types of quantile-based distance measures, ϕ-quantile distance and ϕ-quantile group-base distance, to explore the relative instance distribution among the multiple instances of objects. Efficient and effective techniques to process top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance-, statistic- and weight-based pruning techniques are proposed. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
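
To make the problem statement concrete, the following is a minimal brute-force sketch of a top-k join over multi-valued objects. It is not the filtering-refinement technique of the paper and uses no pruning. It assumes, for illustration only, that each object is a list of equally weighted instance points and that the ϕ-quantile distance between two objects is the ϕ-quantile of the Euclidean distances over all instance pairs; the abstract does not spell out this definition, and the function names `quantile_distance` and `top_k_join` are hypothetical.

```python
import heapq
import math
from itertools import product


def quantile_distance(u, v, phi):
    """Assumed phi-quantile distance between two multi-valued objects.

    u and v are lists of instance points (tuples of coordinates).
    Every instance pair is given equal weight (an assumption).
    """
    # Distances over all instance pairs, in ascending order.
    dists = sorted(math.dist(a, b) for a, b in product(u, v))
    # Smallest distance whose cumulative share of pairs reaches phi.
    idx = max(0, math.ceil(phi * len(dists)) - 1)
    return dists[idx]


def top_k_join(U, V, k, phi):
    """Return the k (distance, i, j) triples with the smallest distance.

    i indexes an object in U and j an object in V. All |U| * |V| pairs
    are scored, so this only illustrates the query semantics.
    """
    scored = (
        (quantile_distance(u, v, phi), i, j)
        for i, u in enumerate(U)
        for j, v in enumerate(V)
    )
    return heapq.nsmallest(k, scored)
```

Calling `top_k_join(U, V, k, phi)` scores every object pair, which is exactly the cost that the pruning techniques mentioned in the abstract aim to avoid.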


Keywords: Query processing · Joins · Multi-valued objects





Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Wenjie Zhang (1, email author)
  • Liming Zhan (1)
  • Ying Zhang (1)
  • Muhammad Aamir Cheema (1)
  • Xuemin Lin (1)

  1. School of Computer Science & Engineering, University of New South Wales, Sydney, Australia
