
Top-k Similarity Join over Multi-valued Objects

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7238)


Abstract

Top-k similarity joins have been extensively studied and are used in a wide spectrum of applications, such as information retrieval, decision making, spatial data analysis, and data mining. Given two sets of objects \(\mathcal U\) and \(\mathcal V\), a top-k similarity join returns the k most similar pairs of objects from \(\mathcal U \times \mathcal V\). In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space, and the similarity between two objects is usually measured by a distance metric such as Euclidean distance. However, in many applications an object may be described by multiple values (instances), and the conventional model is not applicable because it does not account for the distribution of an object's instances. In this paper, we study top-k similarity join queries over multi-valued objects. We apply quantile-based distance to capture the relative distribution of the instances of two objects. Following a filtering-refinement framework, we develop efficient and effective techniques to process top-k similarity joins over multi-valued objects, and we propose novel distance-, statistic-, and weight-based pruning techniques. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
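
To make the query definition concrete, the following is a minimal brute-force sketch, not the filtering-refinement method of the paper. It assumes that each object is a list of equally weighted instances (points), and that the quantile-based distance between two objects is the φ-quantile of all pairwise Euclidean distances between their instances; the abstract does not give the exact definition, so this reading is an assumption. The names `quantile_distance`, `topk_join`, and `phi` are hypothetical.

```python
import heapq
import math
from itertools import product


def quantile_distance(u, v, phi=0.5):
    """Assumed definition: the phi-quantile of all pairwise Euclidean
    distances between the instances of u and v, with uniform weights."""
    dists = sorted(math.dist(a, b) for a, b in product(u, v))
    # Smallest distance whose cumulative share of instance pairs reaches phi.
    idx = max(0, math.ceil(phi * len(dists)) - 1)
    return dists[idx]


def topk_join(U, V, k, phi=0.5):
    """Brute-force baseline: score every pair in U x V and keep the k
    pairs with the smallest quantile distance. No pruning is applied."""
    scored = (
        (quantile_distance(u, v, phi), i, j)
        for (i, u), (j, v) in product(enumerate(U), enumerate(V))
    )
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    U = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
    V = [[(0, 0), (1, 0)], [(10, 10), (11, 10)]]
    # Each result is (distance, index in U, index in V).
    print(topk_join(U, V, k=2))
```

This baseline evaluates every object pair and sorts every instance pair, which is the cost that pruning techniques such as those named in the abstract aim to avoid.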



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, W., Xu, J., Liang, X., Zhang, Y., Lin, X. (2012). Top-k Similarity Join over Multi-valued Objects. In: Lee, Sg., Peng, Z., Zhou, X., Moon, YS., Unland, R., Yoo, J. (eds) Database Systems for Advanced Applications. DASFAA 2012. Lecture Notes in Computer Science, vol 7238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29038-1_37

  • DOI: https://doi.org/10.1007/978-3-642-29038-1_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29037-4

  • Online ISBN: 978-3-642-29038-1

  • eBook Packages: Computer Science, Computer Science (R0)
