Probabilistic Similarity Join on Uncertain Data

  • Hans-Peter Kriegel
  • Peter Kunath
  • Martin Pfeifle
  • Matthias Renz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3882)


The similarity join is an important database primitive for commonly used feature databases. Based on a similarity predicate, it combines two datasets into one set containing pairs of objects from the two original sets. In many application areas, e.g. sensor databases, location-based services, or face recognition systems, distances between objects have to be computed from vague and uncertain data. In this paper, we propose to express the similarity between two uncertain objects by probability density functions which assign a probability value to each possible distance value. Integrating these probabilistic distance functions directly into the join algorithms exploits the full information they provide. The resulting probabilistic similarity join assigns to each object pair a probability value indicating the likelihood that the pair belongs to the result set. As the computation of these probability values is very expensive, we introduce an efficient join processing strategy, using the distance-range join as an example. In a detailed experimental evaluation, we demonstrate the benefits of our probabilistic similarity join. The experiments show that we can achieve high-quality join results at rather low computational cost.
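To make the idea of a probabilistic distance-range join concrete, the following is a minimal sketch and not the join processing strategy proposed in the paper. It assumes, purely for illustration, that each uncertain object is given as a finite list of equally likely sample points, so that the probability of a pair lying within the range eps can be approximated by the fraction of sample pairs whose Euclidean distance is at most eps. The names pair_probability, probabilistic_range_join, and min_prob are hypothetical, and the nested loop is the naive baseline that ignores the cost issue the abstract mentions.

```python
import math
from itertools import product


def pair_probability(samples_a, samples_b, eps):
    """Approximate P(dist(a, b) <= eps) for two uncertain objects.

    Assumption: each object is a non-empty list of equally likely
    sample points (tuples of coordinates).
    """
    hits = sum(
        1
        for p, q in product(samples_a, samples_b)
        if math.dist(p, q) <= eps
    )
    return hits / (len(samples_a) * len(samples_b))


def probabilistic_range_join(R, S, eps, min_prob=0.0):
    """Naive nested-loop join over two dicts mapping id -> sample list.

    Returns (r_id, s_id, probability) for every pair whose estimated
    probability exceeds min_prob (a hypothetical reporting threshold).
    """
    result = []
    for r_id, r_samples in R.items():
        for s_id, s_samples in S.items():
            prob = pair_probability(r_samples, s_samples, eps)
            if prob > min_prob:
                result.append((r_id, s_id, prob))
    return result


if __name__ == "__main__":
    R = {"r1": [(0.0, 0.0), (1.0, 0.0)]}
    S = {"s1": [(0.0, 0.5), (3.0, 0.0)]}
    print(probabilistic_range_join(R, S, eps=1.0))
```

In this toy input only one of the four sample pairs lies within distance 1.0, so the pair (r1, s1) is reported with an estimated probability of 0.25 instead of being accepted or rejected outright.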


Feature Vector, Protein Data Bank, Query Processing, Index Structure, Object Representation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hans-Peter Kriegel
    • 1
  • Peter Kunath
    • 1
  • Martin Pfeifle
    • 1
  • Matthias Renz
    • 1
  1. University of Munich, Germany
