Abstract
The similarity join is an important database primitive for commonly used feature databases. Based on a similarity predicate, it combines two datasets into one result set containing pairs of objects from the two original sets. In many application areas, e.g. sensor databases, location-based services, or face recognition systems, distances between objects have to be computed from vague and uncertain data. In this paper, we propose to express the similarity between two uncertain objects by probability density functions that assign a probability value to each possible distance value. Integrating these probabilistic distance functions directly into the join algorithms exploits the full information they provide. The resulting probabilistic similarity join assigns to each object pair a probability value indicating the likelihood that the pair belongs to the result set. As computing these probability values is very expensive, we introduce an efficient join processing strategy, using the distance-range join as an example. In a detailed experimental evaluation, we demonstrate the benefits of our probabilistic similarity join. The experiments show that we can achieve high-quality join results at rather low computational cost.
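To make the idea of a probabilistic distance-range join concrete, the following is a minimal sketch and not the efficient processing strategy the paper introduces. It assumes that each uncertain object is approximated by a finite set of equally weighted sample points, and it estimates the probability that a pair lies within distance eps as the fraction of sample pairs that do. The names `pair_probability`, `probabilistic_range_join`, and `min_prob` are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import dist


def pair_probability(samples_a, samples_b, eps):
    """Estimate P(distance(a, b) <= eps) from equally weighted samples.

    Assumption: each uncertain object is given as a non-empty list of
    sample points (tuples of coordinates) drawn from its distribution.
    """
    hits = sum(1 for p, q in product(samples_a, samples_b) if dist(p, q) <= eps)
    return hits / (len(samples_a) * len(samples_b))


def probabilistic_range_join(R, S, eps, min_prob=0.0):
    """Naive nested-loop join over two dicts mapping object id -> samples.

    Returns (id_r, id_s, probability) for every pair whose estimated
    probability exceeds min_prob (a hypothetical cut-off parameter).
    """
    result = []
    for id_r, samples_r in R.items():
        for id_s, samples_s in S.items():
            prob = pair_probability(samples_r, samples_s, eps)
            if prob > min_prob:
                result.append((id_r, id_s, prob))
    return result
```

This brute-force version compares every sample of every object pair, which illustrates why computing the probability values is expensive and why a dedicated join strategy is needed.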
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kriegel, HP., Kunath, P., Pfeifle, M., Renz, M. (2006). Probabilistic Similarity Join on Uncertain Data. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_22
DOI: https://doi.org/10.1007/11733836_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33337-1
Online ISBN: 978-3-540-33338-8
eBook Packages: Computer Science (R0)