Probabilistic Similarity Join on Uncertain Data

Kriegel, Hans-Peter; Kunath, Peter; Pfeifle, Martin; Renz, Matthias

doi:10.1007/11733836_22


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3882)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications


Abstract

An important database primitive for commonly used feature databases is the similarity join. It combines two datasets into one set based on a similarity predicate, so that the new set contains pairs of objects drawn from the two original sets. In many application areas, e.g., sensor databases, location-based services, or face recognition systems, distances between objects have to be computed from vague and uncertain data. In this paper, we propose to express the similarity between two uncertain objects by probability density functions that assign a probability value to each possible distance value. By integrating these probabilistic distance functions directly into the join algorithms, we exploit the full information they provide. The resulting probabilistic similarity join assigns to each object pair a probability value indicating the likelihood that the pair belongs to the result set. As computing these probability values is very expensive, we introduce an efficient join processing strategy, using the distance-range join as an example. In a detailed experimental evaluation, we demonstrate the benefits of our probabilistic similarity join. The experiments show that we can achieve high-quality join results at rather low computational cost.
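To make the idea of a probabilistic distance-range join concrete, here is a minimal Python sketch. It is not the authors' algorithm and rests on assumptions that do not come from the paper: each uncertain object is a finite set of equally likely sample positions, the distance is Euclidean, and a pair (i, j) is reported with probability p = P(dist ≤ eps) whenever p reaches a user-chosen threshold `tau`. The bounding-box pruning is a generic filter-and-refine step chosen for illustration. All function and parameter names (`distance_distribution`, `prob_within`, `probabilistic_range_join`, `eps`, `tau`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
UncertainObject = Sequence[Point]  # assumption: equally likely sample positions
Box = Tuple[Point, Point]


def distance_distribution(a: UncertainObject, b: UncertainObject) -> List[Tuple[float, float]]:
    """Discrete distance distribution of two sampled objects: sorted (distance, probability) pairs."""
    total = len(a) * len(b)
    counts = Counter(math.dist(p, q) for p, q in product(a, b))
    return sorted((d, c / total) for d, c in counts.items())


def prob_within(a: UncertainObject, b: UncertainObject, eps: float) -> float:
    """P(dist(a, b) <= eps), taken as the fraction of sample pairs within eps."""
    hits = sum(1 for p, q in product(a, b) if math.dist(p, q) <= eps)
    return hits / (len(a) * len(b))


def bounding_box(obj: UncertainObject) -> Box:
    """Axis-aligned minimum bounding box of all sample positions."""
    lo = tuple(min(coords) for coords in zip(*obj))
    hi = tuple(max(coords) for coords in zip(*obj))
    return lo, hi


def box_min_dist(a: Box, b: Box) -> float:
    """Smallest possible distance between a point of box a and a point of box b."""
    total = 0.0
    for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]):
        gap = max(bl - ah, al - bh, 0.0)
        total += gap * gap
    return math.sqrt(total)


def box_max_dist(a: Box, b: Box) -> float:
    """Largest possible distance between a point of box a and a point of box b."""
    total = 0.0
    for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]):
        span = max(abs(bh - al), abs(ah - bl))
        total += span * span
    return math.sqrt(total)


def probabilistic_range_join(R: Sequence[UncertainObject], S: Sequence[UncertainObject],
                             eps: float, tau: float) -> List[Tuple[int, int, float]]:
    """Return (i, j, p) for every pair with p = P(dist(R[i], S[j]) <= eps) >= tau."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    boxes_s = [bounding_box(s) for s in S]
    result = []
    for i, r in enumerate(R):
        box_r = bounding_box(r)
        for j, s in enumerate(S):
            if box_min_dist(box_r, boxes_s[j]) > eps:
                continue  # filter: every sample pair is farther than eps, so p = 0 < tau
            if box_max_dist(box_r, boxes_s[j]) <= eps:
                p = 1.0  # filter: every sample pair is within eps
            else:
                p = prob_within(r, s, eps)  # refinement over all sample pairs
            if p >= tau:
                result.append((i, j, p))
    return result


R = [[(0.0, 0.0), (1.0, 0.0)]]
S = [[(0.5, 0.0), (3.0, 0.0)], [(10.0, 10.0)]]
print(probabilistic_range_join(R, S, eps=1.0, tau=0.5))  # [(0, 0, 0.5)]
```

In this sketch the box tests settle many pairs without touching their samples: a pair whose boxes are farther apart than `eps` has probability 0, and a pair whose boxes lie entirely within `eps` has probability 1. Only pairs whose boxes straddle the threshold pay for the quadratic pass over sample pairs. Continuous density functions would need numerical integration instead of counting, which the sketch does not attempt.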

Author information

Authors and Affiliations

University of Munich, Germany
Hans-Peter Kriegel, Peter Kunath, Martin Pfeifle & Matthias Renz

Editor information

Editors and Affiliations

Department of Computer Science, National University of Singapore, Singapore
Mong Li Lee
School of Computing, National University of Singapore, Singapore
Kian-Lee Tan
School of Engineering and Technology, Asian Institute of Technology, P.O. Box 4, 12120, Klong Luang, Pathum Thani, Thailand
Vilas Wuwongse

About this paper

Cite this paper

Kriegel, HP., Kunath, P., Pfeifle, M., Renz, M. (2006). Probabilistic Similarity Join on Uncertain Data. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_22

DOI: https://doi.org/10.1007/11733836_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33337-1
Online ISBN: 978-3-540-33338-8
eBook Packages: Computer Science, Computer Science (R0)