H-Tree: A Hybrid Structure for Confidence Computation in Probabilistic Databases

  • Conference paper
Web Technologies and Applications (APWeb 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7235))

Abstract

Probabilistic databases have become a popular tool for managing uncertain data. Most work in the area focuses on efficient query processing and follows one of two main directions: exact or approximate evaluation. Recent work has shown that, for conjunctive queries without self-joins on a tuple-independent probabilistic database, query evaluation is equivalent to computing the marginal probabilities of the Boolean formulas associated with the query results. If a formula can be factorized into read-once form, where every variable appears at most once, confidence computation becomes a tractable problem that can be solved in linear time. Otherwise, it is regarded as an NP-hard problem and needs to be evaluated approximately. In this paper, we propose a framework that evaluates both tractable and NP-hard conjunctive queries efficiently. First, we develop a novel structure, the H-tree, in which Boolean formulas are decomposed into small partitions that are either read-once or NP-hard. Then we propose algorithms for building the H-tree and for parallelizing (approximate) confidence computation. We also present fundamental theorems that ensure the correctness of our approaches. Performance experiments demonstrate the benefits of the H-tree, especially for approximate confidence evaluation on NP-hard queries.
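To illustrate why read-once form makes confidence computation linear, the following minimal sketch evaluates the probability of a read-once formula over independent tuple variables in a single bottom-up pass. It is not the H-tree algorithm from the paper. The nested-tuple formula encoding, the function name `read_once_probability`, and the example probabilities are all assumptions made for illustration.

```python
from math import prod


def read_once_probability(node, probs):
    """Probability that a read-once formula is true.

    Hypothetical encoding (not from the paper):
      ("var", name)        a tuple variable
      ("and", [children])  conjunction
      ("or",  [children])  disjunction

    Because each variable appears at most once and variables are
    independent, sibling subformulas are independent, so every node
    is visited exactly once.
    """
    kind = node[0]
    if kind == "var":
        return probs[node[1]]
    child_probs = [read_once_probability(child, probs) for child in node[1]]
    if kind == "and":
        # P(A and B) = P(A) * P(B) for independent A, B
        return prod(child_probs)
    if kind == "or":
        # P(A or B) = 1 - (1 - P(A)) * (1 - P(B)) for independent A, B
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown node kind: {kind!r}")


# Example formula: (x1 AND y1) OR (x2 AND y2), every variable used once.
formula = ("or", [
    ("and", [("var", "x1"), ("var", "y1")]),
    ("and", [("var", "x2"), ("var", "y2")]),
])
probs = {"x1": 0.5, "y1": 0.5, "x2": 0.5, "y2": 0.5}
result = read_once_probability(formula, probs)
```

A formula such as (x1 AND y1) OR (x1 AND y2) is not in this form until it is factorized as x1 AND (y1 OR y2); applying the product rules directly to the unfactorized version would count x1 twice and give the wrong probability.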

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, Q., Qin, B., Wang, S. (2012). H-Tree: A Hybrid Structure for Confidence Computation in Probabilistic Databases. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_17

  • DOI: https://doi.org/10.1007/978-3-642-29253-8_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29252-1

  • Online ISBN: 978-3-642-29253-8

  • eBook Packages: Computer Science, Computer Science (R0)
