Journal of Intelligent Information Systems, Volume 25, Issue 3, pp 293–332

A Framework for Management of Semistructured Probabilistic Data

Article

Abstract

This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions of discrete random variables with finite domains, and associated information. A formal Semistructured Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra) are proposed. The SP-algebra supports standard database queries as well as some specific to probabilities, such as conditionalization and marginalization. Thus, the Semistructured Probabilistic Database may be used as a backend to any application that involves the management of large quantities of probabilistic information, such as building stochastic models. The implementation uses XML encoding of SPOs to facilitate communication with diverse applications. The database management system has been implemented on top of a relational DBMS. The translation of SP-algebra queries into relational queries is discussed here, and the results of initial experiments evaluating the system are reported.

probabilistic databases, query algebras, data models, semistructured data
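
The two probability-specific operations named in the abstract, marginalization and conditionalization, can be illustrated on a small joint distribution. The following is a minimal sketch using the textbook definitions only; it is not the paper's SPO data model or SP-algebra, whose definitions operate on SPOs and may differ in detail. The variables, values, probabilities, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical joint distribution over two discrete variables.
# Keys are (weather, traffic) value pairs; the probabilities sum to 1.
variables = ("weather", "traffic")
joint = {
    ("sun", "light"): 0.4,
    ("sun", "heavy"): 0.2,
    ("rain", "light"): 0.1,
    ("rain", "heavy"): 0.3,
}


def marginalize(table, variables, keep):
    """Sum out every variable that is not listed in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = defaultdict(float)
    for values, p in table.items():
        out[tuple(values[i] for i in idx)] += p
    return dict(out)


def conditionalize(table, variables, var, value):
    """Condition on var == value: keep matching rows, drop var, renormalize."""
    i = variables.index(var)
    rows = {k: p for k, p in table.items() if k[i] == value}
    total = sum(rows.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    return {k[:i] + k[i + 1:]: p / total for k, p in rows.items()}


# Marginal distribution of weather, and distribution of traffic given rain.
weather = marginalize(joint, variables, ["weather"])
traffic_given_rain = conditionalize(joint, variables, "weather", "rain")
```

Here `weather` holds the marginal P(weather), about 0.6 for sun and 0.4 for rain, and `traffic_given_rain` holds P(traffic | weather = rain), about 0.25 for light and 0.75 for heavy, up to floating-point rounding.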

Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

  1. Department of Computer Science, University of New Mexico, Albuquerque, USA
  2. Department of Computer Science, University of Kentucky, Lexington, USA
