A Framework for Management of Semistructured Probabilistic Data

Journal of Intelligent Information Systems

Abstract

This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions of discrete random variables with finite domains, together with associated information. It proposes a formal Semistructured Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra). The SP-algebra supports standard database queries as well as operations specific to probabilities, such as conditionalization and marginalization. The Semistructured Probabilistic Database can therefore serve as a backend for any application that manages large quantities of probabilistic information, such as one that builds stochastic models. The implementation encodes SPOs in XML to facilitate communication with diverse applications, and the database management system is built on top of a relational DBMS. The translation of SP-algebra queries into relational queries is discussed, and the results of initial experiments evaluating the system are reported.
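To make the two probability-specific operations named in the abstract concrete, the following is a minimal Python sketch of textbook marginalization and conditionalization over a joint distribution of discrete random variables with finite domains. It is not the paper's SP-algebra or SPO representation: the dictionary layout, the function names `marginalize` and `conditionalize`, and the `weather`/`road` example are all assumptions made for illustration, and the conditionalization shown here renormalizes the selected rows in the usual textbook way.

```python
from collections import defaultdict


def marginalize(table, variables, keep):
    """Sum out every variable not listed in `keep`.

    `table` maps a tuple of values (one per name in `variables`)
    to a probability. This layout is an illustrative assumption.
    """
    idx = [variables.index(v) for v in keep]
    out = defaultdict(float)
    for assignment, p in table.items():
        out[tuple(assignment[i] for i in idx)] += p
    return dict(out)


def conditionalize(table, variables, var, value):
    """Keep the rows where `var == value`, drop that column, renormalize."""
    i = variables.index(var)
    selected = {
        a[:i] + a[i + 1:]: p
        for a, p in table.items()
        if a[i] == value
    }
    total = sum(selected.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    return {a: p / total for a, p in selected.items()}


# Hypothetical joint distribution over two variables.
variables = ["weather", "road"]
joint = {
    ("sunny", "dry"): 0.5,
    ("sunny", "wet"): 0.1,
    ("rainy", "dry"): 0.1,
    ("rainy", "wet"): 0.3,
}

weather_only = marginalize(joint, variables, ["weather"])
given_wet = conditionalize(joint, variables, "road", "wet")
```

Here `weather_only` is the marginal distribution of `weather`, and `given_wet` is the distribution of `weather` conditioned on `road` being `wet`. Both results are again distributions over a smaller set of variables, which is what allows such operations to be composed with ordinary selection-style queries.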

Author information

Correspondence to Alex Dekhtyar.

Additional information

Work performed while a Ph.D. student at the University of Kentucky.

About this article

Cite this article

Zhao, W., Dekhtyar, A. & Goldsmith, J. A Framework for Management of Semistructured Probabilistic Data. J Intell Inf Syst 25, 293–332 (2005). https://doi.org/10.1007/s10844-005-0197-8

  • DOI: https://doi.org/10.1007/s10844-005-0197-8
