Abstract
This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions of discrete random variables with finite domains, together with associated information. It proposes a formal Semistructured Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra). The SP-algebra supports standard database queries as well as operations specific to probabilities, such as conditionalization and marginalization. The Semistructured Probabilistic Database can therefore serve as a backend to any application that manages large quantities of probabilistic information, for example when building stochastic models. The implementation encodes SPOs in XML to facilitate communication with diverse applications, and the database management system itself is implemented on top of a relational DBMS. The translation of SP-algebra queries into relational queries is discussed, and the results of initial experiments evaluating the system are reported.
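To make the two probability-specific operations named in the abstract concrete, the following is a minimal sketch of marginalization and conditionalization on a joint distribution of discrete random variables with finite domains. It follows the textbook definitions only. The dictionary layout, the function names `marginalize` and `conditionalize`, and the weather/traffic numbers are assumptions made for illustration; they are not the paper's SPO format or its SP-algebra operators.

```python
from collections import defaultdict

# Hypothetical in-memory representation (not the paper's SPO format):
# "vars" names the random variables, "table" maps one value per variable
# to the probability of that combination.
joint = {
    "vars": ("weather", "traffic"),
    "table": {
        ("sun", "light"): 0.4,
        ("sun", "heavy"): 0.1,
        ("rain", "light"): 0.2,
        ("rain", "heavy"): 0.3,
    },
}


def marginalize(dist, keep):
    """Sum out every variable that is not listed in `keep`."""
    positions = [dist["vars"].index(name) for name in keep]
    table = defaultdict(float)
    for values, prob in dist["table"].items():
        table[tuple(values[i] for i in positions)] += prob
    return {"vars": tuple(keep), "table": dict(table)}


def conditionalize(dist, var, value):
    """Keep rows where `var` equals `value`, drop `var`, and renormalize."""
    pos = dist["vars"].index(var)
    rows = {v: p for v, p in dist["table"].items() if v[pos] == value}
    total = sum(rows.values())
    if total == 0:
        raise ValueError(f"{var}={value!r} has zero probability")
    remaining = tuple(name for name in dist["vars"] if name != var)
    table = {
        tuple(x for i, x in enumerate(values) if i != pos): prob / total
        for values, prob in rows.items()
    }
    return {"vars": remaining, "table": table}


# Distribution of weather alone, and of traffic given that it rains.
weather_only = marginalize(joint, ["weather"])
traffic_given_rain = conditionalize(joint, "weather", "rain")
```

Both functions return a new distribution in the same layout, so they can be chained, which is the property a query algebra over such objects relies on.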
Additional information
Work performed while a Ph.D. student at the University of Kentucky.
Cite this article
Zhao, W., Dekhtyar, A. & Goldsmith, J. A Framework for Management of Semistructured Probabilistic Data. J Intell Inf Syst 25, 293–332 (2005). https://doi.org/10.1007/s10844-005-0197-8
DOI: https://doi.org/10.1007/s10844-005-0197-8