Querying Factorized Probabilistic Triple Databases

  • Denis Krompaß
  • Maximilian Nickel
  • Volker Tresp
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8797)

Abstract

An increasing amount of data is becoming available in the form of large triple stores, with the Semantic Web’s linked open data cloud (LOD) as one of the most prominent examples. Data quality and completeness are key issues in many community-generated data stores, like LOD, which motivates probabilistic and statistical approaches to data representation, reasoning and querying. In this paper we address the issue from the perspective of probabilistic databases, which account for uncertainty in the data via a probability distribution over all database instances. We obtain a highly compressed representation using the recently developed RESCAL approach and demonstrate experimentally that efficient querying can be obtained by exploiting inherent features of RESCAL via sub-query approximations of deterministic views.

Keywords

Probabilistic Databases, Tensor Factorization, RESCAL, Querying, Extensional Query Evaluation

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Denis Krompaß (1)
  • Maximilian Nickel (2, 3)
  • Volker Tresp (1, 4)
  1. Ludwig Maximilian University, Munich, Germany
  2. Massachusetts Institute of Technology, Cambridge, USA
  3. Istituto Italiano di Tecnologia, Genova, Italy
  4. Siemens AG, Corporate Technology, Munich, Germany
