Theory of Computing Systems, Volume 57, Issue 4, pp 892–926

Certain Answers over Incomplete XML Documents: Extending Tractability Boundary

  • Amélie Gheerbrant
  • Leonid Libkin


Previous studies of incomplete XML documents have identified three main sources of incompleteness – in structural information, data values, and labeling – and addressed the data complexity of answering analogs of unions of conjunctive queries under the open world assumption. It is known that structural incompleteness leads to intractability, while incompleteness in data values and labeling still permits efficient computation of certain answers. The goal of this paper is to provide a detailed picture of the complexity of query answering over incomplete XML documents. We look at more expressive languages, at other semantic assumptions, and at both data and combined complexity of query answering, to see whether some well-behaved tractable classes have been missed. To incorporate non-positive features into query languages, we look at a gentle way of introducing negation via Boolean combinations of existential positive queries, as well as the analog of relational calculus. We also look at the closed world assumption which, due to the hierarchical structure of XML, has two variations. For all combinations of languages and semantics of incompleteness, we determine the data and combined complexity of computing certain answers. We show that structural incompleteness leads to intractability under all assumptions, while by dropping it we can recover efficient evaluation algorithms for some queries that go beyond those previously studied. In the process, we also establish a new result about relational query answering over incomplete databases, showing that for Boolean combinations of conjunctive queries, certain answers can be found in polynomial time.


Incomplete information, XML, Query answering, Complexity



We are grateful to Tony Tan for extensive discussions, and for help with pictures. Work partially supported by EPSRC grants G049165 and J015377.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. University of Edinburgh, Edinburgh, UK
  2. Université Paris–Diderot, Paris, France
