The VLDB Journal, Volume 18, Issue 1, pp 233–254

Containment of partially specified tree-pattern queries in the presence of dimension graphs

  • Dimitri Theodoratos
  • Pawel Placek
  • Theodore Dalamagas
  • Stefanos Souldatos
  • Timos Sellis
Regular Paper

Abstract

Huge volumes of data are now organized or exported in tree-structured form, and querying capabilities are provided through tree-pattern queries. The need to query tree-structured data sources whose structure is not fully known, and the need to integrate multiple data sources with different tree structures, have recently motivated query languages that relax the complete specification of a tree pattern. In this paper, we consider a query language that allows the partial specification of a tree pattern. Queries in this language range from structureless keyword-based queries to completely specified tree patterns. To support the evaluation of partially specified queries, we use semantically rich constructs, called dimension graphs, which abstract structural information of the tree-structured data. We address the problem of query containment in the presence of dimension graphs and provide necessary and sufficient conditions for query containment. Because checking query containment can be expensive, we suggest two heuristic approaches for query containment in the presence of dimension graphs. Our approaches are based on extracting structural information from the dimension graph that can be added to the queries while preserving equivalence with respect to the dimension graph. We consider both cases: extracting and storing different types of structural information in advance, and extracting information on-the-fly (at query time). Both approaches are implemented, validated, and compared through experimental evaluation.

Keywords

Tree-structured data, Partial tree-pattern query, Query containment, XML



Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • Dimitri Theodoratos (1)
  • Pawel Placek (1)
  • Theodore Dalamagas (2)
  • Stefanos Souldatos (2)
  • Timos Sellis (2)

  1. New Jersey Institute of Technology, Newark, USA
  2. National Technical University of Athens, Athens, Greece
