The VLDB Journal

Volume 17, Issue 3, pp 355–377

Enabling Schema-Free XQuery with meaningful query focus

Regular Paper

Abstract

The widespread adoption of XML holds the promise that document structure can be exploited to specify precise database queries. However, users may have only limited knowledge of the XML structure and may be unable to produce a correct XQuery expression, especially in the context of a heterogeneous information collection. The default is to use keyword-based search, and we are all too familiar with how difficult it is to obtain precise answers by these means. We seek to address these problems by introducing the notion of Meaningful Query Focus (MQF) for finding related nodes within an XML document. MQF enables users to take full advantage of the precision and efficiency of XQuery without requiring (perfect) knowledge of the document structure. Such a Schema-Free XQuery is potentially of value not just to casual users with partial knowledge of the schema, but also to experts working in data integration or data evolution. In such a context, a schema-free query, once written, can be applied universally to multiple data sources that supply similar content under different schemas, and applied “forever” as these schemas evolve. Our experimental evaluation found that it is possible to express a wide variety of queries in a schema-free manner and efficiently retrieve correct results over a broad diversity of schemas. Furthermore, the evaluation of a schema-free query is not expensive: using a novel stack-based algorithm we developed for computing MQF, the overhead is from 1 to 4 times the execution time of an equivalent schema-aware query. The evaluation cost of schema-free queries can be further reduced by as much as 68% using a selectivity-based algorithm we developed to enable the integration of the MQF operation into the query pipeline.

Keywords

Hierarchical Semi-structured XML Schema Query language XQuery

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  1. Department of EECS, University of Michigan, Ann Arbor, USA
