Schema-Driven Evaluation of Approximate Tree-Pattern Queries

  • Torsten Schlieder
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2287)

Abstract

We present a simple query language for XML that supports hierarchical, Boolean-connected query patterns. The interpretation of a query is founded on cost-based query transformations: the total cost of a sequence of transformations measures the similarity between the query and the data and is used to rank the results. We introduce two polynomial-time algorithms that efficiently find the best n answers to the query. The first algorithm finds all approximate results, sorts them by increasing cost, and prunes the result list after the nth entry. The second algorithm uses a structural summary of the database (the schema) to estimate the best k transformed queries, which in turn are executed against the database. We compare both approaches and show that the schema-based evaluation outperforms the pruning approach for small values of n. The pruning strategy is the better choice if n is close to the total number of approximate results for the query.
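The contrast between the two strategies can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy data, the cost table, and the names `DATABASE`, `TRANSFORMED`, `best_n_by_pruning`, and `best_n_schema_driven` are all assumptions made for illustration. Queries are reduced to label paths, and the schema is simply the set of label paths that occur in the data. The schema-driven variant here runs transformed queries one at a time in order of cost until n answers are found, which is a simplification of estimating the best k transformed queries up front.

```python
from typing import Dict, List, Tuple

# Hypothetical toy database: each label path maps to the ids of the nodes it reaches.
DATABASE: Dict[str, List[int]] = {
    "cd/title": [1, 2],
    "cd/track/title": [3, 4, 5],
    "mc/title": [6],
}

# Hypothetical transformed queries with their total transformation cost.
# Cost 0 is the original query; "cd/composer" has no match in the data.
TRANSFORMED: Dict[str, int] = {
    "cd/title": 0,
    "cd/composer": 1,
    "cd/track/title": 2,
    "mc/title": 4,
}

Result = Tuple[int, int]  # (cost, node id)


def best_n_by_pruning(n: int) -> List[Result]:
    """Collect every approximate result, sort by cost, keep the first n."""
    results: List[Result] = []
    for query, cost in TRANSFORMED.items():
        for node in DATABASE.get(query, []):
            results.append((cost, node))
    results.sort()
    return results[:n]


def best_n_schema_driven(n: int) -> List[Result]:
    """Use the schema to pick the cheapest useful queries, then execute only those."""
    schema = set(DATABASE)  # structural summary: the distinct label paths
    # Discard transformed queries that cannot match, order the rest by cost.
    candidates = sorted(
        (cost, query) for query, cost in TRANSFORMED.items() if query in schema
    )
    results: List[Result] = []
    for cost, query in candidates:
        if len(results) >= n:
            break  # enough answers; costlier queries are never executed
        results.extend((cost, node) for node in DATABASE[query])
    results.sort()
    return results[:n]


if __name__ == "__main__":
    print(best_n_by_pruning(3))
    print(best_n_schema_driven(3))
```

In this sketch both functions return the same ranked answers, but the schema-driven one touches the database only for queries it actually needs, which mirrors the abstract's observation that it pays off for small n. When n approaches the total number of approximate results, every candidate query is executed anyway and the extra bookkeeping brings no benefit.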

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Torsten Schlieder
  1. Institute of Computer Science, Freie Universität Berlin, Berlin