Schema-Driven Evaluation of Approximate Tree-Pattern Queries
We present a simple query language for XML that supports hierarchical, Boolean-connected query patterns. The interpretation of a query is founded on cost-based query transformations: the total cost of a sequence of transformations measures the similarity between the query and the data and is used to rank the results. We introduce two polynomial-time algorithms that efficiently find the best n answers to the query. The first algorithm finds all approximate results, sorts them by increasing cost, and prunes the result list after the nth entry. The second algorithm uses a structural summary of the database (the schema) to estimate the best k transformed queries, which in turn are executed against the database. We compare both approaches and show that the schema-based evaluation outperforms the pruning approach for small values of n, whereas the pruning strategy is the better choice if n is close to the total number of approximate results for the query.
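The two evaluation strategies can be contrasted on a toy example. This is a minimal sketch under stated assumptions, not the paper's algorithms: queries are restricted to label paths matched along the descendant axis, the transformation costs in the hypothetical tables `RENAME` and `DELETE_COST` are invented for illustration, and every transformed query is enumerated by brute force, which is exponential in the query length rather than polynomial. The "schema" is assumed to be the set of distinct root-to-node label paths, a DataGuide-style summary, so a transformed query that is empty on the schema is guaranteed to be empty on the data.

```python
from itertools import product

# Hypothetical cost model (illustrative only): each query label is kept (cost 0),
# renamed to an alternative label, or deleted. Entries are (cost, label) pairs.
RENAME = {"book": [(2, "article")], "title": [(1, "name")]}
DELETE_COST = {"book": 3, "title": 5}

# Hypothetical toy database: node id -> root-to-node label path.
DATA = {
    1: ("catalog", "book", "title"),
    2: ("catalog", "article", "title"),
    3: ("catalog", "article", "name"),
    4: ("catalog", "cd", "title"),
    5: ("catalog", "book"),
    6: ("catalog", "book", "title"),
}
# Assumed structural summary: the distinct label paths occurring in DATA.
SCHEMA = set(DATA.values())


def transformed_queries(query):
    """Return every (cost, labels) variant of a non-empty path query, cheapest first."""
    options = []
    for i, label in enumerate(query):
        opts = [(0, label)] + RENAME.get(label, [])
        if i < len(query) - 1:  # the result (last) label is never deleted
            opts.append((DELETE_COST.get(label, 1), None))
        options.append(opts)
    variants = set()
    for combo in product(*options):
        cost = sum(c for c, _ in combo)
        labels = tuple(l for _, l in combo if l is not None)
        variants.add((cost, labels))
    return sorted(variants)


def matches(labels, path):
    """Descendant-axis match: labels are a subsequence of path ending at its last node."""
    if labels[-1] != path[-1]:
        return False
    ancestors = iter(path[:-1])
    return all(any(l == p for p in ancestors) for l in labels[:-1])


def by_cost(results):
    """Sort {node: cost} by increasing cost, breaking ties by node id."""
    return sorted(results.items(), key=lambda item: (item[1], item[0]))


def pruning_top_n(query, n):
    """Strategy 1: find all approximate results with minimal cost, sort, cut after n."""
    best = {}
    for cost, labels in transformed_queries(query):
        for node, path in DATA.items():
            if matches(labels, path) and cost < best.get(node, float("inf")):
                best[node] = cost
    return by_cost(best)[:n]


def schema_driven_top_n(query, n):
    """Strategy 2: test transformed queries on the schema in cost order and run
    only the non-empty ones on the data until n results are collected."""
    results, executed = {}, 0
    for cost, labels in transformed_queries(query):
        if not any(matches(labels, path) for path in SCHEMA):
            continue  # empty on the schema, hence empty on the data: no data access
        executed += 1
        for node in sorted(DATA):
            if node not in results and matches(labels, DATA[node]):
                results[node] = cost  # queries are cost-sorted, so first hit is cheapest
                if len(results) == n:
                    return by_cost(results), executed
    return by_cost(results), executed
```

For the query `("book", "title")` with n = 3, both functions return the ranking `[(1, 0), (6, 0), (2, 2)]` as (node, cost) pairs, but `schema_driven_top_n` runs only 2 of the 6 transformed queries against the data: the cost-1 variant `book/name` is rejected by the schema, and evaluation stops as soon as three results are found. As n approaches the total number of approximate results, the schema-driven loop ends up executing nearly every transformed query, which mirrors the trade-off described above.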