Tree Pattern Relaxation
Tree patterns are fundamental to querying tree-structured data like XML. Because of the heterogeneity of XML data, it is often more appropriate to permit approximate query matching and return ranked answers, in the spirit of Information Retrieval, than to return only exact answers. In this paper, we study the problem of approximate XML query matching, based on tree pattern relaxations, and devise efficient algorithms to evaluate relaxed tree patterns. We consider weighted tree patterns, where exact and relaxed weights, associated with nodes and edges of the tree pattern, are used to compute the scores of query answers. We are interested in the problem of finding answers whose scores are at least as large as a given threshold. We design data pruning algorithms where intermediate query results are filtered dynamically during the evaluation process. We develop an optimization that exploits scores of intermediate results to improve query evaluation efficiency. Finally, we show experimentally that our techniques outperform rewriting-based and post-pruning strategies.
Unable to display preview. Download preview PDF.
- 1.S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu. Structural joins: Efficient matching of XML query patterns. In Proceedings of ICDE, 2002.Google Scholar
- 2.S. Boag, D. Chamberlin, M. Fernandez, D. Florescu, J. Robie, J. Simeon, and M. Stefanescu. XQuery 1.0: An XML query language. http://www.w3.org/TR/xquery.
- 3.Y. Chiaramella, P. Mulhem, and F. Fourel. A model for multimedia information retrieval. Technical report, FERMI ESPRIT BRA 8134, University of Glasgow. www.dcs.gla.ac.uk/fermi/tech_reports/reports/fermi96-4.ps.gz.
- 5.C. Delobel and M. C. Rousset. A uniform approach for querying large treestructured data through a mediated schema. In Proceedings of International Workshop on Foundations of Models for Information Integration (FMII), 2001.Google Scholar
- 6.A. Deutch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. A query language for XML. In Proceedings of WWW, 1999.Google Scholar
- 7.C. Faloutsos. Access methods for text. ACM Computing Surveys, 17(1), 1985.Google Scholar
- 8.N. Fuhr and K. Grossjohann. XIRQL: An extension of XQL for information retrieval. In Proceedings of SIGIR, 2001.Google Scholar
- 9.Y. Hayashi, J. Tomita, and G. Kikui. Searching text-rich XML documents with relevance ranking. In Proceedings of SIGIR Workshop on XML and Information Retrieval, 2000.Google Scholar
- 10.Y. Kanza and Y. Sagiv. Flexible queries over semistructured data. In Proceedings of PODS, 2001.Google Scholar
- 12.S. Myaeng, D.-H. Jang, M.-S. Kim, and Z.-C. Zhoo. A flexible Model for retrieval of SGML documents. In Proceedings of SIGIR, 1998.Google Scholar
- 13.M. Persin. Document filtering for fast ranking. In Proceedings of SIGIR, 1994.Google Scholar
- 14.J. Robie, J. Lapp, and D. Schach. XML query language (XQL). Available from http://www.w3.org/TandS/QL/QL98/pp/xql.html.
- 16.T. Schlieder. Similarity search in XML data using cost-based query Transformations. In Proceedings of SIGMOD WebDB Workshop, 2001.Google Scholar
- 17.A. Theobald and G. Weikum. Adding relevance to XML. In Proceedings of SIGMOD WebDB Workshop, 2000.Google Scholar
- 18.H. Turtle and J. Flood. Query evaluation: Strategies and optimization. Information Processing & Management, Nov. 1995.Google Scholar
- 19.C. Zhang, J. Naughton, D. DeWitt, Q. Luo, and G. Lohman. On supporting containment queries in relational database management systems. In Proceedings of SIGMOD, 2001.Google Scholar
- 20.K. Zhang and D. Shasha. Tree pattern matching. Pattern Matching Algorithms, Apostolico and Galil (Eds.), Oxford University Press, 1997.Google Scholar