Skip to main content

Efficient Evaluation of Partial Match Queries for XML Documents Using Information Retrieval Techniques

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 3453))

Abstract

We propose XIR, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval (IR) techniques. A partial match query is defined as the one having the descendent-or-self axis “//” in its path expression. In its general form, a partial match query has branch predicates forming branching paths. The objective of XIR is to efficiently support this type of queries for large-scale documents of heterogeneous schemas. XIR has its basis on the conventional schema-level methods using relational tables and significantly improves their efficiency using two techniques: an inverted index technique and a novel prefix match join. The former indexes the labels in label paths as keywords in texts, and allows for finding the label paths matching the queries more efficiently than string match used in the conventional methods. The latter supports branching path expressions, and allows for finding the result nodes more efficiently than containment joins used in the conventional methods. We compare the efficiency of XIR with those of XRel and XParent using XML documents crawled from the Internet. The results show that XIR is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions, and by several factors for branching path expressions.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aboulnaga, A., Alameldeen, A.R., Naughton, J.: Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In: Proc. the 27th Int’l Conf. on Very Large Data Bases(VLDB), Rome, Italy, September 11-14 , pp. 591–600 (2001)

    Google Scholar 

  2. Amer-Yahia, S., Cho, S., Lakshmanan, L.V.S., Srivastava, D.: Minimization of Tree Pattern Queries. In: Proc. 2001 ACM SIGMOD Int’l Conf. on Management of Data, Santa Barbara, California, May 21-24 , pp. 497–508.(2001)

    Google Scholar 

  3. Altinel, M., Franklin, M.J.: Efficient Filtering of XML Documents for Selective Dissemination of Information. In: Proc. the 26th Int’l Conf. on Very Large Data Bases(VLDB), Cairo, Egypt, September 10-14, pp. 53–64 (2000)

    Google Scholar 

  4. Al-Khalifa, S., Jagadish, H.V., Koudas, N., Patel, J.M.: Structural Joins: A Primitive for Efficient XML Query Pattern Matching. In: Proc. the 18th Int’l Conf. on Data Engineering (ICDE), San Jose, California, February 26 - March 1, pp. 141–152 (2002)

    Google Scholar 

  5. Bremer, J.-M., Gertz, M.: XQuery/IR: Integrating XML Document and Data Retrieval. In: Proc. the Fifth Int’l Workshop on the Web and Databases (WebDB 2002), Madison, Wisconsin, pp. 1–6 (2002)

    Google Scholar 

  6. Bruno, N., Koudas, N., Srivastava, D.: Holistic Twig Joins: Optimal XML Pattern Matching. In: Proc. 2002 ACM SIGMOD Int’l Conf. on Management of Data, Madison, Wisconsin, June 3-6, pp. 310–321 (2002)

    Google Scholar 

  7. Clark, J., DeRose, S.: XML Path Language (XPath), W3C Recommendation (November 1999), http://www.w3.org/TR/xpath

  8. Cooper, B.F., Sample, N., Franklin, M.J., Hjaltason, G.R., Shadmon, M.: A Fast Index for Semistructured Data. In: Proc. the 27th Int’l Conf. on Very Large Data Bases(VLDB), Rome, Italy, September 11-14, pp. 341–350 (2001)

    Google Scholar 

  9. Florescu, D., Kossmann, D., Manolescu, I.: Integrating Keyword Search into XML Query Processing. In: Proc. the 9th WWW Conference/Computer Networks, Amsterdam, NL (May 2000), pp. 119–135 (2000)

    Google Scholar 

  10. Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: Ranked Keyword Search over XML Documents. In: Proc. 2003 ACM SIGMOD Int’l Conf. on Management of Data, San Diego, California, June 9-12, pp. 16–27 (2003)

    Google Scholar 

  11. Goldman, R., Widom, J.: DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. In: Proc. the 23th Int’l Conf. on Very Large Data Bases(VLDB), Athens, Greece, August 26-29, pp. 436–445 (1997)

    Google Scholar 

  12. Halverson, A., Burger, J., Galanis, L., Kini, A., Krishnamurthy, R., Rao, A.N., Tian, F., Viglas, S., Wang, Y., Naughton, J.F., DeWitt, D.J.: Mixed Mode XML Query Processing. In: Proc. the 29th Int’l Conf. on Very Large Data Bases(VLDB), Berlin, Germany, September 9-12, pp. 225–236 (2003)

    Google Scholar 

  13. Ives, Z., Levy, A., Weld, D.: Efficient Evaluation of Regular Path Expressions on Streaming XML Data, Technical Report UW-CSE-2000-05-02, University of Washington (2000)

    Google Scholar 

  14. Jiang, H., Lu, H., Wang, W., Xu Yu, J.: Path Materialization Revisited: An Efficient Storage Model for XML Data. In: Proc. the 13th Australasian Database Conference (ADC), Melbourne, Australia, January 28 - February 1, pp. 85–94 (2002)

    Google Scholar 

  15. Jiang, H., Lu, H., Wang, W., Xu Yu, J.: XParent: An Efficient RDBMS-Based XML Database System. In: Proc. the 18th Int’l Conf. on Data Engineering (ICDE), San Jose, California, February 26 - March 1, pp. 335–336 (2002)

    Google Scholar 

  16. Jiang, H., Wang, W., Lu, H., Yu, J.X.: Holistic Twig Joins on Indexed XML Documents. In: Proc. the 29th Int’l Conf. on Very Large Data Bases(VLDB), Berlin, Germany, September 9-12, pp. 273–284 (2003)

    Google Scholar 

  17. Li, Q., Moon, B.: Indexing and Querying XML Data for Regular Path Expressions. In: Proc. the 27th Int’l Conf. on Very Large Data Bases(VLDB), Rome, Italy, September 11-14, pp. 361–370 (2001)

    Google Scholar 

  18. Mandreoli, F., Martoglia, R., Tiberio, P.: Searching Similar (Sub)Sentences for Example-Based Machine Translation. In: Proc. 2002 Italian Symposium on Sistemi Evoluti per Basi di Dati (SEBD 2002), Isola d’Elba, Italy, (June 2002)

    Google Scholar 

  19. McHugh, J., Widom, J.: Query Optimization for XML. In: Proc. the 25th Int’l Conf. on Very Large Data Bases (VLDB), Edinburgh, Scotland, UK, September 7-10, pp. 315–326 (1999)

    Google Scholar 

  20. Naughton, J., et al.: The Niagara Internet Query System. IEEE Data Engineering Bulletin 24(2), 27–33 (2001)

    Google Scholar 

  21. Polyzotis, N., Garofalakis, M.: Statistical Synopses for Graph-structured XML Databases. In: Proc. 2002 ACM SIGMOD Int’l Conf. on Management of Data, Madison, Wisconsin, June 3-6, pp. 358–369 (2002)

    Google Scholar 

  22. Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)

    MATH  Google Scholar 

  23. Tatarinov, I., Viglas, S., Beyer, K.S., Shanmugasundaram, J., Shekita, E.J., Zhang, C.: Storing and Querying Ordered XML Using a Relational Database System. In: Proc. 2002 ACM SIGMOD Int’l Conf. on Management of Data, Madison, Wisconsin, June 3-6, pp. 204–215 (2002)

    Google Scholar 

  24. Yoshikawa, M., Amagasa, T., Shimura, T., Uemura, S.: XRel: A Path-based Approach to Storage and Retrieval of XML Documents using Relational Databases. ACM Trans. on Internet Technology(TOIT) 1(1), 110–141 (2001)

    Article  Google Scholar 

  25. Park, Y., Whang, K.-Y., Lee, B., Han, W.: Efficient Evaluation of Partial Match Queries for XML Documents Using Information Retrieval Techniques, Technical Report CS-TR-2004-212, Department of Computer Science, KAIST (December 2004), Also, available on AITrc Technical Report No. 04-11-048, http://aitrc.kaist.ac.kr/util_tr.htm (December 28, 2004)

  26. Zhang, C., Naughton, J.F., DeWitt, D.J., Luo, Q., Lohmann, G.M.: On Supporting Containment Queries in Relational Database Management Systems. In: Proc. 2001 ACM SIGMOD Int’l Conf. on Management of Data, Santa Barbara, California, May 21-24, pp. 425–436 (2001)

    Google Scholar 

  27. ReGet Deluxe 3.3 Beta (build 173), http://deluxe.reget.com/en/

  28. Teleport Pro Version 1.29, http://www.tenmax.com/teleport/pro/home.htm

  29. Xyleme, http://www.xyleme.com

  30. eXtensible Markup Language(XML), http://www.w3.org/XML/

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Park, YH., Whang, KY., Lee, B.S., Han, WS. (2005). Efficient Evaluation of Partial Match Queries for XML Documents Using Information Retrieval Techniques. In: Zhou, L., Ooi, B.C., Meng, X. (eds) Database Systems for Advanced Applications. DASFAA 2005. Lecture Notes in Computer Science, vol 3453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11408079_11

Download citation

  • DOI: https://doi.org/10.1007/11408079_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25334-1

  • Online ISBN: 978-3-540-32005-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics