On the Effectiveness of Flexible Querying Heuristics for XML Data

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 4704)

Abstract

The ability to retrieve XML data effectively without knowledge of the schema has recently received considerable attention. Most relevant proposals employ heuristics that use information extracted from the input data to identify groups of meaningfully related nodes. These heuristics prune the search space of all possible node combinations, and their popularity is evident from the large number of such heuristics and of the systems that use them. However, no comprehensive study of their relative merits has been performed thus far. One challenge in performing such a study is that these techniques were proposed in different contexts that are not directly comparable. In this paper, we attempt to fill this gap. In particular, we first abstract the common selection problem tackled by the relatedness heuristics and show how each heuristic addresses it. We then identify the categories of data for which the assumptions made by each heuristic are valid and draw insights on their likely effectiveness. Our findings can help system implementors understand the strengths and weaknesses of each heuristic and provide simple guidelines for the applicability of each one.

Keywords

  • User Query
  • Real World Entity
  • Optional Node
  • Node Combination
  • Relatedness Heuristic

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Editor information

Denilson Barbosa Angela Bonifati Zohra Bellahsène Ela Hunt Rainer Unland

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vagena, Z., Colby, L., Özcan, F., Balmin, A., Li, Q. (2007). On the Effectiveness of Flexible Querying Heuristics for XML Data. In: Barbosa, D., Bonifati, A., Bellahsène, Z., Hunt, E., Unland, R. (eds) Database and XML Technologies. XSym 2007. Lecture Notes in Computer Science, vol 4704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75288-2_7

  • DOI: https://doi.org/10.1007/978-3-540-75288-2_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75287-5

  • Online ISBN: 978-3-540-75288-2

  • eBook Packages: Computer Science, Computer Science (R0)
