Abstract
The ability to perform effective XML data retrieval in the absence of schema knowledge has recently received considerable attention. Most relevant proposals employ heuristics that identify groups of meaningfully related nodes using information extracted from the input data. These heuristics are used to prune the search space of all possible node combinations, and their popularity is evident from the large number of such heuristics and of the systems that use them. However, a comprehensive study detailing the relative merits of these heuristics has not been performed thus far. One of the challenges in performing such a study is that these techniques have been proposed within different and not directly comparable contexts. In this paper, we attempt to fill this gap. In particular, we first abstract the common selection problem that the relatedness heuristics tackle and show how each heuristic addresses it. We then identify data categories where the assumptions made by each heuristic are valid and draw insights on their likely effectiveness. Our findings can help system implementors understand the strengths and weaknesses of each heuristic and provide simple guidelines for the applicability of each one.
Keywords
- User Query
- Real World Entity
- Optional Node
- Node Combination
- Relatedness Heuristic
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vagena, Z., Colby, L., Özcan, F., Balmin, A., Li, Q. (2007). On the Effectiveness of Flexible Querying Heuristics for XML Data. In: Barbosa, D., Bonifati, A., Bellahsène, Z., Hunt, E., Unland, R. (eds) Database and XML Technologies. XSym 2007. Lecture Notes in Computer Science, vol 4704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75288-2_7
DOI: https://doi.org/10.1007/978-3-540-75288-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75287-5
Online ISBN: 978-3-540-75288-2
eBook Packages: Computer Science, Computer Science (R0)
