On the Effectiveness of Flexible Querying Heuristics for XML Data

  • Zografoula Vagena
  • Latha Colby
  • Fatma Özcan
  • Andrey Balmin
  • Quanzhong Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4704)


The ability to perform effective XML data retrieval in the absence of schema knowledge has recently received considerable attention. Most relevant proposals employ heuristics that identify groups of meaningfully related nodes using information extracted from the input data. These heuristics are used to prune the search space of all possible node combinations, and their popularity is evident from the large number of such heuristics and of the systems that use them. However, a comprehensive study detailing the relative merits of these heuristics has not been performed thus far. One challenge in performing such a study is that these techniques have been proposed within different contexts that are not directly comparable. In this paper, we attempt to fill this gap. In particular, we first abstract the common selection problem that the relatedness heuristics tackle and show how each heuristic addresses it. We then identify the data categories in which the assumptions made by each heuristic are valid and draw insights on the likely effectiveness of each. Our findings can help system implementors understand the strengths and weaknesses of each heuristic, and they provide simple guidelines for when each one is applicable.


Keywords: User Query; Real World Entity; Optional Node; Node Combination; Relatedness Heuristic





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Zografoula Vagena (1)
  • Latha Colby (1)
  • Fatma Özcan (1)
  • Andrey Balmin (1)
  • Quanzhong Li (1)

  1. IBM Almaden Research Center, 650 Harry Road, San Jose, CA
