Highly Heterogeneous XML Collections: How to Retrieve Precise Results?

  • Ismael Sanz
  • Marco Mesiti
  • Giovanna Guerrini
  • Rafael Berlanga Llavori
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4027)


Highly heterogeneous XML collections are thematic collections whose documents use different structures: parent-child or ancestor-descendant relationships are not preserved, and vocabulary discrepancies in the element names can occur. In this setting, current approaches return answers with low precision. We present an approach, based on similarity measures and semantic inverted indices, that improves the precision of query answers without compromising performance.


Inverted Index Pattern Index Region Construction Tree Pattern Query Pattern Constraint 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ismael Sanz (1)
  • Marco Mesiti (2)
  • Giovanna Guerrini (3)
  • Rafael Berlanga Llavori (1)

  1. Universitat Jaume I, Castellón, Spain
  2. Università di Milano, Italy
  3. Università di Genova, Italy
