
Edit Distance for XML Information Retrieval: Some Experiments on the Datacentric Track of INEX 2011

  • Cyril Laitang
  • Karen Pinel-Sauvagnat
  • Mohand Boughanem
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7424)

Abstract

In this paper we present our structured information retrieval model based on subgraph similarity. Our approach combines a content propagation technique that handles sibling relationships with a document-query matching process on structure. The latter is based on tree edit distance (TED), the minimum total cost of a sequence of insert, delete, and replace operations that turns one tree into another. As the effectiveness of TED depends both on the input trees and on the edit costs, we experimented with various subtree extraction techniques as well as with different costs based on the DTD associated with the Datacentric collection.
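To make the TED definition above concrete, here is a minimal Python sketch of the standard recursive definition of edit distance on ordered forests, memoized with `functools.lru_cache`. It is not the authors' implementation: the `(label, children)` tree encoding, the function name `tree_edit_distance`, and its `delete`/`insert`/`replace` cost parameters are hypothetical, and the default unit costs are an assumption standing in for the DTD-based costs studied in the paper. It also does not use the optimized keyroot scheme of Zhang-Shasha, so it is only suited to small trees.

```python
from functools import lru_cache
from typing import Callable, Tuple

# Hypothetical encoding: a tree is (label, children), where children is a tuple
# of trees; a forest is a tuple of trees. Tuples keep everything hashable.
Tree = Tuple[str, tuple]
Forest = Tuple[Tree, ...]


def tree_edit_distance(
    t1: Tree,
    t2: Tree,
    delete: Callable[[str], float] = lambda a: 1.0,   # assumed unit delete cost
    insert: Callable[[str], float] = lambda b: 1.0,   # assumed unit insert cost
    replace: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 1.0,
) -> float:
    """Minimum total cost of delete/insert/replace operations turning t1 into t2."""

    @lru_cache(maxsize=None)
    def dist(f: Forest, g: Forest) -> float:
        if not f and not g:
            return 0.0
        if not g:
            # Delete the rightmost root of f; its children take its place.
            a, kids = f[-1]
            return dist(f[:-1] + kids, g) + delete(a)
        if not f:
            # Insert the rightmost root of g.
            b, kids = g[-1]
            return dist(f, g[:-1] + kids) + insert(b)
        (a, kids_a), (b, kids_b) = f[-1], g[-1]
        return min(
            dist(f[:-1] + kids_a, g) + delete(a),   # delete rightmost root of f
            dist(f, g[:-1] + kids_b) + insert(b),   # insert rightmost root of g
            # Map the two rightmost roots onto each other: match their child
            # forests, match the remaining forests, pay the replace cost.
            dist(kids_a, kids_b) + dist(f[:-1], g[:-1]) + replace(a, b),
        )

    return dist((t1,), (t2,))


# Example with two small, made-up XML-like fragments.
t1 = ("movie", (("title", ()), ("director", ())))
t2 = ("movie", (("title", ()), ("actor", ())))
print(tree_edit_distance(t1, t2))  # 1.0 (replace director with actor)
```

Passing label-dependent functions for `delete`, `insert`, and `replace` is one way such a sketch could accommodate non-uniform costs, for example costs derived from a DTD; how the paper derives its costs is not shown here.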

Keywords

Edit Distance, Lower Common Ancestor, Intermediate Score, Tree Edit Distance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Cyril Laitang (1)
  • Karen Pinel-Sauvagnat (1)
  • Mohand Boughanem (1)
  1. IRIT-SIG, Toulouse Cedex 9, France