Edit Distance for XML Information Retrieval: Some Experiments on the Datacentric Track of INEX 2011

  • Conference paper
Focused Retrieval of Content and Structure (INEX 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7424)

Abstract

In this paper we present our structured information retrieval model based on subgraph similarity. Our approach combines a content propagation technique, which handles sibling relationships, with a document-query matching process on structure. The latter is based on tree edit distance (TED), which is defined by the minimum-cost sequence of insert, delete, and replace operations that turns one tree into another. As the effectiveness of TED depends on both the input trees and the edit costs, we experimented with various subtree extraction techniques as well as different costs based on the DTD associated with the Datacentric collection.
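To make the definition of TED concrete, the following is a minimal sketch of the textbook recursive formulation for ordered, labeled trees. It is not the algorithm or the cost model used in the paper: the tree encoding as `(label, children)` tuples, the function names `size` and `tree_edit_distance`, and the uniform default costs are all assumptions made for illustration, whereas the paper derives its costs from the DTD.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children is a tuple of trees.
# A forest is a tuple of trees.

def size(tree):
    """Number of nodes in a tree."""
    _, children = tree
    return 1 + sum(size(child) for child in children)

def tree_edit_distance(t1, t2, ins=1, dele=1, rep=1):
    """Minimum total cost of insert/delete/replace operations turning t1 into t2.

    Costs are uniform per operation type here (an illustrative assumption).
    """
    @lru_cache(maxsize=None)
    def forest_dist(f1, f2):
        if not f1 and not f2:
            return 0
        if not f1:
            # Only insertions remain: insert every node of f2.
            return ins * sum(size(t) for t in f2)
        if not f2:
            # Only deletions remain: delete every node of f1.
            return dele * sum(size(t) for t in f1)
        (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
        return min(
            # Delete the root of the rightmost tree of f1; its children move up.
            forest_dist(f1[:-1] + kids1, f2) + dele,
            # Insert the root of the rightmost tree of f2.
            forest_dist(f1, f2[:-1] + kids2) + ins,
            # Map the two rightmost roots onto each other (replace if labels differ).
            forest_dist(kids1, kids2)
            + forest_dist(f1[:-1], f2[:-1])
            + (0 if label1 == label2 else rep),
        )

    return forest_dist((t1,), (t2,))

# Hypothetical document subtree and structural query.
doc = ("movie", (("title", ()), ("cast", (("actor", ()),))))
query = ("movie", (("actor", ()),))
print(tree_edit_distance(doc, query))
```

In this example the query is reached by deleting `title` and `cast`, so a lower distance indicates a closer structural match. Changing `ins`, `dele`, and `rep` shows how the choice of edit costs alters which operations are preferred.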

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Laitang, C., Pinel-Sauvagnat, K., Boughanem, M. (2012). Edit Distance for XML Information Retrieval: Some Experiments on the Datacentric Track of INEX 2011. In: Geva, S., Kamps, J., Schenkel, R. (eds) Focused Retrieval of Content and Structure. INEX 2011. Lecture Notes in Computer Science, vol 7424. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35734-3_11

  • DOI: https://doi.org/10.1007/978-3-642-35734-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35733-6

  • Online ISBN: 978-3-642-35734-3

  • eBook Packages: Computer Science, Computer Science (R0)
