Skip to main content

Measuring Tree Similarity for Natural Language Processing Based Information Retrieval

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2010)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6177))

Abstract

Natural language processing based information retrieval (NIR) aims to go beyond the conventional bag-of-words based information retrieval (KIR) by considering syntactic and even semantic information in documents. NIR is a conceptually appealing approach to IR, but is hard due to the need to measure distance/similarity between structures. We aim to move beyond the state of the art in measuring structure similarity for NIR.

In this paper, a novel tree similarity measurement dtwAcs is proposed in terms of a novel interpretation of trees as multi dimensional sequences. We calculate the distance between trees by the way of computing the distance between multi dimensional sequences, which is conducted by integrating the all common subsequences into the dynamic time warping method. Experimental result shows that dtwAcs outperforms the state of the art.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Mauldin, M.: Retrieval performance in FERRET: a conceptual information retrieval system. In: SIGIR 1991 (1991)

    Google Scholar 

  2. Strzalkowski, T. (ed.): Natural language Information Retrieval. Kluwer, New York (1999)

    MATH  Google Scholar 

  3. Carballo, J.P., Strzalkowski, T.: Natural language information retrieval: progress report. Information Processing Management 36(1), 155–178 (2000)

    Article  Google Scholar 

  4. Mittendorfer, M., Winiwarter, W.: Exploiting syntactic analysis of queries for information retrieval. Journal of Data and Knowledge Engineering (2002)

    Google Scholar 

  5. Moschitti, A.: Efficient convolution kernels for dependency and constituent syntactic trees. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 318–329. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  6. Zhang, D., Lee, W.S.: Question classification using support vector machines. In: SIGIR 2003: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, pp. 26–32. ACM, New York (2003)

    Chapter  Google Scholar 

  7. Strzalkowski, T.: Natural Language Information Retrieval Project Homepage, http://www.cs.albany.edu/tomek

  8. Strzalkowski, T., Perez-Carballo, J., Karlgren, J., Hulth, A., Tapanainen, P., Lahtinen, T.: Natural language information retrieval: TREC-8 report. In: TREC 1999, pp. 381–390 (1999)

    Google Scholar 

  9. Bille, P.: A survey on tree edit distance and related problems. Theor. Comput. Sci. 337(1-3), 217–239 (2005)

    Article  MATH  MathSciNet  Google Scholar 

  10. Chawathe, S.S.: Comparing hierarchical data in external memory. In: VLDB 1999, Edinburgh, UK, pp. 90–101 (1999)

    Google Scholar 

  11. Dalamagas, T., Cheng, T., Winkel, K.J., Sellis, T.: A methodology for clustering xml documents by structure. Information System 31(3), 187–228 (2006)

    Article  Google Scholar 

  12. Selkow, S.M.: The tree-to-tree editing problem. Information Processing Letters 6(6), 184–186 (1977)

    Article  MATH  MathSciNet  Google Scholar 

  13. Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing 18(6), 1245–1262 (1989)

    Article  MATH  MathSciNet  Google Scholar 

  14. Che, W., Zhang, M., Aw, A., Tan, C., Liu, T., Li, S.: Using a hybrid convolution tree kernel for semantic role labeling. ACM Transactions on Asian Language Information Processing (TALIP) 7(4), 1–23 (2008)

    Article  Google Scholar 

  15. Collins, M., Duffy, N.: Convolution kernels for natural language. In: Advances in Neural Information Processing Systems 14, pp. 625–632. MIT Press, Cambridge (2001)

    Google Scholar 

  16. Kashima, H., Koyanagi, T.: Kernels for semi-structured data. In: ICML 2002: Proceedings of the Nineteenth International Conference on Machine Learning, pp. 291–298. Morgan Kaufmann Publishers Inc., San Francisco (2002)

    Google Scholar 

  17. Moschitti, A.: Making tree kernels practical for natural language learning. In: Proceedings of the Eleventh International Conference on European Association for Computational Linguistics, Trento, Italy (2006)

    Google Scholar 

  18. Moschitti, A., Pighin, D., Basili, R.: Tree kernels for semantic role labeling. Computational Linguistics 34(2), 193–224 (2008)

    Article  MathSciNet  Google Scholar 

  19. Tetsuji, K., Kouichi, H., Hisashi, K., Kiyoko, F.K., Hiroshi, Y.: A spectrum tree kernel. Transactions of the Japanese Society for Artificial Intelligence 22(2), 140–147 (2007)

    Google Scholar 

  20. Vishwanathan, S.V.N., Smola, A.: Fast kernels for string and tree matching. Advances in Neural Information Processing Systems 15 (2003)

    Google Scholar 

  21. Haussler, D.: Convolution kernels on discrete structures. Technical report, Department of Computer Science, University of California at Santa Cruz (1999)

    Google Scholar 

  22. Aiolli, F., Da San Martino, G., Sperduti, A.: Route kernels for trees. In: ICML 2009: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 17–24. ACM, New York (2009)

    Google Scholar 

  23. Elzinga, C., Rahmann, S., Wang, H.: Algorithms for subsequence combinatorics. Theoretical Computer Science 409(3), 394–404 (2008)

    Article  MATH  MathSciNet  Google Scholar 

  24. Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Communications of the ACM 18(6), 341–343 (1975)

    Article  MATH  MathSciNet  Google Scholar 

  25. Leslie, C., Eskin, E., Cohen, A., Weston, J., Noble, W.S.: Mismatch string kernels for svm protein classification. Neural Information Processing Systems 15, 1441–1448 (2003)

    Google Scholar 

  26. Levenshtein, V.: Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady 10, 707 (1966)

    MathSciNet  Google Scholar 

  27. Wang, H.: All common subsequences. In: IJCAI 2007: Proceedings of the 20th international joint conference on Artifical intelligence, Hyderabad, India, pp. 635–640 (2007)

    Google Scholar 

  28. Sakoe, H.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26, 43–49 (1978)

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, Z., Wang, H., McClean, S. (2010). Measuring Tree Similarity for Natural Language Processing Based Information Retrieval. In: Hopfe, C.J., Rezgui, Y., Métais, E., Preece, A., Li, H. (eds) Natural Language Processing and Information Systems. NLDB 2010. Lecture Notes in Computer Science, vol 6177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13881-2_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-13881-2_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13880-5

  • Online ISBN: 978-3-642-13881-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics