Measuring Tree Similarity for Natural Language Processing Based Information Retrieval

Lin, Zhiwei; Wang, Hui; McClean, Sally

doi:10.1007/978-3-642-13881-2_2

Zhiwei Lin²⁰,
Hui Wang²⁰ &
Sally McClean²⁰

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6177))

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

1435 Accesses
6 Citations

Abstract

Natural language processing based information retrieval (NIR) aims to go beyond the conventional bag-of-words based information retrieval (KIR) by considering syntactic and even semantic information in documents. NIR is a conceptually appealing approach to IR, but is hard due to the need to measure distance/similarity between structures. We aim to move beyond the state of the art in measuring structure similarity for NIR.

In this paper, a novel tree similarity measurement dtwAcs is proposed in terms of a novel interpretation of trees as multi dimensional sequences. We calculate the distance between trees by the way of computing the distance between multi dimensional sequences, which is conducted by integrating the all common subsequences into the dynamic time warping method. Experimental result shows that dtwAcs outperforms the state of the art.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Mauldin, M.: Retrieval performance in FERRET: a conceptual information retrieval system. In: SIGIR 1991 (1991)
Google Scholar
Strzalkowski, T. (ed.): Natural language Information Retrieval. Kluwer, New York (1999)
MATH Google Scholar
Carballo, J.P., Strzalkowski, T.: Natural language information retrieval: progress report. Information Processing Management 36(1), 155–178 (2000)
Article Google Scholar
Mittendorfer, M., Winiwarter, W.: Exploiting syntactic analysis of queries for information retrieval. Journal of Data and Knowledge Engineering (2002)
Google Scholar
Moschitti, A.: Efficient convolution kernels for dependency and constituent syntactic trees. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 318–329. Springer, Heidelberg (2006)
Chapter Google Scholar
Zhang, D., Lee, W.S.: Question classification using support vector machines. In: SIGIR 2003: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, pp. 26–32. ACM, New York (2003)
Chapter Google Scholar
Strzalkowski, T.: Natural Language Information Retrieval Project Homepage, http://www.cs.albany.edu/tomek
Strzalkowski, T., Perez-Carballo, J., Karlgren, J., Hulth, A., Tapanainen, P., Lahtinen, T.: Natural language information retrieval: TREC-8 report. In: TREC 1999, pp. 381–390 (1999)
Google Scholar
Bille, P.: A survey on tree edit distance and related problems. Theor. Comput. Sci. 337(1-3), 217–239 (2005)
Article MATH MathSciNet Google Scholar
Chawathe, S.S.: Comparing hierarchical data in external memory. In: VLDB 1999, Edinburgh, UK, pp. 90–101 (1999)
Google Scholar
Dalamagas, T., Cheng, T., Winkel, K.J., Sellis, T.: A methodology for clustering xml documents by structure. Information System 31(3), 187–228 (2006)
Article Google Scholar
Selkow, S.M.: The tree-to-tree editing problem. Information Processing Letters 6(6), 184–186 (1977)
Article MATH MathSciNet Google Scholar
Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing 18(6), 1245–1262 (1989)
Article MATH MathSciNet Google Scholar
Che, W., Zhang, M., Aw, A., Tan, C., Liu, T., Li, S.: Using a hybrid convolution tree kernel for semantic role labeling. ACM Transactions on Asian Language Information Processing (TALIP) 7(4), 1–23 (2008)
Article Google Scholar
Collins, M., Duffy, N.: Convolution kernels for natural language. In: Advances in Neural Information Processing Systems 14, pp. 625–632. MIT Press, Cambridge (2001)
Google Scholar
Kashima, H., Koyanagi, T.: Kernels for semi-structured data. In: ICML 2002: Proceedings of the Nineteenth International Conference on Machine Learning, pp. 291–298. Morgan Kaufmann Publishers Inc., San Francisco (2002)
Google Scholar
Moschitti, A.: Making tree kernels practical for natural language learning. In: Proceedings of the Eleventh International Conference on European Association for Computational Linguistics, Trento, Italy (2006)
Google Scholar
Moschitti, A., Pighin, D., Basili, R.: Tree kernels for semantic role labeling. Computational Linguistics 34(2), 193–224 (2008)
Article MathSciNet Google Scholar
Tetsuji, K., Kouichi, H., Hisashi, K., Kiyoko, F.K., Hiroshi, Y.: A spectrum tree kernel. Transactions of the Japanese Society for Artificial Intelligence 22(2), 140–147 (2007)
Google Scholar
Vishwanathan, S.V.N., Smola, A.: Fast kernels for string and tree matching. Advances in Neural Information Processing Systems 15 (2003)
Google Scholar
Haussler, D.: Convolution kernels on discrete structures. Technical report, Department of Computer Science, University of California at Santa Cruz (1999)
Google Scholar
Aiolli, F., Da San Martino, G., Sperduti, A.: Route kernels for trees. In: ICML 2009: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 17–24. ACM, New York (2009)
Google Scholar
Elzinga, C., Rahmann, S., Wang, H.: Algorithms for subsequence combinatorics. Theoretical Computer Science 409(3), 394–404 (2008)
Article MATH MathSciNet Google Scholar
Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Communications of the ACM 18(6), 341–343 (1975)
Article MATH MathSciNet Google Scholar
Leslie, C., Eskin, E., Cohen, A., Weston, J., Noble, W.S.: Mismatch string kernels for svm protein classification. Neural Information Processing Systems 15, 1441–1448 (2003)
Google Scholar
Levenshtein, V.: Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady 10, 707 (1966)
MathSciNet Google Scholar
Wang, H.: All common subsequences. In: IJCAI 2007: Proceedings of the 20th international joint conference on Artifical intelligence, Hyderabad, India, pp. 635–640 (2007)
Google Scholar
Sakoe, H.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26, 43–49 (1978)
Article MATH Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Computing and Engineering, University of Ulster, Northern Ireland, United Kingdom
Zhiwei Lin, Hui Wang & Sally McClean

Authors

Zhiwei Lin
View author publications
You can also search for this author in PubMed Google Scholar
Hui Wang
View author publications
You can also search for this author in PubMed Google Scholar
Sally McClean
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Engineering, Cardiff University, UK
Christina J. Hopfe & Haijiang Li &
Informatics Research Institute, University of Salford, M5 4WT, Greater Manchester, UK
Yacine Rezgui
Centre National des Arts et Métiers,
Elisabeth Métais
School of Computer Science, Cardiff University, UK
Alun Preece

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lin, Z., Wang, H., McClean, S. (2010). Measuring Tree Similarity for Natural Language Processing Based Information Retrieval. In: Hopfe, C.J., Rezgui, Y., Métais, E., Preece, A., Li, H. (eds) Natural Language Processing and Information Systems. NLDB 2010. Lecture Notes in Computer Science, vol 6177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13881-2_2

Download citation

DOI: https://doi.org/10.1007/978-3-642-13881-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13880-5
Online ISBN: 978-3-642-13881-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics