Abstract
Question classification plays a key role in question answering systems, because the predicted question class helps locate correct answers effectively. This paper addresses question classification based on syntactic structure. Questions are converted into parse trees, and each parse tree is represented as a multi-dimensional sequence (MDS). Under this transformation, a new similarity measure for comparing questions in their MDS representation is presented. The new measure, based on all common subsequences, is proved to be a kernel and can be computed in quadratic time. Experiments with kNN and SVM classifiers show that the proposed method is competitive in terms of classification accuracy and efficiency.
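To make the idea of a quadratic-time, subsequence-based kernel concrete, here is a minimal sketch for ordinary one-dimensional sequences. It is not the paper's MDS measure: it is the standard all-subsequences kernel, which counts pairs of matching subsequence occurrences (including the empty subsequence) with a dynamic-programming table. The function names `all_subsequences_kernel` and `normalized_similarity`, and the flat tag sequences in the usage lines, are illustrative assumptions.

```python
from math import sqrt
from typing import Hashable, Sequence


def all_subsequences_kernel(s: Sequence[Hashable], t: Sequence[Hashable]) -> int:
    """Count pairs of matching subsequence occurrences in s and t.

    The empty subsequence is counted once, so the result is always >= 1.
    Time and space are O(len(s) * len(t)).
    """
    n, m = len(s), len(t)
    # table[i][j] holds the count for the prefixes s[:i] and t[:j].
    # An empty prefix shares only the empty subsequence, hence the 1s.
    table = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Pairs that skip s[i-1] or skip t[j-1] (inclusion-exclusion).
            count = table[i - 1][j] + table[i][j - 1] - table[i - 1][j - 1]
            # Pairs that end by matching s[i-1] with t[j-1].
            if s[i - 1] == t[j - 1]:
                count += table[i - 1][j - 1]
            table[i][j] = count
    return table[n][m]


def normalized_similarity(s: Sequence[Hashable], t: Sequence[Hashable]) -> float:
    """Cosine-style normalization of the kernel, giving a value in (0, 1]."""
    k_st = all_subsequences_kernel(s, t)
    k_ss = all_subsequences_kernel(s, s)
    k_tt = all_subsequences_kernel(t, t)
    return k_st / sqrt(k_ss * k_tt)


# Hypothetical flat tag sequences, used only to exercise the functions.
q1 = ["SBARQ", "WHNP", "SQ"]
q2 = ["SBARQ", "WHADVP", "SQ"]
similarity = normalized_similarity(q1, q2)
```

A normalized value like `similarity` could be plugged into a kNN classifier as a similarity score, or the unnormalized kernel values could fill a precomputed Gram matrix for an SVM. Both uses are suggestions here, not a description of the authors' experimental setup.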
Notes
- 1. In the trees, SBARQ, WHNP, etc. are tags defined in Penn Treebank II.
- 2. Available at http://l2r.cs.uiuc.edu/~cogcomp/Data/QA/QC/.
Acknowledgments
The authors would like to thank the anonymous reviewers for their helpful comments on this paper, which pointed out relevant literature and a number of flaws in the submission. This paper is partially sponsored by the EU DESIREE project (http://www.desiree-project.eu/).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Lin, Z., Wang, H., McClean, S. (2016). Tree Similarity Measurement for Classifying Questions by Syntactic Structures. In: Huang, DS., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science, vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_36
DOI: https://doi.org/10.1007/978-3-319-42297-8_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42296-1
Online ISBN: 978-3-319-42297-8
eBook Packages: Computer Science, Computer Science (R0)