Tree Similarity Measurement for Classifying Questions by Syntactic Structures

  • Zhiwei Lin
  • Hui Wang
  • Sally McClean
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9773)


Question classification plays a key role in question answering systems, because the classification result helps to locate correct answers effectively. This paper addresses the problem of classifying questions by their syntactic structure. To this end, questions are converted into parse trees, and each parse tree is represented as a multi-dimensional sequence (MDS). Under this transformation from questions to MDSs, a new similarity measurement for comparing questions in their MDS representation is presented. The new measurement, based on all common subsequences, is proved to be a kernel and can be computed in quadratic time. Experiments with kNN and SVM classifiers show that the proposed method is competitive in terms of classification accuracy and efficiency.
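To make the idea of a subsequence-based kernel with quadratic cost concrete, the following is a minimal sketch for plain one-dimensional label sequences. It is not the MDS measurement proposed in the paper: it is the standard all-subsequences kernel, which counts pairs of matching subsequence occurrences (the empty subsequence included) by dynamic programming. The function names and the part-of-speech tag sequences are assumptions made only for illustration.

```python
from math import sqrt
from typing import Hashable, Sequence


def all_subsequences_kernel(s: Sequence[Hashable], t: Sequence[Hashable]) -> int:
    """Count pairs of matching subsequence occurrences in s and t.

    The empty subsequence is included, so the result is always >= 1.
    Runs in O(len(s) * len(t)) time.
    """
    n, m = len(s), len(t)
    # k[i][j] is the kernel value for the prefixes s[:i] and t[:j].
    # Against an empty prefix only the empty subsequence matches.
    k = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Inclusion-exclusion over pairs that skip s[i-1] or skip t[j-1].
            k[i][j] = k[i - 1][j] + k[i][j - 1] - k[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                # Pairs whose last matched elements are s[i-1] and t[j-1].
                k[i][j] += k[i - 1][j - 1]
    return k[n][m]


def normalised_similarity(s: Sequence[Hashable], t: Sequence[Hashable]) -> float:
    """Cosine-style normalisation of the kernel into the range (0, 1]."""
    k_st = all_subsequences_kernel(s, t)
    k_ss = all_subsequences_kernel(s, s)
    k_tt = all_subsequences_kernel(t, t)
    return k_st / sqrt(k_ss * k_tt)


# Hypothetical tag sequences standing in for two parsed questions.
q1 = ["WRB", "VBZ", "DT", "NN"]
q2 = ["WP", "VBZ", "DT", "NN"]
print(all_subsequences_kernel(q1, q2), normalised_similarity(q1, q2))
```

Because the value is an inner product over subsequence counts, a matrix of such values could in principle be passed to a kernel classifier or used as the similarity in a nearest-neighbour rule; how the paper extends this to multi-dimensional sequences is not shown here.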


Question classification, Tree kernel, Tree similarity, Tree edit distance



The authors would like to thank the anonymous reviewers for their helpful comments on this paper, which pointed out relevant literature and a number of annoying flaws in the submission. This paper is partially sponsored by the EU DESIREE project (



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Faculty of Computing and Engineering, Ulster University, Coleraine, UK
