Finding Maximal Common Sub-parse Thickets for Multi-sentence Search

  • Boris A. Galitsky
  • Dmitry Ilvovsky
  • Sergei O. Kuznetsov
  • Fedor Strok
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8323)


We develop a graph representation and a learning technique for the parse structures of paragraphs of text. We introduce the Parse Thicket (PT): a set of syntactic parse trees augmented with arcs for inter-sentence word-to-word relations such as coreference and taxonomic relations. These arcs are also derived from other sources, including Speech Act Theory and Rhetorical Structure Theory. We illustrate in detail how PTs are built from parse trees and how they are generalized, both as phrases and, by computing maximal common subgraphs, as graphs. We evaluate the proposed approach in the product search and recommendation domain, where search queries consist of multiple sentences. We compare the search relevance improvement achieved by pairwise sentence generalization, phrase-level generalization, and generalization of PTs as graphs.
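To make phrase-level generalization concrete, here is a minimal Python sketch under simplifying assumptions; it is not the authors' implementation. Each phrase is assumed to arrive from an external parser as a phrase type (such as "NP" or "VP") plus a list of (lemma, POS) pairs. Two words generalize to their shared lemma when the lemmas match, to the wildcard "*" with the shared POS tag when only the tags match, and to nothing otherwise; two phrases of the same type are aligned with a longest-common-subsequence style dynamic program. The names `generalize_words`, `generalize_phrases`, and `generalize_phrase_sets`, and the alignment weights (2 for an exact lemma match, 1 for a POS-only match), are hypothetical.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, str]            # (lemma, POS tag), assumed to come from a parser
Phrase = Tuple[str, List[Word]]   # (phrase type such as "NP" or "VP", words)

EXACT, POS_ONLY = 2, 1            # hypothetical alignment weights


def generalize_words(a: Word, b: Word) -> Optional[Word]:
    """Keep the lemma if both words share it, else use the wildcard '*';
    words with different POS tags have no generalization (None)."""
    (lemma_a, pos_a), (lemma_b, pos_b) = a, b
    if pos_a != pos_b:
        return None
    return (lemma_a if lemma_a == lemma_b else "*", pos_a)


def _weight(g: Optional[Word]) -> int:
    """Alignment score of a word generalization (0 means 'cannot align')."""
    if g is None:
        return 0
    return EXACT if g[0] != "*" else POS_ONLY


def generalize_phrases(p: List[Word], q: List[Word]) -> List[Word]:
    """Align two phrases with an LCS-style dynamic program and return the
    generalized words along one best-scoring alignment."""
    n, m = len(p), len(q)
    # score[i][j] = best total weight for aligning p[i:] with q[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            w = _weight(generalize_words(p[i], q[j]))
            diag = w + score[i + 1][j + 1] if w else 0
            score[i][j] = max(score[i + 1][j], score[i][j + 1], diag)
    # Trace back from (0, 0) to recover one optimal alignment.
    result: List[Word] = []
    i = j = 0
    while i < n and j < m:
        g = generalize_words(p[i], q[j])
        w = _weight(g)
        if w and score[i][j] == w + score[i + 1][j + 1]:
            result.append(g)
            i, j = i + 1, j + 1
        elif score[i + 1][j] >= score[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def generalize_phrase_sets(ps: List[Phrase], qs: List[Phrase]) -> List[Phrase]:
    """Generalize every pair of same-type phrases from two texts,
    keeping the distinct non-empty results."""
    out: List[Phrase] = []
    for type_p, words_p in ps:
        for type_q, words_q in qs:
            if type_p == type_q:
                g = generalize_phrases(words_p, words_q)
                if g and (type_p, g) not in out:
                    out.append((type_p, g))
    return out


if __name__ == "__main__":
    p = [("buy", "VB"), ("digital", "JJ"), ("camera", "NN")]
    q = [("purchase", "VB"), ("new", "JJ"), ("camera", "NN")]
    print(generalize_phrases(p, q))  # [('*', 'VB'), ('*', 'JJ'), ('camera', 'NN')]
```

In this sketch, "buy digital camera" and "purchase new camera" generalize to a verb, an adjective, and the shared noun "camera". Restricting alignment to phrases of the same type keeps noun phrases from being matched against verb phrases.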


Keywords (machine-generated, not supplied by the authors): Maximal Clique, Parse Tree, Suffix Tree, Maximal Clique Problem, Common Subgraph
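Computing a maximal common subgraph, as the abstract describes, can be reduced to a clique problem. The sketch below uses one standard reduction, assumed here for illustration rather than taken from the paper: build the modular product of two node-labeled graphs and search it for a maximum clique with the Bron–Kerbosch algorithm with pivoting; each clique corresponds to a common induced subgraph. Edge labels and arc directions, which real parse thickets carry, are ignored for brevity, and the names `LabeledGraph`, `modular_product`, and `max_common_induced_subgraph` are hypothetical. Because the maximum common subgraph problem is NP-hard, this exhaustive search is only practical for small graphs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

Node = Hashable
Pair = Tuple[Node, Node]


class LabeledGraph:
    """Undirected graph with labeled nodes (edge labels omitted for brevity)."""

    def __init__(self, labels: Dict[Node, str], edges: List[Tuple[Node, Node]]):
        self.labels = labels
        self.adj: Dict[Node, Set[Node]] = {v: set() for v in labels}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def modular_product(g1: LabeledGraph, g2: LabeledGraph) -> Dict[Pair, Set[Pair]]:
    """Vertices are node pairs with equal labels; two pairs are adjacent when
    they map distinct nodes to distinct nodes and agree on adjacency
    (both adjacent or both non-adjacent)."""
    verts = [(a, b) for a in g1.labels for b in g2.labels
             if g1.labels[a] == g2.labels[b]]
    adj: Dict[Pair, Set[Pair]] = {v: set() for v in verts}
    for (a1, b1), (a2, b2) in combinations(verts, 2):
        if a1 != a2 and b1 != b2 and (a2 in g1.adj[a1]) == (b2 in g2.adj[b1]):
            adj[(a1, b1)].add((a2, b2))
            adj[(a2, b2)].add((a1, b1))
    return adj


def bron_kerbosch(adj: Dict[Pair, Set[Pair]], r: FrozenSet[Pair], p: Set[Pair],
                  x: Set[Pair], cliques: List[FrozenSet[Pair]]) -> None:
    """Bron-Kerbosch with pivoting: appends every maximal clique to `cliques`."""
    if not p and not x:
        cliques.append(r)
        return
    # Choose the pivot adjacent to the most candidates, to prune branches.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v], cliques)
        p = p - {v}
        x = x | {v}


def max_common_induced_subgraph(g1: LabeledGraph, g2: LabeledGraph) -> Dict[Node, Node]:
    """Return a node mapping g1 -> g2 for one maximum common induced subgraph."""
    adj = modular_product(g1, g2)
    cliques: List[FrozenSet[Pair]] = []
    bron_kerbosch(adj, frozenset(), set(adj), set(), cliques)
    return dict(max(cliques, key=len))


if __name__ == "__main__":
    g1 = LabeledGraph({"a1": "VB", "a2": "NN", "a3": "JJ"}, [("a1", "a2"), ("a2", "a3")])
    g2 = LabeledGraph({"b1": "VB", "b2": "NN", "b3": "NN"}, [("b1", "b2"), ("b1", "b3")])
    print(max_common_induced_subgraph(g1, g2))  # maps a1 -> b1 and a2 -> b2 or b3
```

Here node labels stand in for POS tags; a richer version would also require matching lemmas and arc types (parse-tree arcs versus coreference or rhetorical arcs) before pairing nodes or edges.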





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Boris A. Galitsky (1)
  • Dmitry Ilvovsky (2)
  • Sergei O. Kuznetsov (2)
  • Fedor Strok (2)
  1. eBay Inc., San Jose, USA
  2. Higher School of Economics, Moscow, Russia
