Abstract
We develop a graph representation and learning technique for the parse structures of paragraphs of text. We introduce the Parse Thicket (PT), a set of syntactic parse trees augmented by arcs for inter-sentence word-word relations such as co-reference and taxonomic relations. Arcs are also derived from other sources, including Speech Act theory and Rhetorical Structure Theory. We provide a detailed illustration of how PTs are built from parse trees and generalized as phrases by computing maximal common subgraphs. We evaluate the proposed approach in the product search and recommendation domain, where search queries include multiple sentences. We compare the search relevance improvement obtained by pair-wise sentence generalization, phrase-level generalization, and generalization of PTs as graphs.
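To make the representation concrete, the following is a minimal sketch of a parse thicket as a labeled graph. It is not the authors' implementation: the names `Node`, `ParseThicket`, `add_arc`, and `inter_sentence_arcs` are hypothetical, the two example sentences are invented, and the dependency-style arc labels (`dobj`, `nsubj`) are an assumption chosen for brevity rather than the parse format used in the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One word of the paragraph (hypothetical structure, for illustration)."""
    sentence: int  # index of the sentence the word belongs to
    position: int  # word position inside that sentence
    word: str
    pos: str       # part-of-speech tag


@dataclass
class ParseThicket:
    """Parse-tree arcs for every sentence plus arcs that cross sentences."""
    nodes: set = field(default_factory=set)
    arcs: set = field(default_factory=set)  # each arc is (source, label, target)

    def add_arc(self, source, label, target):
        self.nodes.update((source, target))
        self.arcs.add((source, label, target))

    def inter_sentence_arcs(self):
        # Arcs whose endpoints lie in different sentences, e.g. co-reference.
        return {a for a in self.arcs if a[0].sentence != a[2].sentence}


# Sentence 0: "I bought a camera". Sentence 1: "It broke".
bought = Node(0, 1, "bought", "VBD")
camera = Node(0, 3, "camera", "NN")
it = Node(1, 0, "It", "PRP")
broke = Node(1, 1, "broke", "VBD")

pt = ParseThicket()
pt.add_arc(bought, "dobj", camera)     # syntactic arc inside sentence 0
pt.add_arc(broke, "nsubj", it)         # syntactic arc inside sentence 1
pt.add_arc(it, "coreference", camera)  # inter-sentence arc added by the thicket

cross = pt.inter_sentence_arcs()
```

Keeping the arcs as labeled triples means that two thickets can be compared as ordinary labeled graphs, which is the form a maximal common subgraph computation would operate on.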
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Galitsky, B.A., Ilvovsky, D., Kuznetsov, S.O., Strok, F. (2014). Finding Maximal Common Sub-parse Thickets for Multi-sentence Search. In: Croitoru, M., Rudolph, S., Woltran, S., Gonzales, C. (eds) Graph Structures for Knowledge Representation and Reasoning. Lecture Notes in Computer Science, vol 8323. Springer, Cham. https://doi.org/10.1007/978-3-319-04534-4_4
DOI: https://doi.org/10.1007/978-3-319-04534-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04533-7
Online ISBN: 978-3-319-04534-4
eBook Packages: Computer Science, Computer Science (R0)