Finding Maximal Common Sub-parse Thickets for Multi-sentence Search

Galitsky, Boris A.; Ilvovsky, Dmitry; Kuznetsov, Sergei O.; Strok, Fedor

doi:10.1007/978-3-319-04534-4_4

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8323)

Abstract

We develop a graph representation and learning technique for the parse structures of paragraphs of text. We introduce the Parse Thicket (PT): a set of syntactic parse trees augmented with arcs for inter-sentence word-word relations such as co-reference and taxonomic relations. These arcs are also derived from other sources, including Speech Act theory and Rhetorical Structure Theory. We provide a detailed illustration of how PTs are built from parse trees and generalized as phrases by computing maximal common subgraphs. The proposed approach is evaluated in the product search and recommendation domain, where search queries include multiple sentences. We compare the search relevance improvement achieved by pair-wise sentence generalization, phrase-level generalization, and generalization of PTs as graphs.
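
The phrase-level generalization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes phrases arrive as flat lists of (part-of-speech, lemma) pairs, and it uses a longest common subsequence over part-of-speech tags as a simplified stand-in for computing a maximal common sub-phrase. The names `Token`, `generalize_phrases`, and `render` are hypothetical and introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    pos: str               # part-of-speech tag, e.g. "NN"
    lemma: Optional[str]   # None acts as a wildcard ("*")


def generalize_phrases(a: List[Token], b: List[Token]) -> List[Token]:
    """Assumed simplification: align two phrases by a longest common
    subsequence of POS tags; keep a lemma only where both phrases agree."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the longest common POS subsequence of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i].pos == b[j].pos:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Trace back through the table to recover one optimal alignment.
    result: List[Token] = []
    i = j = 0
    while i < n and j < m:
        if a[i].pos == b[j].pos:
            lemma = a[i].lemma if a[i].lemma == b[j].lemma else None
            result.append(Token(a[i].pos, lemma))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def render(phrase: List[Token]) -> str:
    """Print a phrase as POS-lemma pairs, with '*' for generalized lemmas."""
    return " ".join(f"{t.pos}-{t.lemma or '*'}" for t in phrase)


# Two toy product-search phrases (hand-tagged, purely illustrative).
p1 = [Token("JJ", "digital"), Token("NN", "camera"), Token("IN", "with"),
      Token("JJ", "great"), Token("NN", "zoom")]
p2 = [Token("NN", "camera"), Token("IN", "with"),
      Token("JJ", "good"), Token("NN", "lens")]

print(render(generalize_phrases(p1, p2)))  # NN-camera IN-with JJ-* NN-*
```

In this toy case the shared words "camera" and "with" survive, while the differing adjective and noun are reduced to their part-of-speech tags. Generalizing full PTs as graphs would additionally have to respect the inter-sentence arcs, which this flat-list sketch ignores.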

Author information

Authors and Affiliations

eBay Inc, San Jose, CA, USA
Boris A. Galitsky
Higher School of Economics, Moscow, Russia
Dmitry Ilvovsky, Sergei O. Kuznetsov & Fedor Strok

Editor information

Editors and Affiliations

LIRMM, 161, rue ADA, 34392, Montpellier Cedex 5, France
Madalina Croitoru
Fakultät Informatik, Technische Universität Dresden, Nöthnitzer Str. 46, 01187, Dresden, Germany
Sebastian Rudolph
Institute of Information Systems, Vienna University of Technology, Austria
Stefan Woltran
Laboratoire d’Informatique de Paris 6 (LIP6/UPMC), Université Pierre et Marie Curie, 4, place Jussieu, 75005, Paris, France
Christophe Gonzales

About this paper

Cite this paper

Galitsky, B.A., Ilvovsky, D., Kuznetsov, S.O., Strok, F. (2014). Finding Maximal Common Sub-parse Thickets for Multi-sentence Search. In: Croitoru, M., Rudolph, S., Woltran, S., Gonzales, C. (eds) Graph Structures for Knowledge Representation and Reasoning. Lecture Notes in Computer Science, vol 8323. Springer, Cham. https://doi.org/10.1007/978-3-319-04534-4_4

DOI: https://doi.org/10.1007/978-3-319-04534-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04533-7
Online ISBN: 978-3-319-04534-4
eBook Packages: Computer Science, Computer Science (R0)
