Finding Maximal Common Sub-parse Thickets for Multi-sentence Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8323)

Abstract

We develop a graph representation and a learning technique for the parse structures of paragraphs of text. We introduce the Parse Thicket (PT): a set of syntactic parse trees augmented by arcs for inter-sentence word-to-word relations such as co-reference and taxonomic relations. These arcs are also derived from other sources, including Speech Act Theory and Rhetorical Structure Theory. We provide a detailed illustration of how PTs are built from parse trees and generalized as phrases by computing maximal common subgraphs. The proposed approach is evaluated in the product search and recommendation domain, where search queries consist of multiple sentences. We compare the search relevance improvement achieved by pair-wise sentence generalization, phrase-level generalization, and generalization of PTs as graphs.
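The abstract's central operation, generalizing two PTs by computing a maximal common subgraph, can be illustrated with a small Python sketch. It uses one textbook technique that may differ from the authors' implementation. Nodes with equal part-of-speech tags are paired in a modular-product (compatibility) graph, and the maximal cliques of that graph are enumerated with Bron-Kerbosch. Each clique corresponds to a common induced subgraph. Everything here is a hypothetical assumption for illustration: the `Thicket` class, the relation labels (`dobj`, `nsubj`, `coref`), matching nodes by POS only, replacing differing lemmas with `*`, and the hand-written toy parses.

```python
from itertools import combinations


class Thicket:
    """Hypothetical, simplified parse-thicket graph: word nodes labeled (lemma, POS)
    and undirected arcs labeled with a relation name (syntactic or inter-sentence)."""

    def __init__(self):
        self.nodes = {}  # node id -> (lemma, POS)
        self.edges = {}  # frozenset({a, b}) -> relation label

    def add_node(self, nid, lemma, pos):
        self.nodes[nid] = (lemma, pos)

    def add_edge(self, a, b, label):
        self.edges[frozenset((a, b))] = label

    def edge(self, a, b):
        return self.edges.get(frozenset((a, b)))  # None when there is no arc


def compatibility_graph(g1, g2):
    """Modular-product graph: a vertex pairs two nodes with the same POS; two vertices
    are adjacent when both graphs agree on the arc between them (same label or no arc)."""
    verts = [(u, v)
             for u, (_, p1) in g1.nodes.items()
             for v, (_, p2) in g2.nodes.items() if p1 == p2]
    adj = {x: set() for x in verts}
    for (u1, v1), (u2, v2) in combinations(verts, 2):
        if u1 != u2 and v1 != v2 and g1.edge(u1, u2) == g2.edge(v1, v2):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def bron_kerbosch(adj, r, p, x, out):
    """Collect all maximal cliques; the pivot is the highest-degree vertex in P | X."""
    if not p and not x:
        out.append(r)
        return
    pivot = max(p | x, key=lambda n: len(adj[n]))
    for n in list(p - adj[pivot]):
        bron_kerbosch(adj, r | {n}, p & adj[n], x & adj[n], out)
        p = p - {n}
        x = x | {n}


def generalize(g1, g2):
    """Largest common induced subgraph, returned as generalized nodes: the lemma is kept
    when both sides agree and replaced by '*' otherwise (POS is always kept)."""
    adj = compatibility_graph(g1, g2)
    cliques = []
    bron_kerbosch(adj, set(), set(adj), set(), cliques)
    best = max(cliques, key=len)
    return [(g1.nodes[u][0] if g1.nodes[u][0] == g2.nodes[v][0] else "*", g1.nodes[u][1])
            for u, v in sorted(best)]


def build(words, arcs):
    """Helper that assembles a Thicket from (id, lemma, POS) and (a, b, label) tuples."""
    g = Thicket()
    for nid, lemma, pos in words:
        g.add_node(nid, lemma, pos)
    for a, b, label in arcs:
        g.add_edge(a, b, label)
    return g


# "I bought a camera. It takes sharp photos."  (toy parse, some words omitted)
q1 = build([(1, "buy", "VB"), (2, "camera", "NN"), (3, "it", "PRP"),
            (4, "take", "VB"), (5, "photo", "NN")],
           [(1, 2, "dobj"), (4, 3, "nsubj"), (4, 5, "dobj"), (2, 3, "coref")])
# "I purchased a camera. It shoots sharp pictures."
q2 = build([(1, "purchase", "VB"), (2, "camera", "NN"), (3, "it", "PRP"),
            (4, "shoot", "VB"), (5, "picture", "NN")],
           [(1, 2, "dobj"), (4, 3, "nsubj"), (4, 5, "dobj"), (2, 3, "coref")])

print(generalize(q1, q2))
# [('*', 'VB'), ('camera', 'NN'), ('it', 'PRP'), ('*', 'VB'), ('*', 'NN')]
```

The inter-sentence `coref` arc (camera to it) is what lets the two-sentence queries be matched as a single structure rather than as two independent sentences. Because the maximum common subgraph problem is NP-hard in general, exhaustive clique enumeration like this is only practical for small graphs.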

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Galitsky, B.A., Ilvovsky, D., Kuznetsov, S.O., Strok, F. (2014). Finding Maximal Common Sub-parse Thickets for Multi-sentence Search. In: Croitoru, M., Rudolph, S., Woltran, S., Gonzales, C. (eds) Graph Structures for Knowledge Representation and Reasoning. Lecture Notes in Computer Science, vol 8323. Springer, Cham. https://doi.org/10.1007/978-3-319-04534-4_4

  • DOI: https://doi.org/10.1007/978-3-319-04534-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04533-7

  • Online ISBN: 978-3-319-04534-4

  • eBook Packages: Computer Science, Computer Science (R0)
