Assuring Chatbot Relevance at Syntactic Level

Chapter in: Developing Enterprise Chatbots

Abstract

In this chapter we implement a relevance mechanism based on the similarity of parse trees for a number of chatbot components, including search. We extend the mechanism of logical generalization to syntactic parse trees and attempt to detect weak semantic signals in them. The generalization of syntactic parse trees, used as a syntactic similarity measure, is defined as the set of maximum common sub-trees and is performed at the level of paragraphs, sentences, phrases and individual words. We analyze the semantic features of this similarity measure and compare it with the semantics of traditional anti-unification of terms. Nearest-neighbor machine learning is then applied to relate a sentence to a semantic class.

By using a similarity measure based on syntactic parse trees instead of bag-of-words and keyword-frequency approaches, we expect to detect a weak semantic signal that would otherwise be unobservable. The proposed approach is evaluated in four distinct domains where a lack of semantic information makes the classification of sentences rather difficult. We describe a toolkit that is part of the Apache Software Foundation project OpenNLP.chatbot and is designed to aid search engineers and chatbot designers in tasks requiring text relevance assessment.
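
The generalization and nearest-neighbor steps named in the abstract can be illustrated with a deliberately simplified sketch. It is not the chapter's algorithm and not the OpenNLP API: it assumes flat sequences of (part-of-speech, lemma) pairs instead of full parse trees, keeps a single best common subsequence instead of the full set of maximum common sub-trees, and uses hypothetical weights (1 for a matching part of speech, 2 if the lemma also matches). All function names are invented for illustration.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, str]  # (part-of-speech tag, lemma); "*" marks a generalized lemma


def generalize_words(a: Word, b: Word) -> Optional[Word]:
    """Generalize two words: keep the shared POS, keep the lemma only if equal."""
    if a[0] != b[0]:
        return None  # different parts of speech have no common generalization here
    return (a[0], a[1] if a[1] == b[1] else "*")


def word_score(w: Word) -> int:
    """Hypothetical weights: 1 for a POS-only match, 2 if the lemma also survived."""
    return 2 if w[1] != "*" else 1


def generalize_phrases(p: List[Word], q: List[Word]) -> List[Word]:
    """Best-scoring order-preserving common generalization of two tagged phrases."""
    n, m = len(p), len(q)
    # best[i][j] = highest score achievable when generalizing p[i:] with q[j:]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best[i][j] = max(best[i + 1][j], best[i][j + 1])
            g = generalize_words(p[i], q[j])
            if g is not None:
                best[i][j] = max(best[i][j], word_score(g) + best[i + 1][j + 1])
    # Walk the table forward to recover one optimal generalization.
    out: List[Word] = []
    i = j = 0
    while i < n and j < m:
        g = generalize_words(p[i], q[j])
        if g is not None and best[i][j] == word_score(g) + best[i + 1][j + 1]:
            out.append(g)
            i, j = i + 1, j + 1
        elif best[i + 1][j] >= best[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def similarity(p: List[Word], q: List[Word]) -> int:
    """Similarity of two phrases = total score of their generalization."""
    return sum(word_score(w) for w in generalize_phrases(p, q))


def nearest_neighbor(query: List[Word],
                     training: List[Tuple[List[Word], str]]) -> str:
    """Assign the class of the training phrase most similar to the query."""
    return max(training, key=lambda example: similarity(query, example[0]))[1]


if __name__ == "__main__":
    p = [("DT", "the"), ("JJ", "digital"), ("NN", "camera"), ("IN", "with"), ("NN", "zoom")]
    q = [("DT", "a"), ("NN", "camera"), ("IN", "with"), ("JJ", "good"), ("NN", "battery")]
    print(generalize_phrases(p, q))

    training = [
        ([("VB", "buy"), ("DT", "a"), ("NN", "camera")], "purchase"),
        ([("WRB", "how"), ("TO", "to"), ("VB", "fix"), ("DT", "a"), ("NN", "camera")], "repair"),
    ]
    print(nearest_neighbor([("VB", "buy"), ("DT", "the"), ("NN", "lens")], training))
```

In this toy setting the two camera phrases share the structure "determiner, camera, with, noun", which a bag-of-words comparison would reduce to the two shared keywords alone. A fuller treatment would operate on parse trees and return every maximum common sub-tree rather than one subsequence.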

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Galitsky, B. (2019). Assuring Chatbot Relevance at Syntactic Level. In: Developing Enterprise Chatbots. Springer, Cham. https://doi.org/10.1007/978-3-030-04299-8_5

  • DOI: https://doi.org/10.1007/978-3-030-04299-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04298-1

  • Online ISBN: 978-3-030-04299-8

  • eBook Packages: Computer Science, Computer Science (R0)
