Building Chatbot Thesaurus

Developing Enterprise Chatbots

Abstract

We implement a scalable mechanism for building a thesaurus of entities, intended to improve the relevance of a chatbot. Thesaurus construction starts from seed entities and mines available source domains for new entities associated with them. New entities are formed by applying machine learning of syntactic parse trees (their generalization) to the search results for existing entities, which yields the commonalities between those results. These commonality expressions become parameters of existing entities and are turned into new entities at the next learning iteration. To match natural language expressions between source and target domains, we use syntactic generalization, an operation that finds a set of maximal common sub-trees of the parse trees of these expressions.

The thesaurus and syntactic generalization are applied to relevance improvement in search and text similarity assessment. We evaluate the search relevance improvement in vertical and horizontal domains and observe a significant contribution of the learned thesaurus in the former and a noticeable contribution of a hybrid system in the latter. We also perform an industrial evaluation of text relevance assessment based on the thesaurus and syntactic generalization, and conclude that the proposed algorithm for automated thesaurus learning is suitable for integration into chatbots. The proposed algorithm is implemented as a component of the Apache OpenNLP project.
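The iterative thesaurus learning outlined in the abstract (seed entities, search results, commonalities, new entities) might be sketched as follows. This is a toy illustration under loud assumptions: the function `extend_thesaurus`, the `search` callback, the `CORPUS` snippets and the `STOPWORDS` list are all invented for the example, and plain word overlap between snippets stands in for the generalization of parse trees that the chapter actually uses.

```python
from itertools import combinations

# Hypothetical stand-in for a web search: query string -> result snippets.
CORPUS = {
    "tax": ["tax deduction for business expense",
            "claim a tax deduction on mortgage interest"],
    "tax deduction": ["tax deduction for business travel",
                      "business tax deduction rules"],
}
STOPWORDS = {"a", "for", "on", "the"}

def search(query):
    return CORPUS.get(query, [])

def extend_thesaurus(seed, search, depth=2):
    """Grow thesaurus paths from a seed; maps each path to its newly found entities."""
    thesaurus = {}
    frontier = [tuple(seed)]
    for _ in range(depth):
        next_frontier = []
        for path in frontier:
            snippets = [set(s.lower().split()) for s in search(" ".join(path))]
            # Commonality: words shared by at least one pair of snippets
            # (a crude substitute for generalizing their parse trees).
            shared = set()
            for s1, s2 in combinations(snippets, 2):
                shared |= s1 & s2
            new_entities = sorted(shared - set(path) - STOPWORDS)
            thesaurus[path] = new_entities
            # Each new entity extends the path and is searched at the next iteration.
            next_frontier += [path + (w,) for w in new_entities]
        frontier = next_frontier
    return thesaurus

learned = extend_thesaurus(["tax"], search)
print(learned)
# {('tax',): ['deduction'], ('tax', 'deduction'): ['business']}
```

In this toy run the seed "tax" yields "deduction" at the first iteration, and the extended path "tax deduction" yields "business" at the second, mirroring how commonalities found at one iteration become entities at the next.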

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Galitsky, B. (2019). Building Chatbot Thesaurus. In: Developing Enterprise Chatbots. Springer, Cham. https://doi.org/10.1007/978-3-030-04299-8_8

  • DOI: https://doi.org/10.1007/978-3-030-04299-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04298-1

  • Online ISBN: 978-3-030-04299-8

  • eBook Packages: Computer Science (R0)