Abstract
We implement a scalable mechanism for building a thesaurus of entities intended to improve the relevance of a chatbot. Thesaurus construction starts from a set of seed entities and mines the available source domains for new entities associated with them. New entities are formed by applying machine learning of syntactic parse trees (their generalization) to the search results for existing entities, which yields the commonalities between those results. These commonality expressions become parameters of the existing entities and are turned into new entities at the next learning iteration. To match natural language expressions between source and target domains, we use syntactic generalization, an operation that finds the set of maximal common sub-trees of the parse trees of these expressions.
The thesaurus and syntactic generalization are applied to relevance improvement in search and to text similarity assessment. We evaluate the search relevance improvement in vertical and horizontal domains and observe a significant contribution of the learned thesaurus in the former and a noticeable contribution of a hybrid system in the latter. We also perform an industrial evaluation of thesaurus-based and syntactic generalization-based text relevance assessment and conclude that the proposed algorithm for automated thesaurus learning is suitable for integration into chatbots. The proposed algorithm is implemented as a component of the Apache OpenNLP project.
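To make the syntactic generalization operation mentioned in the abstract more concrete, the following is a minimal sketch in Python. It is not the chapter's implementation or the Apache OpenNLP component. It assumes a simplified encoding in which a parse tree node is a `(label, children)` tuple, treats a common sub-tree as one obtained by dropping whole child branches while preserving child order, and uses hypothetical helper names (`leaf`, `size`, `common_rooted`, `nodes`, `generalize`) chosen only for illustration.

```python
from functools import lru_cache

# Assumed encoding: a node is (label, children); children is a tuple of nodes.
# A word is a node with no children, hanging under its part-of-speech node.

def leaf(pos, word):
    """Build a part-of-speech node that dominates a single word."""
    return (pos, ((word, ()),))

def size(tree):
    """Number of nodes in a tree."""
    return 1 + sum(size(child) for child in tree[1])

@lru_cache(maxsize=None)
def common_rooted(a, b):
    """Largest common ordered sub-tree rooted at both a and b, or None."""
    if a[0] != b[0]:
        return None
    ka, kb = a[1], b[1]
    # LCS-style dynamic programme over the two child sequences.
    # best[i][j] = (matched node count, tuple of generalized children).
    best = [[(0, ())] * (len(kb) + 1) for _ in range(len(ka) + 1)]
    for i in range(1, len(ka) + 1):
        for j in range(1, len(kb) + 1):
            cand = max(best[i - 1][j], best[i][j - 1], key=lambda t: t[0])
            sub = common_rooted(ka[i - 1], kb[j - 1])
            if sub is not None:
                score = best[i - 1][j - 1][0] + size(sub)
                if score > cand[0]:
                    cand = (score, best[i - 1][j - 1][1] + (sub,))
            best[i][j] = cand
    return (a[0], best[len(ka)][len(kb)][1])

def nodes(tree):
    """Yield every node of a tree, root first."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def generalize(t1, t2, min_size=2):
    """Set of maximal common sub-trees with at least min_size nodes."""
    found = {common_rooted(a, b) for a in nodes(t1) for b in nodes(t2)}
    found = {t for t in found if t is not None and size(t) >= min_size}

    def inside(t, u):
        # t is contained in u if it matches in full at some node of u.
        return any(common_rooted(t, x) == t for x in nodes(u))

    return {t for t in found
            if not any(u != t and inside(t, u) for u in found)}

# Hand-built parse trees for "digital camera with zoom"
# and "digital camera for travel".
t1 = ("NP", (leaf("JJ", "digital"), leaf("NN", "camera"),
             ("PP", (leaf("IN", "with"), ("NP", (leaf("NN", "zoom"),))))))
t2 = ("NP", (leaf("JJ", "digital"), leaf("NN", "camera"),
             ("PP", (leaf("IN", "for"), ("NP", (leaf("NN", "travel"),))))))

result = generalize(t1, t2)
for tree in result:
    print(tree)
```

On this toy pair the single maximal common sub-tree keeps the shared words "digital" and "camera" and retains only the part-of-speech skeleton of the prepositional phrase, where the words differ. In the iterative scheme the abstract describes, such shared expressions across search results would be the candidates for new entity parameters; how they are filtered and promoted to entities is not shown in this sketch.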
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Galitsky, B. (2019). Building Chatbot Thesaurus. In: Developing Enterprise Chatbots. Springer, Cham. https://doi.org/10.1007/978-3-030-04299-8_8
DOI: https://doi.org/10.1007/978-3-030-04299-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04298-1
Online ISBN: 978-3-030-04299-8
eBook Packages: Computer Science, Computer Science (R0)