A Content Management System for Chatbots

Galitsky, Boris

doi:10.1007/978-3-030-04299-8_9

Abstract

In this chapter we describe industrial applications of our linguistics-based relevance technology for processing, classifying and delivering a stream of texts that serve as data sources for chatbots. We present the content pipeline for the eBay entertainment domain that employs this technology, and show that relevance assessment in text processing is the main bottleneck for its performance. Many components of the chatbot content pipeline, such as content mining, thesaurus formation, aggregation from multiple sources, validation, de-duplication, opinion mining and integrity enforcement, rely on efficient, domain-independent text classification, entity extraction and relevance assessment operations.

Text relevance assessment is based on syntactic generalization (SG, Chap. 5), an operation that finds the maximum common sub-tree of the parse trees of two sentences. The relevance of two portions of text is then defined as the cardinality of this sub-tree. SG is intended to replace keyword-based analysis with a more accurate assessment of relevance that takes phrase-level and sentence-level information into account. In the special case where the short expressions are commonly used terms, such as Facebook likes, SG ascends to the level of categories, and a reasoning technique is required to map these categories in the course of relevance assessment.
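
To make the idea of a generalization-based relevance score concrete, here is a minimal sketch under strong simplifying assumptions. It is not the OpenNLP.Similarity implementation and not the chapter's algorithm: it works on flat sequences of (lemma, POS) pairs instead of full parse trees, it does not pair up phrases of the same type, and the weights `LEMMA_WEIGHT` and `POS_ONLY_WEIGHT` are hypothetical. Tokens that match in both lemma and POS keep their lemma; tokens that match only in POS are generalized to `*`. With both weights set to 1.0 the score reduces to the number of nodes in the common generalization, i.e. its cardinality. Tagging and lemmatization are assumed to have been done beforehand.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (lemma, POS tag), assumed to come from a tagger

# Hypothetical weights: an exact lemma+POS match counts fully,
# a POS-only match (generalized to "*") counts partially.
LEMMA_WEIGHT = 1.0
POS_ONLY_WEIGHT = 0.3


def node_score(a: Token, b: Token) -> float:
    """Score for aligning two tokens; 0 means they cannot be generalized."""
    if a == b:
        return LEMMA_WEIGHT
    if a[1] == b[1]:
        return POS_ONLY_WEIGHT
    return 0.0


def generalize(p1: List[Token], p2: List[Token]) -> Tuple[List[Token], float]:
    """Weighted longest common subsequence of two tagged phrases.

    Returns the generalized phrase (lemma replaced by "*" where only the
    POS matches) together with its total score.
    """
    n, m = len(p1), len(p2)
    # dp[i][j] = best score for aligning p1[i:] with p2[j:]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(dp[i + 1][j], dp[i][j + 1])
            s = node_score(p1[i], p2[j])
            if s > 0:
                best = max(best, s + dp[i + 1][j + 1])
            dp[i][j] = best
    # Walk forward through the table to recover one optimal generalization.
    result: List[Token] = []
    i = j = 0
    while i < n and j < m:
        s = node_score(p1[i], p2[j])
        if s > 0 and dp[i][j] == s + dp[i + 1][j + 1]:
            lemma = p1[i][0] if p1[i] == p2[j] else "*"
            result.append((lemma, p1[i][1]))
            i += 1
            j += 1
        elif dp[i][j] == dp[i + 1][j]:
            i += 1
        else:
            j += 1
    return result, dp[0][0]
```

For example, generalizing "digital camera with long battery life" with "camera with short battery life" (tagged by hand) keeps camera, with, battery and life, generalizes the two adjectives to ("*", "JJ"), and scores 4.3 with the weights shown.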

A number of content pipeline components employ web mining, which relies on SG to compare web search results. We describe how SG works in several components of the content pipeline, including personalization and recommendation, and provide evaluation results for the eBay deployment. Content pipeline support is implemented as an open-source contribution, OpenNLP.Similarity.
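
The category-level case mentioned above, where short expressions such as Facebook likes are matched by ascending from terms to their categories, can be illustrated with a minimal sketch of how it might support personalization. It assumes a hypothetical hand-built thesaurus `CATEGORIES` and uses plain set intersection as a stand-in for the reasoning step that maps categories; the chapter's actual thesaurus formation and reasoning are not reproduced here. The score counts shared categories, in the spirit of using the cardinality of a common generalization as relevance.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy thesaurus mapping short expressions (e.g. Facebook likes)
# to categories; a real pipeline would build such a thesaurus by web mining.
CATEGORIES: Dict[str, Set[str]] = {
    "lady gaga": {"pop music", "concerts"},
    "madonna": {"pop music", "concerts"},
    "metallica": {"rock music", "concerts"},
    "broadway": {"theater"},
}


def to_categories(terms: Iterable[str], thesaurus: Dict[str, Set[str]]) -> Set[str]:
    """Ascend from terms to the union of their categories; unknown terms are skipped."""
    cats: Set[str] = set()
    for t in terms:
        cats |= thesaurus.get(t.lower(), set())
    return cats


def category_relevance(likes: Iterable[str], item_terms: Iterable[str],
                       thesaurus: Dict[str, Set[str]] = CATEGORIES) -> int:
    """Number of categories shared by a user's likes and an item's terms."""
    return len(to_categories(likes, thesaurus) & to_categories(item_terms, thesaurus))
```

With this toy thesaurus, `category_relevance(["Lady Gaga"], ["Madonna"])` returns 2, since both map to "pop music" and "concerts", while a like for "Metallica" shares only "concerts" with "Madonna" and nothing with "Broadway".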

Author information

Authors and Affiliations

Oracle (United States), San Jose, CA, USA
Boris Galitsky

About this chapter

Cite this chapter

Galitsky, B. (2019). A Content Management System for Chatbots. In: Developing Enterprise Chatbots. Springer, Cham. https://doi.org/10.1007/978-3-030-04299-8_9

DOI: https://doi.org/10.1007/978-3-030-04299-8_9
Published: 05 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04298-1
Online ISBN: 978-3-030-04299-8
eBook Packages: Computer Science, Computer Science (R0)
