A Content Management System for Chatbots

  • Chapter
Developing Enterprise Chatbots

Abstract

In this chapter we describe industrial applications of our linguistics-based relevance technology for processing, classifying and delivering a stream of texts that serve as data sources for chatbots. We present the content pipeline for the eBay entertainment domain that employs this technology, and show that the relevance of text processing is the main bottleneck for its performance. Several components of the chatbot content pipeline, such as content mining, thesaurus formation, aggregation from multiple sources, validation, de-duplication, opinion mining and integrity enforcement, rely on efficient, domain-independent text classification, entity extraction and relevance assessment.

Text relevance assessment is based on the operation of syntactic generalization (SG, Chap. 5), which finds a maximum common sub-tree for a pair of sentence parse trees. The relevance of two portions of text is then defined as the cardinality of this sub-tree. SG is intended to replace keyword-based analysis with a more accurate assessment of relevance that takes phrase-level and sentence-level information into account. In the special case where the short expressions being compared are commonly used terms, such as Facebook likes, SG ascends to the level of categories, and a reasoning technique is required to map these categories in the course of relevance assessment.
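A toy illustration of the relevance definition above follows. It is a simplified stand-in, not the chapter's SG implementation. It assumes that parse trees are already available as nested `(label, children)` tuples, and that nodes match only when their labels are identical; the `"VB:buy"`-style labels packing a POS tag and a lemma are hypothetical. It computes a top-down maximum common sub-tree of ordered trees, aligning children with an LCS-style dynamic program, and takes relevance as the node count of that sub-tree, maximized over all pairs of roots.

```python
from functools import lru_cache
from typing import Iterator, Tuple

# Hypothetical representation: a node is (label, children), where a label such as
# "VB:buy" packs a POS tag and a lemma. Tuples keep trees hashable for caching.
Tree = Tuple[str, Tuple["Tree", ...]]


def node(label: str, *children: Tree) -> Tree:
    return (label, tuple(children))


@lru_cache(maxsize=None)
def rooted_common(a: Tree, b: Tree) -> int:
    """Node count of the largest common sub-tree rooted at both a and b.

    Children are aligned in order (ordered trees), LCS-style, with each aligned
    pair contributing the size of its own rooted common sub-tree.
    """
    if a[0] != b[0]:
        return 0
    ca, cb = a[1], b[1]
    dp = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1] + rooted_common(ca[i - 1], cb[j - 1]),
            )
    return 1 + dp[len(ca)][len(cb)]


def subtrees(t: Tree) -> Iterator[Tree]:
    """Yield every node of t (each node is the root of a sub-tree)."""
    yield t
    for child in t[1]:
        yield from subtrees(child)


def relevance(a: Tree, b: Tree) -> int:
    """Cardinality of the maximum common sub-tree over all pairs of roots."""
    return max(rooted_common(x, y) for x in subtrees(a) for y in subtrees(b))


# Toy parse trees (hand-built, hypothetical) for three short sentences.
s1 = node("S", node("NP", node("PRP:i")),
          node("VP", node("VB:buy"), node("NP", node("NN:ticket"))))
s2 = node("S", node("NP", node("PRP:i")),
          node("VP", node("VB:want"), node("NP", node("NN:ticket"))))
s3 = node("S", node("NP", node("NN:weather")),
          node("VP", node("VB:be"), node("ADJP", node("JJ:sunny"))))

print(relevance(s1, s2), relevance(s1, s3))  # → 6 3
```

The two ticket-related sentences share six of their seven nodes, while the weather sentence shares only the `S`/`NP`/`VP` skeleton (score 3). Real SG generalizes differing words to their shared part of speech instead of discarding them, which this exact-label test does not attempt.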

Several content pipeline components employ web mining, which relies on SG to compare web search results. We describe how SG works in a number of pipeline components, including personalization and recommendation, and provide evaluation results for the eBay deployment. Content pipeline support is implemented as an open-source contribution, OpenNLP.Similarity.
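To illustrate how web mining can use a generalization score to filter search results, the following code scores result snippets against a seed phrase. It deliberately swaps in a simpler technique than the chapter's tree-based SG. It assumes snippets are already lemmatized and POS-tagged into `(lemma, tag)` pairs, and generalizes two token sequences with a weighted longest common subsequence: tokens with the same tag and lemma keep the lemma, and tokens that share only the tag become `"*"`. The weights, the threshold and the helper names `generalize` and `rank_snippets` are all hypothetical.

```python
from typing import List, Tuple

# A token is (lemma, POS tag); lemmatization and tagging are assumed upstream.
Token = Tuple[str, str]

LEMMA_WEIGHT = 1.0  # hypothetical: same POS and same lemma
POS_WEIGHT = 0.5    # hypothetical: same POS, different lemma (generalized to "*")


def _weight(x: Token, y: Token) -> float:
    """Weight of aligning two tokens; 0.0 means they cannot be aligned."""
    if x[1] != y[1]:
        return 0.0
    return LEMMA_WEIGHT if x[0] == y[0] else POS_WEIGHT


def generalize(a: List[Token], b: List[Token]) -> Tuple[float, List[Token]]:
    """Weighted LCS of two token sequences: (score, generalized sequence)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for the suffixes a[i:] and b[j:]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(dp[i + 1][j], dp[i][j + 1])
            w = _weight(a[i], b[j])
            if w > 0:
                best = max(best, w + dp[i + 1][j + 1])
            dp[i][j] = best
    # Trace the table forward to recover the generalized tokens.
    out: List[Token] = []
    i = j = 0
    while i < n and j < m:
        w = _weight(a[i], b[j])
        if w > 0 and dp[i][j] == w + dp[i + 1][j + 1]:
            out.append((a[i][0] if a[i][0] == b[j][0] else "*", a[i][1]))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return dp[0][0], out


def rank_snippets(seed: List[Token], snippets: List[List[Token]],
                  threshold: float = 1.0) -> List[int]:
    """Indices of snippets scoring at least `threshold`, best first."""
    scored = [(generalize(seed, s)[0], k) for k, s in enumerate(snippets)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [k for score, k in scored if score >= threshold]


seed = [("buy", "VB"), ("concert", "NN"), ("ticket", "NN")]
snippets = [
    [("where", "WRB"), ("to", "TO"), ("buy", "VB"), ("cheap", "JJ"),
     ("concert", "NN"), ("ticket", "NN")],
    [("sell", "VB"), ("football", "NN"), ("ticket", "NN")],
    [("weather", "NN")],
]
print(generalize(seed, snippets[1]))  # → (2.0, [('*', 'VB'), ('*', 'NN'), ('ticket', 'NN')])
print(rank_snippets(seed, snippets))  # → [0, 1]
```

With these assumed weights, the snippet that shares one lemma plus verb-noun structure still clears the threshold, while the one-word weather snippet is filtered out.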

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Galitsky, B. (2019). A Content Management System for Chatbots. In: Developing Enterprise Chatbots. Springer, Cham. https://doi.org/10.1007/978-3-030-04299-8_9

  • DOI: https://doi.org/10.1007/978-3-030-04299-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04298-1

  • Online ISBN: 978-3-030-04299-8

  • eBook Packages: Computer Science, Computer Science (R0)
