AIML Knowledge Base Construction from Text Corpora

  • Giovanni De Gasperis
  • Isabella Chiari
  • Niva Florio
Part of the Studies in Computational Intelligence book series (SCI, volume 427)


Text mining (TM) and computational linguistics (CL) are computationally intensive fields in which many tools are becoming available for studying large text corpora and exploiting them for various purposes. In this chapter we address the problem of building conversational agents, or chatbots, from corpora for domain-specific educational purposes. After discussing some linguistic issues relevant to developing chatbot tools from corpora, we present a methodology for systematically analyzing large text corpora about a limited knowledge domain. Treating the Artificial Intelligence Markup Language (AIML) as the “assembly language” of artificial intelligence conversational agents, we present a way of using text corpora as a seed from which a set of “source files” can be derived. More specifically, we illustrate how to use corpus data to extract relevant keywords and multiword expressions, build a glossary, and identify text patterns in order to build an AIML knowledge base that can later be used to build interactive conversational systems. The approach we propose does not require deep understanding techniques for text analysis.
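
The abstract does not detail the extraction pipeline, so the following is only a minimal Python sketch, under our own assumptions, of how frequent terms and a glossary might be turned into AIML categories. The stopword list, the raw-frequency ranking (a crude stand-in for proper keyness measures against a reference corpus) and the helper names `top_keywords`, `normalize_pattern` and `glossary_to_aiml` are hypothetical; only the AIML elements used (`<aiml>`, `<category>`, `<pattern>`, `<template>`, `<srai>`) come from the standard AIML 1.0.1 vocabulary.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# full list or compare frequencies against a reference corpus (keyness).
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "was", "it",
    "that", "he", "she", "his", "her", "with", "for", "on", "as", "at",
}


def top_keywords(text, n=5):
    """Return the n most frequent non-stopword tokens longer than two letters.

    Raw frequency is only a crude stand-in for corpus keyness.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(n)]


def normalize_pattern(term):
    """Uppercase a term and drop punctuation, as AIML patterns expect."""
    return " ".join(re.findall(r"[A-Z0-9]+", term.upper()))


def glossary_to_aiml(glossary):
    """Turn a {term: definition} glossary into an AIML document string.

    Each term gets a WHAT IS category holding the (XML-escaped) definition,
    plus a DEFINE category that redirects to it through <srai>.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', '<aiml version="1.0.1">']
    for term, definition in glossary.items():
        pattern = normalize_pattern(term)
        lines += [
            "<category>",
            f"  <pattern>WHAT IS {pattern}</pattern>",
            f"  <template>{escape(definition)}</template>",
            "</category>",
            "<category>",
            f"  <pattern>DEFINE {pattern}</pattern>",
            f"  <template><srai>WHAT IS {pattern}</srai></template>",
            "</category>",
        ]
    lines.append("</aiml>")
    return "\n".join(lines)


if __name__ == "__main__":
    story = "The rabbit ran. The rabbit hid from the fox. The fox waited by the river."
    print(top_keywords(story, 2))  # ['rabbit', 'fox']
    print(glossary_to_aiml({"fox": "A clever animal that chases the rabbit."}))
```

Using `<srai>` keeps each definition in one place: alternative phrasings such as DEFINE FOX are redirected to the canonical WHAT IS FOX category instead of duplicating the answer.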

As a case study, we show how to build, from a children's story, the knowledge base of an English conversational agent for educational purposes that can answer questions about the story's characters, facts and episodes. The final part of the chapter discusses the main linguistic and methodological issues and possible further improvements.
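
For the case study, one shallow way (our assumption, not necessarily the authors' technique) to let an agent answer questions about characters is to detect recurring capitalized names and attach to each a `WHO IS <NAME>` category whose template returns one of the story sentences mentioning that name. The capitalization heuristic, the naive sentence splitter and the function names `split_sentences`, `candidate_characters` and `character_categories` are hypothetical; `<random>` and `<li>` are standard AIML template elements.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape

# Naive splitter (assumption): break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into trimmed, non-empty sentences."""
    return [s.strip() for s in SENTENCE_SPLIT.split(text) if s.strip()]


def candidate_characters(text, min_count=2):
    """Heuristic: capitalized words seen mid-sentence at least min_count times.

    Sentence-initial words are skipped because they are capitalized anyway.
    """
    counts = Counter()
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z']+", sentence)
        for word in words[1:]:
            if word[0].isupper():
                counts[word] += 1
    return [word for word, c in counts.items() if c >= min_count]


def character_categories(text, names):
    """Build one WHO IS <NAME> category per character found in the text.

    The template picks at random among the sentences that mention the name;
    names with no matching sentence are skipped.
    """
    sentences = split_sentences(text)
    categories = []
    for name in names:
        hits = [s for s in sentences if re.search(rf"\b{re.escape(name)}\b", s)]
        if not hits:
            continue
        items = "".join(f"<li>{escape(s)}</li>" for s in hits)
        categories.append(
            "<category>"
            f"<pattern>WHO IS {name.upper()}</pattern>"
            f"<template><random>{items}</random></template>"
            "</category>"
        )
    return categories
```

The heuristic misses names that only ever open a sentence and accepts any recurring capitalized word, so its output would need manual review before going into a knowledge base.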



Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2013

Authors and Affiliations

  • Giovanni De Gasperis (1)
  • Isabella Chiari (2)
  • Niva Florio (1)
  1. Dipartimento di Ingegneria e Scienze dell’Informazione e Matematica, Università degli Studi dell’Aquila, L’Aquila, Italy
  2. Dipartimento di Scienze documentarie, linguistico-filologiche e geografiche, Università degli Studi di Roma “La Sapienza”, Roma, Italy
