Taxonomic Semantic Indexing for Textual Case-Based Reasoning

  • Juan A. Recio-Garcia
  • Nirmalie Wiratunga
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6176)


Case-Based Reasoning (CBR) solves problems by reusing past problem-solving experiences maintained in a casebase. The key CBR knowledge container is therefore its casebase. However, there are further containers, such as similarity, reuse and revision knowledge, that are also crucial. Automated acquisition approaches are particularly attractive for discovering knowledge for such containers. The majority of research in this area focuses on introspective algorithms that extract knowledge from within the casebase. However, the rapid increase in Web applications has resulted in large volumes of user-generated experiential content, which forms a valuable source of background knowledge for CBR system development. In this paper we present a novel approach to acquiring knowledge from Web pages. The primary knowledge structure is a dynamically generated taxonomy which, once created, can be used during the retrieve and reuse stages of the CBR cycle. Importantly, this taxonomy is pruned according to a clustering-based sense disambiguation heuristic that uses similarity over the solution vocabulary of cases. The algorithms presented in the paper are applied to several online FAQ systems consisting of textual problem-solving cases. The quality of the generated taxonomies is evidenced by improved semantic comparison of text: successful sense disambiguation results in higher retrieval accuracy. Our results show significant improvements over standard text comparison alternatives.
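To make the idea of taxonomy-based semantic comparison concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hand-written toy taxonomy (a hypothetical child-to-parent map) in place of one generated from Web pages, and it uses Jaccard overlap as a stand-in similarity measure. The names `TAXONOMY`, `expand`, `jaccard` and `taxonomic_similarity` are illustrative only.

```python
from typing import Dict, Set

# Hypothetical toy taxonomy (child -> parent). Assumed to be acyclic.
# It is NOT taken from the paper, where the taxonomy is built from Web pages.
TAXONOMY: Dict[str, str] = {
    "laptop": "computer",
    "desktop": "computer",
    "computer": "device",
    "printer": "device",
}


def expand(terms: Set[str], taxonomy: Dict[str, str]) -> Set[str]:
    """Return the given terms together with all of their taxonomy ancestors."""
    expanded = set(terms)
    for term in terms:
        node = term
        # Walk up the parent links until a root (or unknown term) is reached.
        while node in taxonomy:
            node = taxonomy[node]
            expanded.add(node)
    return expanded


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two term sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def taxonomic_similarity(query: Set[str], case: Set[str],
                         taxonomy: Dict[str, str] = TAXONOMY) -> float:
    """Compare two term sets after expanding each with its ancestors."""
    return jaccard(expand(query, taxonomy), expand(case, taxonomy))


query = {"laptop", "slow"}
case = {"desktop", "slow"}

plain = jaccard(query, case)                  # surface terms only
semantic = taxonomic_similarity(query, case)  # terms plus shared ancestors

print(f"plain={plain:.2f} semantic={semantic:.2f}")
```

In this toy setting the two texts share only one surface term, but after expansion they also share the ancestors "computer" and "device", so the taxonomy-aware score is higher than the plain one. Sense disambiguation and pruning, which the abstract identifies as central, are not modelled here.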


Keywords: Latent Dirichlet Allocation, Word Sense Disambiguation, Inverse Pattern, Ontology Match, Snippet Text
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Juan A. Recio-Garcia (1)
  • Nirmalie Wiratunga (2)

  1. Universidad Complutense de Madrid, Spain
  2. Robert Gordon University, Aberdeen, United Kingdom
