Visualizing and Evaluating Complexity of Textual Case Bases

  • Sutanu Chakraborti
  • Ulises Cerviño Beresi
  • Nirmalie Wiratunga
  • Stewart Massie
  • Robert Lothian
  • Deepak Khemani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5239)

Abstract

This paper deals with two relatively understudied problems in Textual CBR (TCBR): visualizing textual case bases and evaluating their complexity. The first is useful in case base maintenance; the second helps in making informed choices about case base representation and parameter tuning for a TCBR system, and in explaining the behaviour of different retrieval and classification techniques over diverse case bases. We present an approach to visualize textual case bases by “stacking” similar cases and features close to each other in an image derived from the case-feature matrix. We propose a complexity measure called GAME that exploits regularities in stacked images to evaluate the alignment between problem and solution components of cases. GAME_class, a counterpart of GAME in classification domains, shows a strong correspondence with accuracies reported by standard classifiers over classification tasks of varying complexity.
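
The abstract does not spell out the stacking procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: rows (cases) and columns (features) of a small binary case-feature matrix are reordered greedily so that each one is placed next to its most similar unplaced neighbour. The cosine similarity, the greedy ordering, and the names `cosine`, `greedy_order`, and `stack` are illustrative choices, not the authors' algorithm, and the sketch does not attempt to compute GAME.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_order(vectors):
    """Order indices so each item is followed by its most similar unplaced item.

    Assumption: a simple greedy chain starting from index 0; ties go to the
    lowest index. This is a stand-in for whatever ordering the paper uses.
    """
    if not vectors:
        return []
    order = [0]
    remaining = set(range(1, len(vectors)))
    while remaining:
        last = vectors[order[-1]]
        nxt = max(sorted(remaining), key=lambda i: cosine(last, vectors[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def stack(matrix):
    """Reorder rows (cases) and columns (features) of a case-feature matrix."""
    row_order = greedy_order(matrix)
    columns = [list(col) for col in zip(*matrix)]
    col_order = greedy_order(columns)
    stacked = [[matrix[r][c] for c in col_order] for r in row_order]
    return stacked, row_order, col_order


# Toy case-feature matrix: cases 0 and 2 share features 0 and 2,
# cases 1 and 3 share features 1 and 3.
matrix = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
stacked, row_order, col_order = stack(matrix)
for row in stacked:
    print(row)
```

On this toy input the reordered matrix shows two solid blocks along the diagonal, which is the kind of visual regularity the abstract attributes to stacked images of well-structured case bases.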

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Sutanu Chakraborti (1)
  • Ulises Cerviño Beresi (2)
  • Nirmalie Wiratunga (2)
  • Stewart Massie (2)
  • Robert Lothian (2)
  • Deepak Khemani (3)

  1. Systems Research Lab, Tata Research Development and Design Centre, Pune, India
  2. School of Computing, The Robert Gordon University, Scotland, UK
  3. Department of Computer Science and Engineering, Indian Institute of Technology, Madras, Chennai, India