A Propositional Approach to Textual Case Indexing

  • Nirmalie Wiratunga
  • Rob Lothian
  • Sutanu Chakraborti
  • Ivan Koychev
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3721)

Abstract

Problem solving with experiences that are recorded in text form requires a mapping from text to structured cases, so that case comparison can provide informed feedback for reasoning. One of the challenges is to acquire an indexing vocabulary to describe cases. We explore the use of machine learning and statistical techniques to automate aspects of this acquisition task. We present a propositional semantic indexing tool, Psi, which forms its indexing vocabulary from new features extracted as logical combinations of existing keywords. We propose that such logical combinations correspond more closely to natural concepts and are more transparent than linear combinations. Experiments show that Psi-derived case representations achieve better retrieval performance than the original keyword-based representations. Psi also performs comparably to Latent Semantic Indexing (LSI), a popular dimensionality reduction technique for text, which, unlike Psi, generates linear combinations of the original features.
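The following toy sketch makes the abstract's contrast between logical and linear combinations of keywords concrete. It assumes cases are reduced to sets of keywords. It is not the Psi algorithm: the features are written by hand rather than learned, and the feature names (`printer_fault`, `toner_issue`), the cases and the weights are all hypothetical. It shows how a feature defined as a logical combination of keywords (here an OR of ANDs) can make two cases match even when they share no keywords. By contrast, a linear combination of keywords gives partial credit to any single contributing keyword. LSI produces combinations of this linear kind, though its weights come from SVD, not from hand-picked values as here.

```python
# Hypothetical toy sketch: cases as keyword sets, re-represented by propositional
# features (OR of ANDs of keywords). Feature definitions are hand-written here;
# Psi learns its features from data, which this sketch does not attempt.
from typing import Dict, FrozenSet, List, Set

Case = Set[str]                # a textual case reduced to its set of keywords
DNF = List[FrozenSet[str]]     # disjunction of conjunctions of keywords


def holds(feature: DNF, case: Case) -> bool:
    """True if every keyword of at least one conjunction occurs in the case."""
    return any(conj <= case for conj in feature)


def propositional_vector(case: Case, features: Dict[str, DNF]) -> Dict[str, int]:
    """Index a case by the 0/1 truth value of each propositional feature."""
    return {name: int(holds(f, case)) for name, f in features.items()}


def shared(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Toy similarity: number of features true in both cases."""
    return sum(a[k] & b[k] for k in a)


def linear_score(case: Case, weights: Dict[str, float]) -> float:
    """Contrast: a linear combination of binary keyword indicators."""
    return sum(w for kw, w in weights.items() if kw in case)


# Hypothetical vocabulary, features and cases, for illustration only.
features: Dict[str, DNF] = {
    "printer_fault": [frozenset({"printer", "jam"}), frozenset({"paper", "stuck"})],
    "toner_issue": [frozenset({"toner"}), frozenset({"ink", "faded"})],
}
c1: Case = {"printer", "jam", "tray"}
c2: Case = {"paper", "stuck", "feeder"}
c3: Case = {"printer", "toner", "low"}

v1, v2, v3 = (propositional_vector(c, features) for c in (c1, c2, c3))

# Keyword overlap prefers c3 (shares "printer"); propositional features prefer c2.
print(len(c1 & c2), len(c1 & c3))      # 0 1
print(shared(v1, v2), shared(v1, v3))  # 1 0

# A linear combination gives partial credit for "printer" alone.
weights = {"printer": 0.5, "jam": 0.5, "paper": 0.5, "stuck": 0.5}
print(linear_score(c3, weights))       # 0.5
```

In this toy example, raw keyword overlap ranks c3 as closer to c1 because both mention "printer". The propositional representation instead groups c1 and c2 under the same fault. The sketch cannot show whether this mirrors Psi's retrieval gains on real data.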

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Nirmalie Wiratunga (1)
  • Rob Lothian (1)
  • Sutanu Chakraborti (1)
  • Ivan Koychev (2)
  1. School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK
  2. Institute of Mathematics and Informatics, Bulgarian Academy of Science, Sofia, Bulgaria