Abstract
Problem solving with experiences that are recorded in text form requires a mapping from text to structured cases, so that case comparison can provide informed feedback for reasoning. One of the challenges is to acquire an indexing vocabulary to describe cases. We explore the use of machine learning and statistical techniques to automate aspects of this acquisition task. We present Psi, a propositional semantic indexing tool that forms its indexing vocabulary from new features extracted as logical combinations of existing keywords. We propose that such logical combinations correspond more closely to natural concepts and are more transparent than linear combinations. Experiments show that Psi-derived case representations achieve better retrieval performance than the original keyword-based representations. Psi also performs comparably to Latent Semantic Indexing, a popular dimensionality reduction technique for text, which, unlike Psi, generates linear combinations of the original features.
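To make the contrast between logical and linear combinations concrete, the following minimal Python sketch builds one feature of each kind over a toy keyword vocabulary. It is not Psi's algorithm: the abstract does not specify how the logical combinations are formed, so the disjunction-of-conjunctions form, the documents, the keyword names, and the weights are all assumptions chosen for illustration. In particular, Latent Semantic Indexing derives its weights from a singular value decomposition, whereas the weights here are hand-picked.

```python
from typing import Dict, List, Set

# Hypothetical toy documents, each reduced to its set of keywords.
docs: List[Set[str]] = [
    {"free", "offer", "click"},
    {"meeting", "agenda"},
    {"cheap", "offer"},
]

# A logical feature, assumed here to be a disjunction of conjunctions:
# the feature holds if every keyword of at least one clause is present.
# The clauses are invented for illustration.
promo_clauses: List[Set[str]] = [{"free", "offer"}, {"cheap", "offer"}]


def logical_feature(doc: Set[str], clauses: List[Set[str]]) -> bool:
    """True if the document satisfies any clause (all of its keywords)."""
    return any(clause <= doc for clause in clauses)


# A linear feature: a weighted sum over keyword presence.
# The weights are hand-picked, not learned or derived from an SVD.
weights: Dict[str, float] = {
    "free": 0.6,
    "cheap": 0.5,
    "offer": 0.4,
    "meeting": -0.7,
}


def linear_feature(doc: Set[str], w: Dict[str, float]) -> float:
    """Sum the weights of the keywords that occur in the document."""
    return sum(weight for keyword, weight in w.items() if keyword in doc)


logical_values = [logical_feature(d, promo_clauses) for d in docs]
linear_values = [linear_feature(d, weights) for d in docs]

for doc, lv, nv in zip(docs, logical_values, linear_values):
    print(sorted(doc), lv, round(nv, 2))
```

The logical feature can be read directly as "(free AND offer) OR (cheap AND offer)", which is the kind of transparency the abstract argues for, while the linear feature yields a score whose meaning depends on interpreting every weight.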
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wiratunga, N., Lothian, R., Chakraborti, S., Koychev, I. (2005). A Propositional Approach to Textual Case Indexing. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds) Knowledge Discovery in Databases: PKDD 2005. Lecture Notes in Computer Science, vol 3721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564126_38
DOI: https://doi.org/10.1007/11564126_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29244-9
Online ISBN: 978-3-540-31665-7
eBook Packages: Computer Science (R0)