A Propositional Approach to Textual Case Indexing

Wiratunga, Nirmalie; Lothian, Rob; Chakraborti, Sutanu; Koychev, Ivan

doi:10.1007/11564126_38

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3721)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Problem solving with experiences recorded in text form requires a mapping from text to structured cases, so that case comparison can provide informed feedback for reasoning. One of the challenges is acquiring an indexing vocabulary to describe cases. We explore the use of machine learning and statistical techniques to automate aspects of this acquisition task. We present Psi, a propositional semantic indexing tool that forms its indexing vocabulary from new features extracted as logical combinations of existing keywords. We propose that such logical combinations correspond more closely to natural concepts and are more transparent than linear combinations. Experiments show that Psi-derived case representations achieve better retrieval performance than the original keyword-based representations. Psi also performs comparably to Latent Semantic Indexing (LSI), a popular dimensionality reduction technique for text which, unlike Psi, generates linear combinations of the original features.
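
To make the contrast between logical and linear feature combinations concrete, here is a minimal sketch. It builds binary keyword vectors for a hypothetical four-case case base, defines one hypothetical propositional feature, `(engine AND coolant) OR overheat`, and computes the first LSI dimension of the same case-term matrix by power iteration (used here as a stand-in for a full SVD). The case base, vocabulary, feature, and helper names are illustrative assumptions only; this is not Psi's own feature-extraction procedure, which the abstract does not describe.

```python
from math import sqrt

# Hypothetical toy case base: each case is the set of keywords in its text.
CASES = {
    "c1": {"engine", "overheat", "coolant"},
    "c2": {"engine", "coolant", "leak"},
    "c3": {"brake", "noise"},
    "c4": {"brake", "pad", "noise"},
}
VOCAB = sorted(set().union(*CASES.values()))


def keyword_vector(words):
    """Binary keyword representation of a case over the fixed vocabulary."""
    return [1.0 if w in words else 0.0 for w in VOCAB]


def propositional_feature(words):
    """Hypothetical logical feature: (engine AND coolant) OR overheat.
    Its value is 0 or 1 and can be read directly as a rule."""
    return 1.0 if {"engine", "coolant"} <= words or "overheat" in words else 0.0


def top_lsi_direction(rows, iters=100):
    """First right singular vector of the case-term matrix A, found by
    power iteration on A^T A. An LSI dimension is a linear combination
    of the original keywords, with these values as its weights."""
    n = len(rows[0])
    v = [1.0] * n
    for _ in range(iters):
        av = [sum(r[j] * v[j] for j in range(n)) for r in rows]  # A v
        w = [sum(rows[i][j] * av[i] for i in range(len(rows)))   # A^T (A v)
             for j in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def lsi_projection(words, direction):
    """Project a case onto one LSI dimension: a weighted sum of its keywords."""
    return sum(x * d for x, d in zip(keyword_vector(words), direction))


if __name__ == "__main__":
    rows = [keyword_vector(CASES[c]) for c in sorted(CASES)]
    direction = top_lsi_direction(rows)
    print("LSI weights:", {t: round(d, 3) for t, d in zip(VOCAB, direction)})
    for name in sorted(CASES):
        words = CASES[name]
        print(name, propositional_feature(words),
              round(lsi_projection(words, direction), 3))
```

On this toy data both features separate the engine cases from the brake cases, but the logical feature is a crisp, readable rule, whereas the LSI dimension assigns graded weights (`engine` and `coolant` receive twice the weight of `overheat` and `leak`) that are harder to interpret. This mirrors, in miniature, the transparency argument made in the abstract.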

Similar content being viewed by others

Techniques for Processing LSI Queries Incorporating Phrases

Embedded Word Representations for Rich Indexing: A Case Study for Medical Records

Supervised Semantic Indexing Using Sub-spacing

Author information

Authors and Affiliations

School of Computing, The Robert Gordon University, Aberdeen, Scotland, AB25 1HG, UK
Nirmalie Wiratunga, Rob Lothian & Sutanu Chakraborti
Institute of Mathematics and Informatics, Bulgarian Academy of Science, Sofia, 1113, Bulgaria
Ivan Koychev

Editor information

Editors and Affiliations

LIACC/FEP, Universidade do Porto, Portugal
Alípio Mário Jorge
LIAAD-INESC Porto LA / FEP, University of Porto, R. de Ceuta, 118, 6, 4050-190, Porto, Portugal
Luís Torgo
LIAAD-INESC Porto L.A./Faculty of Economics, University of Porto, Rua de Ceuta, 118-6, 4050-190, Porto, Portugal
Pavel Brazdil
Faculdade de Engenharia & LIAAD, Universidade do Porto, Portugal
Rui Camacho
Faculty of Economics of the University of Porto, Portugal
João Gama

About this paper

Cite this paper

Wiratunga, N., Lothian, R., Chakraborti, S., Koychev, I. (2005). A Propositional Approach to Textual Case Indexing. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds) Knowledge Discovery in Databases: PKDD 2005. PKDD 2005. Lecture Notes in Computer Science, vol 3721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564126_38

DOI: https://doi.org/10.1007/11564126_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29244-9
Online ISBN: 978-3-540-31665-7
eBook Packages: Computer Science, Computer Science (R0)
