Feature Selection and Generalisation for Retrieval of Textual Cases

  • Conference paper
Advances in Case-Based Reasoning (ECCBR 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3155)

Abstract

Textual CBR systems solve problems by reusing experiences that are in textual form. Knowledge-rich comparison of textual cases remains an important challenge for these systems. However, mapping text data into a structured case representation requires a significant knowledge engineering effort. In this paper we look at automated acquisition of the case indexing vocabulary as a two-step process involving feature selection followed by feature generalisation. Boosted decision stumps are employed to select features that are predictive and relatively orthogonal. Association rule induction is employed to capture feature co-occurrence patterns, and generalised features are constructed by applying these rules. Essentially, the rules preserve implicit semantic relationships between features, and applying them has the desired effect of bringing together cases that would otherwise have been overlooked during case retrieval. Experiments with four textual data sets show significant improvement in retrieval accuracy whenever generalised features are used. The results further suggest that boosted decision stumps with generalised features are a promising combination.
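
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy case base, the function names, the thresholds, and the brute-force mining of pairwise rules (in place of a full Apriori-style algorithm) are all assumptions made for illustration. Selection runs an AdaBoost-style loop over single-term stumps; generalisation adds the consequent of every rule whose antecedent occurs in a case.

```python
import math
from itertools import permutations

# Toy case base (hypothetical): each case is a set of terms with a class label.
CASES = [
    {"printer", "toner", "jam"}, {"printer", "toner"},
    {"printer", "smudge"}, {"toner", "smudge", "paper"},
    {"router", "wifi"}, {"wifi", "drop", "paper"},
    {"router", "drop"}, {"printer", "jam"},
]
LABELS = [True, True, True, True, False, False, False, False]
VOCAB = sorted(set().union(*CASES))


def select_features(cases, labels, vocab, rounds):
    """AdaBoost-style loop over one-term stumps; returns terms in pick order."""
    w = [1.0 / len(cases)] * len(cases)
    chosen = []
    for _ in range(rounds):
        def weighted_error(term):
            # Stump: predict True exactly when the term occurs in the case.
            return sum(wi for c, y, wi in zip(cases, labels, w) if (term in c) != y)

        # The most informative stump has error furthest from 0.5 (either polarity).
        term = max(vocab, key=lambda t: abs(0.5 - weighted_error(t)))
        err = weighted_error(term)
        flip = err > 0.5  # use the inverted stump when it is the better one
        err = min(max(min(err, 1.0 - err), 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        if term not in chosen:
            chosen.append(term)
        # Up-weight misclassified cases so the next stump covers different ones.
        w = [wi * math.exp(alpha if ((term in c) != flip) != y else -alpha)
             for c, y, wi in zip(cases, labels, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen


def mine_rules(cases, min_support, min_confidence):
    """Pairwise rules (a, b), read 'a implies b', that pass both thresholds."""
    rules = []
    for a, b in permutations(sorted(set().union(*cases)), 2):
        n_a = sum(1 for c in cases if a in c)
        n_ab = sum(1 for c in cases if a in c and b in c)
        if n_ab / len(cases) >= min_support and n_ab / n_a >= min_confidence:
            rules.append((a, b))
    return rules


def generalise(case, rules):
    """Add the consequent of each rule whose antecedent is in the case."""
    return set(case) | {b for a, b in rules if a in case}


def jaccard(x, y):
    """Set-overlap similarity used here as a stand-in retrieval measure."""
    return len(x & y) / len(x | y)


selected = select_features(CASES, LABELS, VOCAB, rounds=3)
rules = mine_rules(CASES, min_support=0.25, min_confidence=0.6)
before = jaccard(CASES[2], CASES[3])
after = jaccard(generalise(CASES[2], rules), generalise(CASES[3], rules))
print(selected, rules, before, after)
```

On this toy data the rule "toner implies printer" adds "printer" to the fourth case, so its overlap with the third case rises from 0.25 to 0.5. The reweighting step is what pushes successive stumps towards different terms, which is one way to read "relatively orthogonal" in the abstract.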

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wiratunga, N., Koychev, I., Massie, S. (2004). Feature Selection and Generalisation for Retrieval of Textual Cases. In: Funk, P., González Calero, P.A. (eds) Advances in Case-Based Reasoning. ECCBR 2004. Lecture Notes in Computer Science, vol 3155. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28631-8_58

  • DOI: https://doi.org/10.1007/978-3-540-28631-8_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22882-0

  • Online ISBN: 978-3-540-28631-8

  • eBook Packages: Springer Book Archive
