SPIRE 2003: String Processing and Information Retrieval, pp. 197–210
Improving Text Retrieval in Medical Collections Through Automatic Categorization
Abstract
Retrieving relevant medical information is a current and important research issue. While medical knowledge is expanding at a rate never observed before, its diffusion is slow. One of the main reasons is the difficulty of locating relevant information in today's large medical text collections. In this work, we introduce a framework, based on Bayesian networks, that combines information derived from the text of medical documents with information on the diseases related to these documents (obtained from an automatic categorization method). This leads to a new ranking formula, which we evaluate using a medical reference collection, the OHSUMED collection. Our results indicate that this combination of evidence might yield considerable gains in retrieval performance. When the queries are strongly related to diseases, these gains might be as high as 84%. This shows that information generated by an automatic categorization procedure can be used effectively to improve the quality of the answers provided by an information retrieval (IR) system specialized in the medical domain.
Keywords
Information Retrieval, Bayesian Network, Retrieval Performance, Automatic Categorization, Text Retrieval