
Relation-Based Document Retrieval for Biomedical Literature Databases

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3882)


Abstract

In this paper, we explore the direct use of relations in information retrieval for precision-focused biomedical literature search. A relation is defined as a pair of concepts that are semantically and syntactically related to each other. Unlike traditional term-based IR models, our model represents a document by a set of controlled concepts and their binary relations. The co-occurrence of two concepts in a document often does not mean that the document actually addresses their relationship. Using relations directly may therefore improve the precision of very specific searches, e.g. searching for documents that mention genes regulated by Smad4. To this end, we develop a generic ontology-based approach to extract concepts and their relations, and then build a prototype IR system that supports relation-based search over Medline abstracts. We use this IR system to improve the retrieval results of all official runs in the TREC-2004 Genomics Track. The experiment shows promising performance for relation-based IR: the mean P@100 (the precision of the top 100 documents) over all 50 topics rises from 26.37% (the P@100 of the best run is 42.10%) to 53.69%, while recall is kept at an acceptable level of 44.31%. The experiment also demonstrates the expressiveness of relations for representing genomic information needs.
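The contrast between document-level co-occurrence and relation-based matching, and the P@k measure used to evaluate it, can be illustrated with a small sketch. Everything here is an assumption for illustration only: relations are modeled as unordered concept pairs, the concept names other than Smad4 (`p21`, `TGF-beta`) and the documents are toy data, the relation sets stand in for the output of a hypothetical extractor, and "improving" a baseline run is modeled as simply filtering it to documents that contain the query relation. The abstract does not specify how the official runs were combined with the relation-based system, so this is not the authors' method.

```python
from dataclasses import dataclass


def rel(a: str, b: str) -> frozenset:
    """A binary relation, modeled here (assumption) as an unordered concept pair."""
    return frozenset((a, b))


@dataclass(frozen=True)
class Doc:
    doc_id: str
    concepts: frozenset   # controlled concepts found anywhere in the abstract
    relations: frozenset  # concept pairs a hypothetical extractor linked in the text


def cooccurs(doc: Doc, a: str, b: str) -> bool:
    # Concept-level match: both concepts appear somewhere in the document.
    return a in doc.concepts and b in doc.concepts


def has_relation(doc: Doc, a: str, b: str) -> bool:
    # Relation-based match: the document actually relates the two concepts.
    return rel(a, b) in doc.relations


def relation_filter(ranked: list, a: str, b: str) -> list:
    # Keep a baseline run's order, dropping documents without the query relation.
    return [d for d in ranked if has_relation(d, a, b)]


def precision_at_k(ranked_ids: list, relevant: set, k: int = 100) -> float:
    # Relevant documents among the top k, divided by k even if fewer are returned.
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k


docs = [
    Doc("d1", frozenset({"Smad4", "p21", "TGF-beta"}), frozenset({rel("Smad4", "p21")})),
    Doc("d2", frozenset({"Smad4", "p21"}), frozenset()),  # co-occurrence only
    Doc("d3", frozenset({"Smad4", "TGF-beta"}), frozenset({rel("Smad4", "TGF-beta")})),
]
baseline = [docs[1], docs[2], docs[0]]  # a toy baseline ranking: d2, d3, d1
filtered = relation_filter(baseline, "Smad4", "p21")

print([d.doc_id for d in docs if cooccurs(d, "Smad4", "p21")])  # ['d1', 'd2']
print([d.doc_id for d in filtered])  # ['d1']
print(precision_at_k([d.doc_id for d in baseline], {"d1"}, k=1))  # 0.0
print(precision_at_k([d.doc_id for d in filtered], {"d1"}, k=1))  # 1.0
```

Document `d2` mentions both concepts but never relates them, so it satisfies a co-occurrence query and fails a relation query. A filter of this kind can only remove documents, so any relevant document whose relation the extractor misses is lost. This is why recall needs to be reported alongside P@100.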

This research work is supported in part by the NSF Career grant (NSF IIS 0448023), NSF CCF 0514679, and a research grant from the PA Dept of Health.




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, X., Hu, X., Lin, X., Han, H., Zhang, X. (2006). Relation-Based Document Retrieval for Biomedical Literature Databases. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_48


  • DOI: https://doi.org/10.1007/11733836_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33337-1

  • Online ISBN: 978-3-540-33338-8

  • eBook Packages: Computer Science, Computer Science (R0)
