Relation-Based Document Retrieval for Biomedical IR

Zhou, Xiaohua; Hu, Xiaohua; Li, Guangren; Lin, Xia; Zhang, Xiaodan

doi:10.1007/11790105_9

Part of the book series: Lecture Notes in Computer Science (TCSB, volume 4070)

Abstract

In this paper, we explore the use of term relations in information retrieval for precision-focused biomedical literature search. A relation is defined as a pair of terms that are semantically and syntactically related to each other. Unlike the traditional “bag-of-words” model for documents, our model represents a document by a set of sense-disambiguated terms and their binary relations. Since document-level co-occurrence of two terms does not, in many cases, mean that the document addresses their relationship, the direct use of relations may improve the precision of very specific searches, e.g. searching for documents that mention genes regulated by Smad4. For this purpose, we develop a generic ontology-based approach to extract terms and their relations, and present a betweenness-centrality-based approach to rank retrieved documents. A prototype IR system supporting relation-based search is then built for Medline abstract search. We use this IR system to improve the retrieval results of all official runs in the TREC-2004 Genomics Track. The experiment shows promising performance for relation-based IR. The average P@100 (the precision of the top 100 documents) over 50 topics is significantly raised from 26.37% (the P@100 of the best run is 42.10%) to 53.69%, while the MAP (mean average precision) is kept at an above-average level of 26.59%. The experiment also shows the expressiveness of relations for representing information needs, especially in biomedical literature, which is full of various biological relations.
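
The two measures reported above, P@100 and MAP, can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of precision at k and mean average precision; it is not the evaluation code used in the paper or in the TREC-2004 Genomics Track, and the function names and the toy document identifiers are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of P@rank taken at each rank where a relevant document occurs,
    divided by the total number of relevant documents for the topic.
    Assumes each document id appears at most once in `ranked`."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], judgments: Dict[str, Set[str]]
) -> float:
    """Mean of the per-topic average precision over all topics in `runs`."""
    if not runs:
        return 0.0
    scores = [
        average_precision(ranked, judgments.get(topic, set()))
        for topic, ranked in runs.items()
    ]
    return sum(scores) / len(scores)


# Toy data (hypothetical ids, not taken from the paper).
ranked_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d9"}

p_at_2 = precision_at_k(ranked_docs, relevant_docs, 2)
ap = average_precision(ranked_docs, relevant_docs)
map_score = mean_average_precision(
    {"t1": ranked_docs, "t2": ["d5"]},
    {"t1": relevant_docs, "t2": {"d5"}},
)
```

In this toy run one of the top two documents is relevant, so P@2 is 0.5. The relevant document `d9` is never retrieved, which lowers the average precision for the topic without affecting P@2; this is why a system can raise precision at a fixed cutoff while MAP moves much less.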

This research work is supported in part by the NSF Career grant (NSF IIS 0448023), NSF CCF 0514679, and a research grant from the PA Dept of Health.

An erratum to this chapter is available at http://dx.doi.org/10.1007/11790105_10.

Author information

Authors and Affiliations

College of Information Science & Technology, Drexel University, 3141 Chestnut Street, Philadelphia, PA, 19104, USA
Xiaohua Zhou, Xiaohua Hu, Xia Lin & Xiaodan Zhang
Faculty of Economy, Hunan University, Changsha, China
Guangren Li

Editor information

Editors and Affiliations

The Microsoft Research - Centre for Computational and Systems Biology, University of Trento, Piazza Manci, 17, 38050, Povo (TN), Italy
Corrado Priami
College of Computer and Information Engineering, Henan University, Henan, China
Xiaohua Hu
Georgia State University, Dept. of CS, 30302, Atlanta, GA, USA
Yi Pan
Department of Computer Science, San Jose State University, CA 95192, San Jose, USA
Tsau Young Lin

About this paper

Cite this paper

Zhou, X., Hu, X., Li, G., Lin, X., Zhang, X. (2006). Relation-Based Document Retrieval for Biomedical IR. In: Priami, C., Hu, X., Pan, Y., Lin, T.Y. (eds) Transactions on Computational Systems Biology V. Lecture Notes in Computer Science, vol 4070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11790105_9

DOI: https://doi.org/10.1007/11790105_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36048-3
Online ISBN: 978-3-540-36049-0
eBook Packages: Computer Science, Computer Science (R0)