Learning SVM Ranking Functions from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain

Arens, Robert

doi:10.1007/978-3-642-14125-6_17

Abstract

Information overload is a well-known problem facing biomedical professionals. MEDLINE, the biomedical bibliographic database, adds hundreds of articles daily to the millions already in its collection. This overload is exacerbated by the lack of relevance-based ranking for search results, as well as disparate levels of search skill and domain experience of professionals using systems designed to search MEDLINE. We propose to address these problems through learning ranking functions from user relevance feedback. Simple active learning techniques can be used to learn ranking functions using a fraction of the available data, with performance approaching that of functions learned using all available data. Furthermore, ranking functions learned using metadata features from the Medical Subject Heading (MeSH) terms associated with MEDLINE citations greatly outperform functions learned using textual features. An in-depth investigation is made into the effect of a number of variables in the ranking round, while further investigation is made into peripheral issues such as users providing inconsistent data.
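
As a rough illustration of what learning a ranking function from relevance feedback can look like, the sketch below turns graded user judgments into pairwise preferences and fits a linear scoring function with a hinge loss, in the spirit of a ranking SVM. It is not the chapter's implementation: the three binary features stand in for hypothetical MeSH-term indicators, the documents and judgments are made up, and a plain subgradient-descent loop is used here in place of a real SVM solver. The function names and hyperparameters are likewise assumptions.

```python
import random

def preference_pairs(features, relevance):
    """Return x_i - x_j for every pair where document i was judged more relevant than j."""
    diffs = []
    for xi, ri in zip(features, relevance):
        for xj, rj in zip(features, relevance):
            if ri > rj:
                diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs

def dot(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))

def train_ranker(diffs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit w by subgradient descent on an L2-regularised pairwise hinge loss.

    Each difference vector d should satisfy w.d >= 1, meaning the preferred
    document outscores the other one by a margin.
    """
    rng = random.Random(seed)
    pairs = list(diffs)  # copy so the caller's list is not reordered
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            violated = dot(w, d) < 1.0
            # Regularisation step: shrink the weights slightly.
            w = [wk * (1.0 - lr * reg) for wk in w]
            # Hinge step: move toward satisfying a violated preference.
            if violated:
                w = [wk + lr * dk for wk, dk in zip(w, d)]
    return w

# Hypothetical data: one row per citation, one column per made-up MeSH indicator.
features = [
    [1, 0, 1],  # document 0
    [1, 0, 0],  # document 1
    [0, 1, 0],  # document 2
    [0, 0, 0],  # document 3
]
relevance = [2, 1, 0, 0]  # made-up user feedback: higher means more relevant

w = train_ranker(preference_pairs(features, relevance), dim=3)
scores = [dot(w, x) for x in features]
ranking = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Working on difference vectors means the learner only needs to know which of two documents the user preferred, not an absolute relevance score, which is one reason pairwise formulations suit feedback data.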

Notes

1. http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

Author information

Authors and Affiliations

University of Iowa, Iowa City, IA, 52242, USA
Robert Arens

Corresponding author

Correspondence to Robert Arens.

Editor information

Editors and Affiliations

FB Informatik, TU Darmstadt, Hochschulstr. 10, Darmstadt, 64289, Germany
Johannes Fürnkranz
FB Mathematik und Informatik, Philipps-Universität Marburg, Hans-Meerwein-Str., Marburg, 35032, Germany
Eyke Hüllermeier

Cite this chapter

Arens, R. (2010). Learning SVM Ranking Functions from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain. In: Fürnkranz, J., Hüllermeier, E. (eds) Preference Learning. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14125-6_17

DOI: https://doi.org/10.1007/978-3-642-14125-6_17
Published: 03 September 2010
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14124-9
Online ISBN: 978-3-642-14125-6
eBook Packages: Computer Science, Computer Science (R0)
