Learning SVM Ranking Functions from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain

Preference Learning

Abstract

Information overload is a well-known problem facing biomedical professionals. MEDLINE, the biomedical bibliographic database, adds hundreds of articles daily to the millions already in its collection. The overload is exacerbated by the lack of relevance-based ranking for search results, and by the disparate levels of search skill and domain experience among professionals using systems designed to search MEDLINE. We propose to address these problems by learning ranking functions from user relevance feedback. Simple active learning techniques can learn ranking functions from a fraction of the available data, with performance approaching that of functions learned from all available data. Furthermore, ranking functions learned using metadata features from the Medical Subject Heading (MeSH) terms associated with MEDLINE citations greatly outperform functions learned using textual features. We investigate in depth the effect of a number of variables in the ranking round, and further examine peripheral issues such as users providing inconsistent data.
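The general idea of learning a ranking function from relevance feedback, and then choosing what to ask the user next, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the binary feature vectors stand in for hypothetical indexing-term indicators, the relevance grades are invented, the ranker is a linear pairwise model trained with plain subgradient steps on a regularised hinge loss rather than a dedicated SVM solver, and the query-selection rule (ask about the pair with the closest scores) is one common heuristic chosen here for illustration. All function names are hypothetical.

```python
# Illustrative sketch only. Data, names, and the selection heuristic are
# assumptions made for this example, not taken from the chapter.

def score(w, x):
    """Linear ranking score of a feature vector x under weights w."""
    return sum(wk * xk for wk, xk in zip(w, x))


def pairwise_differences(features, grades):
    """Turn graded feedback into difference vectors x_i - x_j for every
    pair in which document i was judged more relevant than document j."""
    pairs = []
    for xi, gi in zip(features, grades):
        for xj, gj in zip(features, grades):
            if gi > gj:
                pairs.append([a - b for a, b in zip(xi, xj)])
    return pairs


def train_ranker(pairs, dim, epochs=200, lr=0.1, reg=0.01):
    """Stochastic subgradient steps on the L2-regularised pairwise hinge
    loss max(0, 1 - w.d), taken over the difference vectors d."""
    w = [0.0] * dim
    for _ in range(epochs):
        for d in pairs:
            margin = score(w, d)
            for k in range(dim):
                hinge_grad = -d[k] if margin < 1.0 else 0.0
                w[k] -= lr * (reg * w[k] + hinge_grad)
    return w


def most_uncertain_pair(w, candidates):
    """Return indices (i, j), i < j, of the two unlabeled candidates whose
    scores are closest, i.e. the pair the current ranker is least sure about."""
    best = None
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            gap = abs(score(w, candidates[i]) - score(w, candidates[j]))
            if best is None or gap < best[0]:
                best = (gap, i, j)
    return best[1], best[2]


# Hypothetical feedback: four documents, three binary metadata features,
# graded 2 (relevant), 1 (partially relevant), 0 (not relevant).
docs = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
grades = [2, 1, 1, 0]

pairs = pairwise_differences(docs, grades)
w = train_ranker(pairs, dim=3)

# Rank the judged documents by learned score, best first.
ranking = sorted(range(len(docs)), key=lambda i: score(w, docs[i]), reverse=True)

# Pick the next pair of unlabeled documents to show the user.
unlabeled = [[1, 1, 1], [1, 1, 0], [0, 0, 0]]
query = most_uncertain_pair(w, unlabeled)
print(ranking, query)
```

In an active learning loop, the user's judgement on the selected pair would be added to the feedback and the ranker retrained, so that each round spends the user's effort where the current function is least certain.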

Author information

Corresponding author

Correspondence to Robert Arens.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Arens, R. (2010). Learning SVM Ranking Functions from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain. In: Fürnkranz, J., Hüllermeier, E. (eds) Preference Learning. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14125-6_17

  • DOI: https://doi.org/10.1007/978-3-642-14125-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14124-9

  • Online ISBN: 978-3-642-14125-6

  • eBook Packages: Computer Science, Computer Science (R0)
