Accessing Biomedical Literature in the Current Information Landscape

  • Ritu Khare
  • Robert Leaman
  • Zhiyong Lu
Part of the Methods in Molecular Biology book series (MIMB, volume 1159)


Biomedical and life sciences literature is distinctive for its exponentially increasing volume and interdisciplinary nature. Access to this literature is essential for several types of users, including biomedical researchers, clinicians, database curators, and bibliometricians. Over the past few decades, several online search tools and literature archives, both generic and biomedicine-specific, have been developed. We organize this chapter around three consecutive steps of literature access: searching for citations, retrieving full text, and viewing the article. The first section presents the current state of practice of biomedical literature access: an analysis of the search tools that users rely on most frequently (PubMed, Google Scholar, Web of Science, Scopus, and Embase) and a study of biomedical literature archives such as PubMed Central. The next section describes current research and state-of-the-art systems motivated by the challenges a user faces during query formulation and interpretation of search results. These research solutions are classified into five key areas related to text and data mining: text similarity search, semantic search, query support, relevance ranking, and clustering of results. The last section describes predicted future trends for improving biomedical literature access, such as searching and reading articles on portable devices and adoption of the open access policy.

Key words

Biomedical literature search; Text mining; Information retrieval; Bioinformatics; Open access; Relevance ranking; Semantic search; Text similarity search



This research was supported by the Intramural Research Program at the National Institutes of Health, National Library of Medicine.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. National Center for Biotechnology Information, U.S. National Library of Medicine, NIH, Bethesda, USA
  2. National Center for Biotechnology Information, U.S. National Library of Medicine, NIH, Bethesda, USA
  3. National Center for Biotechnology Information, U.S. National Library of Medicine, NIH, Bethesda, USA
