Biomedical Text Mining: A Survey of Recent Progress

  • Chapter
Mining Text Data

Abstract

The biomedical community makes extensive use of text mining technology. In the past several years, enormous progress has been made in developing tools and methods, and the community has witnessed some exciting developments. Although the state of the field is regularly reviewed, the sheer volume of work on biomedical text mining and the rapid pace at which progress continues to be made make a new review worthwhile, if not necessary. This chapter provides a brief overview of the current state of text mining in the biomedical domain. Emphasis is placed on the resources and tools available to biomedical researchers and practitioners, as well as on the major text mining tasks of interest to the community. These tasks include the recognition of explicit facts in biomedical literature, the discovery of previously unknown or implicit facts, document summarization, and question answering. For each topic, the basic challenges and methods are outlined, and recent and influential work is reviewed.


References

  1. A. B. Abacha and P. Zweigenbaum. A hybrid approach for the extraction of semantic relations from MEDLINE abstracts. In A. Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, volume 6609 of Lecture Notes in Computer Science, pages 139150. Springer Berlin / Heidelberg, 2011.

    Google Scholar 

  2. A. B. Abacha and P. Zweigenbaum. Medical entity recognition: A comparison of semantic and statistical methods. In Proceedings of BioNLP 2011 Workshop, pages 5664, 2011.

    Google Scholar 

  3. S. Afantenos, V. Karkaletsis, and P. Stamatopoulos. Summarization from medical documents: A survey. Artificial Intelligence in Medicine, 33(2):157177, 2005.

    Article  Google Scholar 

  4. S. Agarwal and H. Yu. Automatically classifying sentences in fulltext biomedical articles into introduction, methods, results and discussion. Bioinformatics, 25(23):31743180, 2009.

    Article  Google Scholar 

  5. S. Agarwal and H. Yu. FigSum: Automatically generating structured text summaries for figures in biomedical literature. In AMIA Annual Symposium Proceedings, pages 610, 2009.

    Google Scholar 

  6. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 307328. American Association for Artificial Intelligence, 1996.

    Google Scholar 

  7. A. Airola, S. Pyysalo, J. Bjorne, T. Pahikkala, F. Ginter, and T. Salakoski. All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning. BMC Bioinformatics, 9(Suppl 11):S2, 2008.

    Google Scholar 

  8. B. Alex, B. Haddow, and C. Grover. Recognising nested named entities in biomedical text. In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, pages 6572, 2007.

    Google Scholar 

  9. R. B. Altman, C. M. Bergman, J. Blake, C. Blaschke, A. Cohen, F. Gannon, L. Grivell, U. Hahn, W. Hersh, L. Hirschman, L. J. Jensen, M. Krallinger, B. Mons, S. I. ODonoghue, M. C. Peitsch, D. Rebholz-Schuhmann, H. Shatkay, and A. Valencia. Text mining for biology - the way forward: opinions from leading scientists. Genome Biology, 9(Suppl 2):S7, 2008.

    Google Scholar 

  10. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403410, 1990.

    Google Scholar 

  11. S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25(17):33893402, 1997.

    Article  Google Scholar 

  12. S. Ananiadou and J. Mcnaught. Text Mining for Biology And Biomedicine. Artech House, Inc., 2005.

    Google Scholar 

  13. S. Ananiadou, S. Pyysalo, J. Tsujii, and D. B. Kell. Event extraction for systems biology by text mining the literature. Trends in Biotechnology, 28(7):381390, 2010.

    Article  Google Scholar 

  14. A. R. Aronson and F.-M. Lang. An overview of MetaMap: historical perspective and recent advances. Journal of the American Medical Informatics Association, 17(3):229236, 2010.

    Google Scholar 

  15. R. Artstein and M. Poesio. Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4):555596, 2008.

    Article  Google Scholar 

  16. M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cheryy, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. Gene ontology: Tool for the unification of biology. Nature Genetics, 25(1):2529, 2000.

    Google Scholar 

  17. S. J. Athenikos and H. Han. Biomedical question answering: A survey. Computer Methods and Programs in Biomedicine, 99(1):124, 2010.

    Article  Google Scholar 

  18. B. Benton, L. Ungar, S. Hill, S. Hennessy, J. Mao, A. Chung, C. E. Leonard, and J. H. Holmes. Identifying potential adverse effects using the web: A new approach to medical hypothesis generation. In Press, 2011.

    Google Scholar 

  19. BioNLP. http://www.bionlp.org/.

    Google Scholar 

  20. J. Björne, F. Ginter, S. Pyysalo, J. Tsujii, and T. Salakoski. Complex event extraction at PubMed scale. Bioinformatics, 26(12):i382i390, 2010.

    Article  Google Scholar 

  21. J. Björne, J. Heimonen, F. Ginter, A. Airola, T. Pahikkala, and T. Salakoski. Extracting complex biological events with rich graphbased feature sets. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 1018, 2009.

    Google Scholar 

  22. K. W. Boyack, D. Newman, R. J. Duhon, R. Klavans, M. Patek, J. R. Biberstine, B. Schijvenaars, A. Skupin, N. Ma, and K. Borner. Clustering more than two million biomedical publications: Comparing the accuracies of nine text-based similarity approaches. PLoS ONE, 6(3):e18029, 2011.

    Google Scholar 

  23. M. Bundschus, M. Dejori, M. Stetter, V. Tresp, and H.-P. Kriegel. Extraction of semantic biomedical relations from text using conditional random fields. BMC Bioinformatics, 9(1):207, 2008.

    Google Scholar 

  24. E. Buyko, E. Faessler, J. Wermter, and U. Hahn. Event extraction from trimmed dependency graphs. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 1927, 2009.

    Google Scholar 

  25. Y. Cai and X. Cheng. Biomedical named entity recognition with tri-training learning. In Proceedings of the 2009 2nd International Conference on Biomedical Engineering and Informatics, pages 15, 2009.

    Google Scholar 

  26. CALBC challenge. http://www.calbc.eu/.

    Google Scholar 

  27. Y. Cao, F. Liu, P. Simpson, L. Antieau, A. Bennett, J. J. Cimino, J. Ely, and H. Yu. AskHERMES: An online question answering system for complex clinical questions. Journal of Biomedical Informatics, 44(2):277288, 2011.

    Article  Google Scholar 

  28. D. T.-H. Chang, Y.-Z. Weng, J.-H. Lin, M.-J. Hwang, and Y.-J. Oyang. Protemot: Prediction of protein binding sites with automatically extracted geometrical templates. Nucleic Acids Research, 34(suppl 2):W303W309, 2006.

    Article  Google Scholar 

  29. W. W. Chapman and K. B. Cohen. Current issues in biomedical text mining and natural language processing. Journal of Biomedical Informatics, 42(5):757759, 2009.

    Article  Google Scholar 

  30. E. S. Chen, G. Hripcsak, H. Xu, M. Markatou, and C. Friedman. Automated acquisition of disease-drug knowledge from biomedical and clinical documents: An initial study. Journal of the American Medical Informatics Association, 15(1):8798, 2008.

    Article  Google Scholar 

  31. H. W. Chun, Y. Tsuruoka, J. D. Kim, R. Shiba, N. Nagata, T. Hishiki, and J. Tsujii. Extraction of gene-disease relations from MEDLINE using domain dictionaries and machine learning. In Pacific Symposium on Biocomputing, pages 415, 2006.

    Google Scholar 

  32. A. M. Cohen andW. R. Hersh. A survey of current work in biomedical text mining. Briefings in Bioinformatics, 6(1):5771, 2005.

    Google Scholar 

  33. K. B. Cohen and L. Hunter. Getting started in text mining. PLoS Computational Biology, 4(1):e20, 2008.

    Google Scholar 

  34. K. B. Cohen, K. Verspoor, H. L. Johnson, C. Roeder, P. V. Ogren, W. A. Baumgartner, Jr., E. White, H. Tipney, and L. Hunter. High-precision biological event extraction with a concept recognizer. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 5058, 2009.

    Google Scholar 

  35. T. Cohen, G. K. Whitfield, R. W. Schvaneveldt, K. Mukund, and T. Rindflesch. EpiphaNet: An interactive tool to support biomedical discoveries. Journal of Biomedical Discovery and Collaboration, 5:2149, 2010.

    Google Scholar 

  36. N. Collier, C. Nobata, and J.-i. Tsujii. Extracting the names of genes and gene products with a hidden Markov model. In Proceedings of the 18th Conference on Computational Linguistics - Volume 1, pages 201207, 2000.

    Google Scholar 

  37. P. Corbett and A. Copestake. Cascaded classifiers for confidencebased chemical named entity recognition. BMC Bioinformatics, 9(Suppl 11):S4, 2008.

    Google Scholar 

  38. CRAFT: The colorado richly annotated full text corpus. http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.

    Google Scholar 

  39. H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, N. Aswani, I. Roberts, G. Gorrell, A. Funk, A. Roberts, D. Daml janovic, T. Heitz, M. A. Greenwood, H. Saggion, J. Petrak, Y. Li, and W. Peters. Text Processing with GATE (Version 6). GATE, 2011.

    Google Scholar 

  40. T. Delbecque, P. Jacquemart, and P. Zweigenbaum. Indexing UMLS semantic types for medical question-answering. In R. Engelbrecht, A. Geissbuhler, C. Lovis, and G. Mihalas, editors, Connecting Medical Informatics and Bio-Informatics: Proceedings of MIE2005 - The XIXth International Congress of the European Federation for Medical Informatics, pages 805810. IOS Press, 2005.

    Google Scholar 

  41. D. Demner-Fushman, W. W. Chapman, and C. J. McDonald. What can natural language processing do for clinical decision support? Journal of Biomedical Informatics, 42(5):760772, 2009.

    Article  Google Scholar 

  42. D. Demner-Fushman, B. Few, S. E. Hauser, and G. Thoma. Automatically identifying health outcome information in MEDLINE records. Journal of the American Medical Informatics Association, 13(1):5260, 2006.

    Article  Google Scholar 

  43. D. Demner-Fushman and J. Lin. Knowledge exraction for clinical question answering: Preliminary results. In Proceedings of the AAAI 2005 Workshop on Question Ansering in Restricted Domains, 2005.

    Google Scholar 

  44. D. Demner-Fushman and J. Lin. Answer extraction, semantic clustering, and extractive summarization for clinical question answering. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 841848, 2006.

    Google Scholar 

  45. D. Demner-Fushman and J. Lin. Answering clinical questions with knowledge-based and statistical techniques. Computational Linguistics, 33(1):63103, 2007.

    Article  Google Scholar 

  46. D. Demner-Fushman, C. Seckman, C. Fisher, S. E. Hauser, J. Clayton, and G. R.1. Thoma. A prototype system to support evidencebased practice. In AMIA Annual Symposium Proceedings, pages 151155, 2008.

    Google Scholar 

  47. S. Dipper, M. Götze, and M. Stede. Simple annotation tools for complex annotation tasks: An evaluation. In Proceedings of the LREC Workshop on XML-Based Richly Annotated Corpora, pages 5462, 2004.

    Google Scholar 

  48. eHOST: The extensible human oracle suite of tools. http://code.google.com/p/ehost/.

    Google Scholar 

  49. N. Elhadad, M.-Y. Kan, J. L. Klavans, and K. R. McKeown. Customization in a unified framework for summarizing medical literature. Artificial Intelligence in Medicine, 33(2):179198, 2005.

    Article  Google Scholar 

  50. J. W. Ely, J. A. Osheroff, M. H. Ebell, M. L. Chambliss, D. C. Vinson, J. J. Stevermer, and E. A. Pifer. Obstacles to answering doctors questions about patient care with evidence: qualitative study. British Medical Journal, 324(7339):710, 2002.

    Google Scholar 

  51. Electronic medical records and genomics. https://www.mc.vanderbilt.edu/victr/dcc/projects/acc/index.php/Main_Page.

    Google Scholar 

  52. European bioinformatics institute. http://www.ebi.ac.uk/.

    Google Scholar 

  53. D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer, and C. Welty. Building Watson: An overview of the DeepQA project. AI Magazine, 31(3):5979, 2010.

    Google Scholar 

  54. D. Ferrucci and A. Lally. UIMA: An architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3-4):327348, 2004.

    Article  Google Scholar 

  55. J. Finkel, S. Dingare, H. Nguyen, M. Nissim, C. Manning, and G. Sinclair. Exploiting context for biomedical entity recognition: From syntax to the web. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 8891, 2004.

    Google Scholar 

  56. M. Fiszman, D. Demner-Fushman, H. Kilicoglu, and T. C. Rindflesch. Automatic summarization of MEDLINE citations for evidence-based medical treatment: A topic-oriented evaluation. Journal of Biomedical Informatics, 42(5):801813, 2009.

    Article  Google Scholar 

  57. K. Franzén, G. Eriksson, F. Olsson, L. Asker, P. Lidén, and J. Cöster. Protein names and how to find them. International Journal of Medical Informatics, 67(1-3):4961, 2002.

    Article  Google Scholar 

  58. C. Friedman, G. Hripcsak, L. Shagina, and H. Liu. Arepresenting information in patient reports using natural language processing and the extensible markup language. Journal of the American Medical Informatics Association, 6:7687, 1999.

    Article  Google Scholar 

  59. K. Fukuda, A. Tamura, T. Tsunoda, and T. Takagi. Toward information extraction: Identifying protein names from biological papers. In Pacific Symposium on Biocomputing, pages 707718, 1998.

    Google Scholar 

  60. K. Fundel, R. Küffner, and R. Zimmer. RelExrelation extraction using dependency parse trees. Bioinformatics, 23(3):365371, 2007.

    Article  Google Scholar 

  61. R. Gaizauskas, G. Demetriou, P. J. Artymiuk, and P. Willett. Protein structures and information extraction from biological texts: The PASTA system. Bioinformatics, 19(1):135143, 2003.

    Article  Google Scholar 

  62. B. Gu. Recognizing nested named entities in GENIA corpus. In Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis, pages 112113, 2006.

    Google Scholar 

  63. J. Hakenberg, S. Bickel, C. Plake, U. Brefeld, H. Zahn, L. Faulstich, U. Leser, and T. Scheffer. Systematic feature evaluation for gene name recognition. BMC Bioinformatics, 6(Suppl 1):S9, 2005.

    Google Scholar 

  64. J. Hakenberg, C. Plake, and U. Leser. LLL05 challenge: Genic interaction extraction - identification of language patterns based on alignment and finite state automata. In In Proceedings of the ICML 2005 Workshop on Learning Language in Logic, pages 3845, 2005.

    Google Scholar 

  65. W. Hersh. Information Retrieval: A Health and Biomedical Perspective. Health Informatics. Springer, third edition, 2005.

    Google Scholar 

  66. HighWire press. http://highwire.org/.

    Google Scholar 

  67. L. Hirschman, M. Colosimo, A. Morgan, and A. Yeh. Overview of BioCreAtIvE task 1B: Normalized gene lists. BMC Bioinformatics, 6(Suppl 1):S11, 2005.

    Google Scholar 

  68. L. Hirschman, A. A. Morgan, and A. S. Yeh. Rutabaga by any other name: Extracting biological names. Journal of Biomedical Informatics, 35(4):247259, 2002.

    Article  Google Scholar 

  69. L. Hirschman, A. Yeh, C. Blaschke, and A. Valencia. Overview of BioCreAtIvE: Critical assessment of information extraction for biology. BMC Bioinformatics, 6(Suppl 1):S1, 2005.

    Google Scholar 

  70. W.-J. Hou and H.-H. Chen. Enhancing performance of protein name recognizers using collocation. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 2532, 2003.

    Google Scholar 

  71. D. Hristovski, C. Friedman, T. C. Rindflesch, and B. Peterlin. Exploiting semantic relations for literature-based discovery. In AMIA Anual Symposium Proceedings, pages 349353, 2006.

    Google Scholar 

  72. D. Hristovski, B. Peterlin, S. Džeroski, and J. Stare. Literaturebased discovery support system and its application to disease gene identification. In S. Džeroski and L. Todorovski, editors, Computational Discovery of Scientific Knowledge, volume 4660 of Lecture Notes in Computer Science, pages 307326. Springer Berlin / Heidelberg, 2007.

    Google Scholar 

  73. D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey. Improving literature-based discovery support by genetic knowledge integration. Studies in Health Technogy and Informatics, 95:6873, 2003.

    Google Scholar 

  74. D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey. Using literature-based discovery to identify disease candidate genes. International Journal of Medical Informatics, 74(2-4):289298, 2005.

    Article  Google Scholar 

  75. D. Hristovski, J. Stare, B. Peterlin, and S. Džeroski. Supporting discovery in medicine by association rule mining in MEDLINE and UMLS. In V. L. Patel, R. Rogers, and R. Haux, editors, Proceedings of the 10th World Congress on Medical Informatics, volume 84/2001 of Studies in Health Technology and Informatics, pages 13441348. IOS Press, 2001.

    Google Scholar 

  76. X. Hu, X. Zhang, I. Yoo, X. Wang, and J. Feng. Mining hidden connections among biomedical concepts from disjoint biomedical literature sets through semantic-based association rule. International Journal of Intelligent Systems, 25(2):207223, 2010.

    Google Scholar 

  77. X. Huang, J. Lin, and D. Demner-Fushman. Evaluation of PICO as a knowledge representation for clinical questions. In AMIA Annual Symposium Proceedings, pages 359363, 2006.

    Google Scholar 

  78. K. Humphreys, G. Demetriou, and R. Gaizauskas. Two applications of information extraction to biological science yournal articles: Enzyme interactions and protein structures. In Pacific Symposium on Biocomputing, pages 502513, 2000.

    Google Scholar 

  79. L. Hunter, Z. Lu, J. Firby, W. Baumgartner, H. Johnson, P. Ogren, and K. B. Cohen. OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-typespecific gene expression. BMC Bioinformatics, 9(1):78, 2008.

    Google Scholar 

  80. Informatics for integrating biology and the bedside. https://www.i2b2.org/resrcs/hive.html.

    Google Scholar 

  81. P. Jacqumart and P. Zweigenbaum. Towards a medical questionanswering system: A feasibility study. Studies in Health Technology and Informatics, 95:463468, 2003.

    Google Scholar 

  82. R. Jelier, G. Jenster, L. Dorssers, B. Wouters, P. Hendriksen, B. Mons, R. Delwel, and J. Kors. Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation. BMC Bioinformatics, 8(1):14, 2007.

    Google Scholar 

  83. R. Kabiljo, A. B. Clegg, and A. J. Shepherd. A realistic assessment of methods for extracting gene/protein interactions from free text. BMC Bioinformatics, 10:233, 2008.

    Article  Google Scholar 

  84. J. Kalpathy-Cramer, H. Müler, S. Bedrick, I. Eggel, A. de Herrera, and T. Tsikrika. The CLEF 2011 medical image retrieval and classification tasks. In CLEF 2011 Working Notes, 2011.

    Google Scholar 

  85. H. Karsten and H. Suominen. Mining of clinical and biomedical text and data. International Journal of Medical Informatics, 78(12):786787, 2009.

    Article  Google Scholar 

  86. J. Kazama, T. Makino, Y. Ohta, and J. Tsujii. Tuning support vector machines for biomedical named entity recognition. In Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain - Volume 3, pages 18, 2002.

    Google Scholar 

  87. H. Kilicoglu and S. Bergler. Syntactic dependency based heuristics for biological event extraction. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 119127, 2009.

    Google Scholar 

  88. J.-D. Kim, T. Ohta, N. Nguyen, S. Pyysalo, R. Bossy, and J. Tsujii. Overview of BioNLP shared task 2011. In Proceedings of the BioNLP Shared Task 2011 Workshop, pages 16, 2011.

    Google Scholar 

  89. J.-D. Kim, T. Ohta, S. Pyysalo, Y. Kano, and J. Tsujii. Overview of BioNLP09 shared task on event extraction. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 19, 2009.

    Google Scholar 

  90. J.-D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii. GENIA corpusa semantically annotated corpus for bio-textmining. Bioinformatics, 19(Suppl 1):i180i182, 2003.

    Article  Google Scholar 

  91. J.-D. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi, and N. Collier. Introduction to the bio-entity recognition task at JNLPBA. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 7075, 2004.

    Google Scholar 

  92. S. Kim, J. Yoon, and J. Yang. Kernel approaches for genic interaction extraction. Bioinformatics, 24(1):118126, 2008. [93] S. Kinoshita, K. B. Cohen, P. Ogren, and L. Hunter. BioCreAtIvE task 1A: Entity identification with a stochastic tagger. BMC Bioinformatics, 6(Suppl 1):S4, 2005.

    Google Scholar 

  93. J. Kontos, J. Lekakis, I. Malagardi, and J. Peros. Grammars for question answering systems based on intelligent text mining in biomedicine. In Proceedings of the 7th Hellenic Europeoan Conference on Computer Mathematics and its Applications, 2005. [95] J. Kontos, I. Malagardi, and J. Peros. Question answering and rhetoric analysis of biomedical texts in the AROMA system. In Proceedings of the 7th Hellenic Europeoan Conference on Computer Mathematics and its Applications, 2005.

    Google Scholar 

  94. M. Krallinger, F. Leitner, C. Rodriguez-Penagos, and A. Valencia. Overview of the protein-protein interaction annotation extraction task of BioCreAtIve II. Genome Biology, 9(Suppl 2):S4, 2008.

    Google Scholar 

  95. M. Krallinger, A. Morgan, L. Smith, F. Leitner, L. Tanabe, J. Wilbur, L. Hirschman, and A. Valencia. Evaluation of textmining systems for biology: Overview of the second BioCreAtIvE community challenge. Genome Biology, 9(Suppl 2):S1, 2008.

    Google Scholar 

  96. M. Krallinger, A. Valencia, and L. Hirschman. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome biology, 9(Suppl 2):S8, 2008.

    Google Scholar 

  97. M. Krauthammer and G. Nenadic. Term identification in the biomedical literature. Journal of Biomedical Informatics, 37(6):512526, 2004.

    Article  Google Scholar 

  98. M. Krauthammer, A. Rzhetsky, P. Morozov, and C. Friedman. Using BLAST for identifying gene and protein names in journal articles. Gene, 259(1-2):245252, 2000.

    Article  Google Scholar 

  99. R. Leaman and G. Gonzalez. BANNER: An executable survey of advances in biomedical named entity recognition. In Pacific Symposium on Biocomputing, pages 652663, 2008.

    Google Scholar 

  100. L. C. Lee, F. Horn, and F. E. Cohen. Automatic extraction of protein point mutations using a graph bigram association. PLoS Computational Biology, 3(2):e16, 2007.

    Google Scholar 

  101. G. Leech. Adding linguistic annotation. In M. Wynne, editor, Developing Linguistic Corpora: A Guide to Good Practice, pages 1729. Oxbow Books, 2005.

    Google Scholar 

  102. U. Leser and J. Hakenberg. What makes a gene name? named entity recognition in the biomedical literature. Briefings in Bioinformatics, 6(4):357369, 2005.

    Article  Google Scholar 

  103. M. Liberman, M. Mandel, and GlaxoSmithKline Pharmaceuticals R&D. PennBioIE CYP 1.0, 2008.

    Google Scholar 

  104. M. Liberman, M. Mandel, and P. White. PennBioIE Oncology 1.0, 2008.

    Google Scholar 

  105. C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out, 2004.

    Google Scholar 

  106. C.-Y. Lin, G. Cao, J. Gao, and J.-Y. Nie. An information-theoretic approach to automatic evaluation of summaries. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 463470, 2006.

    Google Scholar 

  107. J. Lin and D. Demner-Fushman. The role of knowledge in conceptual retrieval: A study in the domain of clinical medicine. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 99106, 2006.

    Google Scholar 

  108. R. T. K. Lin, J. Liang-Te Chiu, H.-J. Dai, M.-Y. Day, R. T.-H. Tsai, and W.-L. Hsu. Biological question answering with syntactic and semantic feature matching and an improved mean reciprocal ranking measurement. In Proceedings of the 2008 IEEE International Conference on Information Reuse and Integration, pages 184189, 2008.

    Google Scholar 

  109. D. A. Lindberg, B. L. Humphreys, and A. T. McCray. The unified medical language system. Methods of Information in Medicine, 32(4):281291, 1993.

    Google Scholar 

  110. X. Ling, J. Jiang, X. He, Q. Mei, C. Zhai, and B. Schatz. Generating gene summaries from biomedical literature: A study of semi-structured summarization. Information Processing & Management, 43(6):17771791, 2007.

    Article  Google Scholar 

  111. Y. Lussier, T. Borlawsky, D. Rappaport, Y. Liu, and C. Friedman. PheneGo: Assigning phenotypic context to gene ontology annotations with natural language processing. In Pacific Symposium on Biocomputing, pages 6475, 2006.

    Google Scholar 

  112. Y. Lussier, T. Borlawsky, D. Rappaport, Y. Liu, and C. Friedman. PhenoGo: Assigning phenotypic context to Gene Ontology annotations with natural language processing. In Pacific Symposium on Biocomputing, pages 6475, 2006.

    Google Scholar 

  113. D. Maynard. D1.2.2.1.3 benchmarking of annotation tools, 2007. http://knowledgeweb.semanticweb.org/semanticportal/deliverables/D1.2.2.1.3.pdf.

    Google Scholar 

  114. K. R. McKeown, S.-F. Chang, J. Cimino, S. K. Feiner, C. Friedman, L. Gravano, V. Hatzivassiloglou, S. Johnson, D. A. Jordan, J. L. Klavans, A. Kushniruk, V. Patel, and S. Teufel. PERSIVAL, a system for personalized search and summarization over multimedia healthcare information. In Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, pages 331340, 2001.

    Google Scholar 

  115. S. Mika and B. Rost. Protein names precisely peeled off free text. Bioinformatics, 20(suppl 1):i241i247, 2004.

    Article  Google Scholar 

  116. T. Mitsumori, S. Fation, M. Murata, K. Doi, and H. Doi. Gene/protein name recognition based on support vector machine using dictionary as features. BMC Bioinformatics, 6(Suppl 1):S8, 2005.

    Google Scholar 

  117. M. Miwa, R. Satre, and J.-D. Kim. Event extraction with complex event classification using rich features. Journal of Bioinformatics and Computational Biology, 8(1):131146, 2010.

    Article  Google Scholar 

  118. M. Miwa, R. Satre, Y. Miyao, and J. Tsujii. Protein-protein interaction extraction by leveraging multiple kernels and parsers. International Journal of Medical Informatics, 78(12):e39e46, 2009.

    Article  Google Scholar 

  119. Y. Miyao, T. Ohta, K. Masuda, Y. Tsuruoka, K. Yoshida, T. Ninomiya, and J. Tsujii. Semantic retrieval for the accurate identification of relational concepts in massive textbases. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 10171024, 2006.

    Google Scholar 

  120. Y. Miyao, K. Sagae, R. Satre, T. Matsuzaki, and J. Tsujii. Evaluating contributions of natural language parsers to protein-protein interaction extraction. Bioinformatics, 25(3):394400, 2009.

    Article  Google Scholar 

  121. L. P. Morales, A. D. Esteban, and P. Gervás. Concept-graph based biomedical automatic summarization using ontologies. In Proceedings of the 3rd Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing, pages 5356, 2008.

    Google Scholar 

  122. A. Morgan, L. Hirschman, A. Yeh, and M. Colosimo. Gene name extraction using FlyBase resources. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine -Volume 13, pages 18, 2003.

    Google Scholar 

  123. A. A. Morgan, L. Hirschman, M. Colosimo, A. S. Yeh, and J. B. Colombe. Gene name identification and normalization using a model organism database. Journal of Biomedical Informatics, 37(6):396410, 2004.

    Article  Google Scholar 

  124. A. A. Morgan, Z. Lu, X. Want, A. M. Cohen, J. Fluck, P. Ruch, A. Divoli, K. Fundel, R. Leaman, J. Hakenberg, C. Sun, H.-h. Liu, R. Torres, M. Krauthammer, W. W. Lau, H. Liu, C.-N. Hsu, M. Scheumie, K. B. Cohen, and L. Hirschman. Overview of BioCre-AtIvE II: Gene normalization. Genome Biology, 9(Suppl 2):S3, 2008.

    Google Scholar 

  125. H. Müller, J. Kalpathy-Cramer, I. Eggel, S. Bedrick, C. E. Charles E. Kahn, Jr., and W. Hersh. Overview of the clef 2010 medical image retrieval track. In Working Notes of CLEF 2010, 2010.

    Google Scholar 

  126. M. Narayanaswamy, K. E. Ravikumar, and K. Vijay-Shanker. A biological named entity recognizer. In Pacific Symposium on Biocomputing, pages 427438, 2003.

    Google Scholar 

  127. National center for biomedical ontology. http://www.bioontology.org/.

    Google Scholar 

  128. NCBO BioPortal. http://bioportal.bioontology.org/.

    Google Scholar 

  129. National Center for Biotechnology Information. Entrez Programming Utilities Help, 2010. http://www.ncbi.nlm.nih.gov/books/NBK25501/.

    Google Scholar 

  130. National centre for text mining. http://www.nactem.ac.uk/.

    Google Scholar 

  131. C. Nédellec. Learning language in logic - genic interaction extraction challenge. In Proceedings of the ICML 2005 Workshop on Learning Language in Logic, pages 31–37, 2005.

  132. Neuroscience information framework. http://neuinfo.org/.

  133. Y. Niu and G. Hirst. Analysis of semantic classes in medical text for question answering. In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains, 2004.

  134. Y. Niu, G. Hirst, G. McArthur, and P. Rodriguez-Gianolli. Answering clinical questions with role identification. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine, pages 73–80, 2003.

  135. Y. Niu, X. Zhu, and G. Hirst. Using outcome polarity in sentence extraction for medical question-answering. In AMIA Annual Symposium Proceedings, pages 599–603, 2006.

  136. Y. Niu, X. Zhu, J. Li, and G. Hirst. Analysis of polarity information in medical text. In AMIA Annual Symposium Proceedings, pages 570–574, 2005.

  137. C. Nobata, N. Collier, and J.-i. Tsujii. Automatic term identification and classification in biology texts. In Proceedings of the Natural Language Pacific Rim Symposium, pages 369–374, 1999.

  138. P. V. Ogren. Knowtator: A protégé plug-in for annotated corpus construction. In Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 273–275, 2006.

  139. D. Okanohara, Y. Miyao, Y. Tsuruoka, and J. Tsujii. Improving the scalability of semi-Markov conditional random fields for named entity recognition. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 465–472, 2006.

  140. F. Olsson, G. Eriksson, K. Franzén, L. Asker, and P. Lidén. Notions of correctness when evaluating protein name taggers. In Proceedings of the 19th International Conference on Computational Linguistics - Volume 1, pages 1–7, 2002.

  141. Open biological and biomedical ontologies. http://www.obofoundry.org/.

  142. ORBIT project. http://orbit.nlm.nih.gov/.

  143. A. Özgür, T. Vu, G. Erkan, and D. R. Radev. Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics, 24(13):i277–i285, 2008.

  144. A. Özgür, Z. Xiang, D. R. Radev, and Y. He. Literature-based discovery of IFN-γ and vaccine-mediated gene interaction networks. Journal of Biomedicine & Biotechnology, page 426479, 2010.

  145. E. Pafilis, S. O'Donoghue, L. Jensen, H. Horn, M. Kuhn, N. Brown, and R. Schneider. Reflect - augmented browsing for the life scientist. Nature Biotechnology, 27:508–510, 2009.

  146. S. Pakhomov. Semi-supervised maximum entropy based approach to acronym and abbreviation normalization in medical texts. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 160–167, 2002.

  147. M. Palakal, J. Bright, T. Sebastian, and S. Hartanto. A comparative study of cells in inflammation, EAE and MS using biomedical literature data mining. Journal of Biomedical Science, 14(1):67–85, 2007.

  148. V. Petri, M. Shimoyama, G. Hayman, J. Smith, M. Tutaj, J. de Pons, M. Dwinell, D. Munzenmaier, S. Twigger, and H. Jacob. The rat genome database pathway portal. Database, 2011.

  149. I. Petrič, T. Urbančič, B. Cestnik, and M. Macedoni-Lukšič. Literature mining method RaJoLink for uncovering relations between biomedical concepts. Journal of Biomedical Informatics, 42(2):219–227, 2009.

  150. Pharmacogenomics knowledge base. http://www.pharmgkb.org/.

  151. H. Poon and L. Vanderwende. Joint inference for knowledge extraction from biomedical literature. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 813–821, 2010.

  152. PubMed central open access subset. http://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/.

  153. S. Pyysalo, A. Airola, J. Heimonen, J. Bjorne, F. Ginter, and T. Salakoski. Comparative analysis of five protein-protein interaction corpora. BMC Bioinformatics, 9(Suppl 3):S6, 2008.

  154. S. Pyysalo, F. Ginter, J. Heimonen, J. Bjorne, J. Boberg, J. Jarvinen, and T. Salakoski. BioInfer: A corpus for information extraction in the biomedical domain. BMC Bioinformatics, 8(1):50, 2007.

  155. L. A. Ramshaw and M. P. Marcus. Text chunking using transformation-based learning. In 3rd ACL SIGDAT Workshop on Very Large Corpora, pages 82–94, 1995.

  156. L. H. Reeve, H. Han, and A. D. Brooks. The use of domain-specific concepts in biomedical text summarization. Information Processing & Management, 43(6):1765–1776, 2007.

  157. W. S. Richardson, M. C. Wilson, J. Nishikawa, and R. S. Hayward. The well-built clinical question: A key to evidence-based decisions. ACP Journal Club, 123(3):A12–A13, 1995.

  158. S. Riedel, H.-W. Chun, T. Takagi, and J. Tsujii. A Markov logic approach to bio-molecular event extraction. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 41–49, 2009.

  159. S. Riedel and A. McCallum. Fast and robust joint models for biomedical event extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1–12, 2011.

  160. F. Rinaldi, J. Dowdall, G. Schneider, and A. Persidis. Answering questions in the genomics domain. In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains, 2005.

  161. F. Rinaldi, K. Kaljurand, and R. Saetre. Terminological resources for text mining over biomedical scientific literature. Artificial Intelligence in Medicine, 52(2):107–114, 2011.

  162. F. Rinaldi, G. Schneider, K. Kaljurand, M. Hess, C. Andronis, O. Konstandi, and A. Persidis. Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach. Artificial Intelligence in Medicine, 39(2):127–136, 2007.

  163. T. C. Rindflesch and M. Fiszman. The interaction of domain knowledge and linguistic structure in natural language processing: Interpreting hypernymic propositions in biomedical text. Journal of Biomedical Informatics, 36(6):462–477, 2003.

  164. T. C. Rindflesch, H. Kilicoglu, M. Fiszman, G. Rosemblat, and D. Shin. Semantic MEDLINE: An advanced information management application for biomedicine. Information Services & Use, 31:15–21, 2011.

  165. B. Rink, S. Harabagiu, and K. Roberts. Automatic extraction of relations between medical concepts in clinical texts. Journal of the American Medical Informatics Association, 18(5):594–600, 2011.

  166. A. Roberts, R. Gaizauskas, and M. Hepple. Extracting clinical relationships from patient narratives. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pages 10–18, 2008.

  167. P. Ruch, C. Boyer, C. Chichester, I. Tbahriti, A. Geissbühler, P. Fabry, J. Gobeill, V. Pillet, D. Rebholz-Schuhmann, C. Lovis, and A.-L. Veuthey. Using argumentation to extract key sentences from biomedical abstracts. International Journal of Medical Informatics, 76(2-3):195–200, 2007.

  168. D. L. Sackett, W. M. C. Rosenberg, J. A. M. Gray, and R. B. Haynes. Evidence based medicine: What it is and what it isn't. British Medical Journal, 312(7023):71–72, 1996.

  169. M. Saeed, M. Villarroel, A. Reisner, G. Clifford, L. Lehman, G. Moody, T. Heldt, T. Kyaw, B. Moody, and R. Mark. Multiparameter intelligent monitoring in intensive care II (MIMIC-II): A public-access intensive care unit database. Crit Care Med, 39(5):952–960, 2011.

  170. J. Šarić, L. J. Jensen, R. Ouzounova, I. Rojas, and P. Bork. Extraction of regulatory gene/protein networks from MEDLINE. Bioinformatics, 22(6):645–650, 2006.

  171. Y. Sasaki, Y. Tsuruoka, J. McNaught, and S. Ananiadou. How to make the most of NE dictionaries in statistical NER. BMC Bioinformatics, 9(Suppl 11):S5, 2008.

  172. K. Seki and J. Mostafa. Discovering implicit associations between genes and hereditary diseases. In Pacific Symposium on Biocomputing, pages 316–327, 2007.

  173. B. Settles. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 104–107, 2004.

  174. B. Settles. ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics, 21(4):3191–3192, 2005.

  175. H. Shatkay, F. Pan, A. Rzhetsky, and W. Wilbur. Multidimensional classification of biomedical text: toward automated, practical provision of high-utility text to diverse users. Bioinformatics, 24(18):2086–2093, 2008.

  176. H. Shatkay, J. W. Wilbur, and A. Rzhetsky. Annotation guidelines, 2005. http://www.ncbi.nlm.nih.gov/CBBresearch/Wilbur/AnnotationGuidelines.pdf.

  177. D. Shen, J. Zhang, G. Zhou, J. Su, and C.-L. Tan. Effective adaptation of a hidden Markov model-based named entity recognizer for biomedical domain. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 49–56, 2003.

  178. Z. Shi, G. Melli, Y. Wang, Y. Liu, B. Gu, M. Kashani, A. Sarkar, and F. Popowich. Question answering summarization of multiple biomedical documents. In Z. Kobti and D. Wu, editors, Advances in Artificial Intelligence, volume 4509 of Lecture Notes in Computer Science, pages 284–295. Springer Berlin / Heidelberg, 2007.

  179. M. S. Simpson, D. Demner-Fushman, and G. R. Thoma. Evaluating the importance of image-related text for ad-hoc and case-based biomedical article retrieval. In AMIA Annual Symposium Proceedings, pages 752–756, 2010.

  180. N. Smalheiser. The Arrowsmith project: 2005 status report. In A. Hoffmann, H. Motoda, and T. Scheffer, editors, Discovery Science, volume 3735 of Lecture Notes in Computer Science, pages 26–43. Springer Berlin / Heidelberg, 2005.

  181. N. Smalheiser, V. Torvik, A. Bischoff-Grethe, L. Burhans, M. Gabriel, R. Homayouni, A. Kashef, M. Martone, G. Perkins, D. Price, A. Talk, and R. West. Collaborative development of the Arrowsmith two-node search interface designed for laboratory investigators. Journal of Biomedical Discovery and Collaboration, 1(1):8, 2006.

  182. N. Smalheiser, W. Zhou, and V. Torvik. Anne O'Tate: A tool to support user-driven summarization, drill-down and browsing of PubMed search results. Journal of Biomedical Discovery and Collaboration, 3(1):2, 2008.

  183. N. R. Smalheiser and D. R. Swanson. Using Arrowsmith: A computer-assisted approach to formulating and assessing scientific hypotheses. Computer Methods and Programs in Biomedicine, 57(3):149–153, 1998.

  184. N. R. Smalheiser, V. I. Torvik, and W. Zhou. Arrowsmith two-node search interface: A tutorial on finding meaningful links between two disparate sets of articles in MEDLINE. Computer Methods and Programs in Biomedicine, 94(2):190–197, 2009.

  185. L. Smith, L. Tanabe, R. Johnson nee Ando, C.-J. Kuo, I.-F. Chung, C.-N. Hsu, Y.-S. Lin, R. Klinger, C. Friedrich, K. Ganchev, M. Torii, H. Liu, B. Haddow, C. Struble, R. Povinelli, A. Vlachos, W. Baumgartner, L. Hunter, B. Carpenter, R. Tzong-Han Tsai, H.-J. Dai, F. Liu, Y. Chen, C. Sun, S. Katrenko, P. Adriaans, C. Blaschke, R. Torres, M. Neves, P. Nakov, A. Divoli, M. Mana-Lopez, J. Mata, and W. Wilbur. Overview of BioCreAtIvE II: Gene mention recognition. Genome Biology, 9(Suppl 2):S2, 2008.

  186. M. Q. Stearns, C. Price, K. A. Spackman, and A. Y. Wang. SNOMED clinical terms: Overview of the development process and project status. In Proceedings of the AMIA Symposium, pages 662–666, 2001.

  187. D. R. Swanson. Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30(1):7–18, 1986.

  188. D. R. Swanson. Migraine and magnesium: Eleven neglected connections. Perspectives in Biology and Medicine, 31(4):526–557, 1988.

  189. D. R. Swanson. Somatomedin C and arginine: Implicit connections between mutually isolated literatures. Perspectives in Biology and Medicine, 33(2):157–186, 1990.

  190. D. R. Swanson. Complementary structures in disjoint science literatures. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 280–289, 1991.

  191. D. R. Swanson and N. R. Smalheiser. An interactive system for finding complementary literatures: A stimulus to scientific discovery. Artificial Intelligence, 91(2):183–203, 1997.

  192. D. R. Swanson, N. R. Smalheiser, and A. Bookstein. Information discovery from complementary literatures: Categorizing viruses as potential weapons. Journal of the American Society for Information Science and Technology, 52(10):797–812, 2001.

  193. K. Takahashi, A. Koike, and T. Takagi. Question answering system in biomedical domain. In Proceedings of the 15th International Conference on Genome Informatics, pages 161–162, 2004.

  194. K. Takeuchi and N. Collier. Bio-medical entity extraction using support vector machines. Artificial Intelligence in Medicine, 33(2):125–137, 2005.

  195. R. M. Terol, P. Martínez-Barco, and M. Palomar. A knowledge based method for the medical question answering problem. Computers in Biology and Medicine, 37(10):1511–1521, 2007.

  196. P. Thompson, S. Iqbal, J. McNaught, and S. Ananiadou. Construction of an annotated corpus to support biomedical information extraction. BMC Bioinformatics, 10(1):349, 2009.

  197. V. I. Torvik and N. R. Smalheiser. A quantitative model for linking two disparate sets of articles in MEDLINE. Bioinformatics, 23(13):1658–1665, 2007.

  198. TREC-9 filtering track collections. http://trec.nist.gov/data/t9_filtering.html.

  199. TREC genomics track data. http://ir.ohsu.edu/genomics/data.html.

  200. R. Tsai, W.-C. Chou, Y.-S. Su, Y.-C. Lin, C.-L. Sung, H.-J. Dai, I. Yeh, W. Ku, T.-Y. Sung, and W.-L. Hsu. BIOSMILE: A semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features. BMC Bioinformatics, 8(1):325, 2007.

  201. Y. Tsuruoka, M. Miwa, K. Hamamoto, J. Tsujii, and S. Ananiadou. Discovering and visualizing indirect associations between biomedical concepts. Bioinformatics, 27(13):i111–i119, 2011.

  202. Y. Tsuruoka and J. Tsujii. Boosting precision and recall of dictionary-based protein name recognition. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 41–48, 2003.

  203. Y. Tsuruoka and J. Tsujii. Probabilistic term variant generator for biomedical terms. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 167–173, 2003.

  204. Y. Tsuruoka, J. Tsujii, and S. Ananiadou. FACTA: A text search engine for finding associated biomedical concepts. Bioinformatics, 24(21):2559–2560, 2008.

  205. O. Tuason, L. Chen, H. Liu, and C. Friedman. Biological nomenclatures: A source of lexical knowledge and ambiguity. In Pacific Symposium on Biocomputing, pages 238–249, 2004.

  206. H. Turtle and W. B. Croft. Evaluation of an inference network-based retrieval model. ACM Transactions on Information Systems, 9:187–222, 1991.

  207. Orange book: Approved drug products with therapeutic equivalence evaluations. http://www.accessdata.fda.gov/scripts/cder/ob/default.cfm.

  208. Databases, resources & APIs. http://wwwcf2.nlm.nih.gov/nlm_eresources/eresources/search_database.cfm.

  209. University of Pittsburgh NLP repository. http://www.dbmi.pitt.edu/nlpfront.

  210. Y. Usami, H.-C. Cho, N. Okazaki, and J. Tsujii. Automatic acquisition of huge training data for bio-medical named entity recognition. In Proceedings of BioNLP 2011 Workshop, pages 65–73, 2011.

  211. O. Uzuner. Recognizing obesity and comorbidities in sparse data. Journal of the American Medical Informatics Association, 16(5):561–570, 2009.

  212. O. Uzuner, I. Goldstein, Y. Luo, and I. Kohane. Identifying patient smoking status from medical discharge records. Journal of the American Medical Informatics Association, 15(1):14–24, 2008.

  213. O. Uzuner, I. Solti, and E. Cadag. Extracting medication information from clinical text. Journal of the American Medical Informatics Association, 17(5):514–518, 2010.

  214. O. Uzuner, B. R. South, S. Shen, and S. L. DuVall. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association, 18(5):552–556, 2011.

  215. V. Vincze, G. Szarvas, R. Farkas, G. Mora, and J. Csirik. The BioScope corpus: Biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11):S9, 2008.

  216. A. Vlachos and C. Gasperin. Bootstrapping and evaluating named entity recognition in the biomedical domain. In Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pages 138–145, 2006.

  217. T. Wattarujeekrit, P. Shah, and N. Collier. PASBio: Predicate-argument structures for event extraction in molecular biology. BMC Bioinformatics, 5(1):155, 2004.

  218. M. Weeber, H. Klein, L. T. W. de Jong-van den Berg, and R. Vos. Using concepts in literature-based discovery: Simulating Swanson's Raynaud-fish oil and migraine-magnesium discoveries. Journal of the American Society for Information Science and Technology, 52(7):548–557, 2001.

  219. W. Weiming, D. Hu, M. Feng, and L. Wenyin. Automatic clinical question answering based on UMLS relations. In Third International Conference on Semantics, Knowledge and Grid, pages 495–498, 2007.

  220. J. W. Wilbur, A. Rzhetsky, and H. Shatkay. New directions in biomedical text annotation: Definitions, guidelines and corpus construction. BMC Bioinformatics, 7:356, 2006.

  221. G. Williams, P. Davis, A. Rogers, T. Bieri, P. Ozersky, and J. Spieth. Methods and strategies for gene structure curation in WormBase. Database, 2011.

  222. K. Yamamoto, T. Kudo, A. Konagaya, and Y. Matsumoto. Protein name tagging for biomedical annotation in text. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 65–72, 2003.

  223. J. Yang, A. M. Cohen, and W. Hersh. Automatic summarization of mouse gene information by clustering and sentence extraction from MEDLINE abstracts. In AMIA Annual Symposium Proceedings, pages 831–835, 2007.

  224. A. Yeh, A. Morgan, M. Colosimo, and L. Hirschman. BioCreAtIvE task 1A: Gene mention finding evaluation. BMC Bioinformatics, 6(Suppl 1):S2, 2005.

  225. M. Yetisgen-Yildiz and W. Pratt. Using statistical and knowledge-based approaches for literature-based discovery. Journal of Biomedical Informatics, 39(6):600–611, 2006.

  226. M. Yetisgen-Yildiz and W. Pratt. A new evaluation methodology for literature-based discovery systems. Journal of Biomedical Informatics, 42(4):633–643, 2009.

  227. I. Yoo, X. Hu, and I.-Y. Song. A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method. BMC Bioinformatics, 8(Suppl 9):S4, 2007.

  228. H. Yu, S. Agarwal, M. Johnston, and A. Cohen. Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension. Journal of Biomedical Discovery and Collaboration, 4(1):1, 2009.

  229. H. Yu and Y.-G. Cao. Automatically extracting information needs from ad hoc clinical questions. In AMIA Annual Symposium Proceedings, pages 96–100, 2008.

  230. H. Yu and M. Lee. Accessing bioscience images from abstract sentences. Bioinformatics, 22(14):e547–e556, 2006.

  231. H. Yu, M. Lee, D. Kaufman, J. Ely, J. A. Osheroff, G. Hripcsak, and J. Cimino. Development, implementation, and a cognitive evaluation of a definitional question answering system for physicians. Journal of Biomedical Informatics, 40(3):236–251, 2007.

  232. H. Yu and C. Sable. Being Erlang Shen: Identifying answerable questions. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence on Knowledge and Reasoning for Answering Questions, pages 6–14, 2005.

  233. H. Yu, C. Sable, and H. Zhu. Classifying medical questions based on an evidence taxonomy. In Proceedings of the AAAI 2005 Workshop on Question Answering in Restricted Domains, 2005.

  234. G. Zhou, D. Shen, J. Zhang, J. Su, and S. Tan. Recognition of protein/gene names from text using an ensemble of classifiers. BMC Bioinformatics, 6(Suppl 1):S7, 2005.

  235. P. Zweigenbaum and D. Demner-Fushman. Advanced literature-mining tools. In D. Edwards, J. Stajich, and D. Hansen, editors, Bioinformatics: Tools and Applications, pages 347–380. Springer, 2009.

  236. P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. B. Cohen. Frontiers of biomedical text mining: Current progress. Briefings in Bioinformatics, 8(5):358–375, 2007.

Author information

Corresponding author

Correspondence to Matthew S. Simpson.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Simpson, M.S., Demner-Fushman, D. (2012). Biomedical Text Mining: A Survey of Recent Progress. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_14

  • DOI: https://doi.org/10.1007/978-1-4614-3223-4_14

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4614-3222-7

  • Online ISBN: 978-1-4614-3223-4

  • eBook Packages: Computer Science, Computer Science (R0)
