Biomedical Text Mining: A Survey of Recent Progress

Simpson, Matthew S.; Demner-Fushman, Dina

doi:10.1007/978-1-4614-3223-4_14

Abstract

The biomedical community makes extensive use of text mining technology. In the past several years, enormous progress has been made in developing tools and methods, and the community has witnessed some exciting developments. Although the state of the field is regularly reviewed, the sheer volume of work related to biomedical text mining and the rapid pace at which progress continues to be made make a further review a worthwhile, if not necessary, endeavor. This chapter provides a brief overview of the current state of text mining in the biomedical domain. Emphasis is placed on the resources and tools available to biomedical researchers and practitioners, as well as the major text mining tasks of interest to the community. These tasks include the recognition of explicit facts from biomedical literature, the discovery of previously unknown or implicit facts, document summarization, and question answering. For each topic, its basic challenges and methods are outlined, and recent and influential work is reviewed.

References

A. B. Abacha and P. Zweigenbaum. A hybrid approach for the extraction of semantic relations from MEDLINE abstracts. In A. Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, volume 6609 of Lecture Notes in Computer Science, pages 139–150. Springer Berlin / Heidelberg, 2011.
Google Scholar
A. B. Abacha and P. Zweigenbaum. Medical entity recognition: A comparison of semantic and statistical methods. In Proceedings of BioNLP 2011 Workshop, pages 56–64, 2011.
Google Scholar
S. Afantenos, V. Karkaletsis, and P. Stamatopoulos. Summarization from medical documents: A survey. Artificial Intelligence in Medicine, 33(2):157–177, 2005.
Article Google Scholar
S. Agarwal and H. Yu. Automatically classifying sentences in fulltext biomedical articles into introduction, methods, results and discussion. Bioinformatics, 25(23):3174–3180, 2009.
Article Google Scholar
S. Agarwal and H. Yu. FigSum: Automatically generating structured text summaries for figures in biomedical literature. In AMIA Annual Symposium Proceedings, pages 6–10, 2009.
Google Scholar
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 307–328. American Association for Artificial Intelligence, 1996.
Google Scholar
A. Airola, S. Pyysalo, J. Bjorne, T. Pahikkala, F. Ginter, and T. Salakoski. All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning. BMC Bioinformatics, 9(Suppl 11):S2, 2008.
Google Scholar
B. Alex, B. Haddow, and C. Grover. Recognising nested named entities in biomedical text. In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, pages 65–72, 2007.
Google Scholar
R. B. Altman, C. M. Bergman, J. Blake, C. Blaschke, A. Cohen, F. Gannon, L. Grivell, U. Hahn, W. Hersh, L. Hirschman, L. J. Jensen, M. Krallinger, B. Mons, S. I. O’Donoghue, M. C. Peitsch, D. Rebholz-Schuhmann, H. Shatkay, and A. Valencia. Text mining for biology - the way forward: opinions from leading scientists. Genome Biology, 9(Suppl 2):S7, 2008.
Google Scholar
S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, 1990.
Google Scholar
S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25(17):3389–3402, 1997.
Article Google Scholar
S. Ananiadou and J. Mcnaught. Text Mining for Biology And Biomedicine. Artech House, Inc., 2005.
Google Scholar
S. Ananiadou, S. Pyysalo, J. Tsujii, and D. B. Kell. Event extraction for systems biology by text mining the literature. Trends in Biotechnology, 28(7):381–390, 2010.
Article Google Scholar
A. R. Aronson and F.-M. Lang. An overview of MetaMap: historical perspective and recent advances. Journal of the American Medical Informatics Association, 17(3):229–236, 2010.
Google Scholar
R. Artstein and M. Poesio. Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4):555–596, 2008.
Article Google Scholar
M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cheryy, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. Gene ontology: Tool for the unification of biology. Nature Genetics, 25(1):25–29, 2000.
Google Scholar
S. J. Athenikos and H. Han. Biomedical question answering: A survey. Computer Methods and Programs in Biomedicine, 99(1):1–24, 2010.
Article Google Scholar
B. Benton, L. Ungar, S. Hill, S. Hennessy, J. Mao, A. Chung, C. E. Leonard, and J. H. Holmes. Identifying potential adverse effects using the web: A new approach to medical hypothesis generation. In Press, 2011.
Google Scholar
BioNLP. http://www.bionlp.org/.
Google Scholar
J. Björne, F. Ginter, S. Pyysalo, J. Tsujii, and T. Salakoski. Complex event extraction at PubMed scale. Bioinformatics, 26(12):i382–i390, 2010.
Article Google Scholar
J. Björne, J. Heimonen, F. Ginter, A. Airola, T. Pahikkala, and T. Salakoski. Extracting complex biological events with rich graphbased feature sets. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 10–18, 2009.
Google Scholar
K. W. Boyack, D. Newman, R. J. Duhon, R. Klavans, M. Patek, J. R. Biberstine, B. Schijvenaars, A. Skupin, N. Ma, and K. Borner. Clustering more than two million biomedical publications: Comparing the accuracies of nine text-based similarity approaches. PLoS ONE, 6(3):e18029, 2011.
Google Scholar
M. Bundschus, M. Dejori, M. Stetter, V. Tresp, and H.-P. Kriegel. Extraction of semantic biomedical relations from text using conditional random fields. BMC Bioinformatics, 9(1):207, 2008.
Google Scholar
E. Buyko, E. Faessler, J. Wermter, and U. Hahn. Event extraction from trimmed dependency graphs. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 19–27, 2009.
Google Scholar
Y. Cai and X. Cheng. Biomedical named entity recognition with tri-training learning. In Proceedings of the 2009 2nd International Conference on Biomedical Engineering and Informatics, pages 1–5, 2009.
Google Scholar
CALBC challenge. http://www.calbc.eu/.
Google Scholar
Y. Cao, F. Liu, P. Simpson, L. Antieau, A. Bennett, J. J. Cimino, J. Ely, and H. Yu. AskHERMES: An online question answering system for complex clinical questions. Journal of Biomedical Informatics, 44(2):277–288, 2011.
Article Google Scholar
D. T.-H. Chang, Y.-Z. Weng, J.-H. Lin, M.-J. Hwang, and Y.-J. Oyang. Protemot: Prediction of protein binding sites with automatically extracted geometrical templates. Nucleic Acids Research, 34(suppl 2):W303–W309, 2006.
Article Google Scholar
W. W. Chapman and K. B. Cohen. Current issues in biomedical text mining and natural language processing. Journal of Biomedical Informatics, 42(5):757–759, 2009.
Article Google Scholar
E. S. Chen, G. Hripcsak, H. Xu, M. Markatou, and C. Friedman. Automated acquisition of disease-drug knowledge from biomedical and clinical documents: An initial study. Journal of the American Medical Informatics Association, 15(1):87–98, 2008.
Article Google Scholar
H. W. Chun, Y. Tsuruoka, J. D. Kim, R. Shiba, N. Nagata, T. Hishiki, and J. Tsujii. Extraction of gene-disease relations from MEDLINE using domain dictionaries and machine learning. In Pacific Symposium on Biocomputing, pages 4–15, 2006.
Google Scholar
A. M. Cohen andW. R. Hersh. A survey of current work in biomedical text mining. Briefings in Bioinformatics, 6(1):57–71, 2005.
Google Scholar
K. B. Cohen and L. Hunter. Getting started in text mining. PLoS Computational Biology, 4(1):e20, 2008.
Google Scholar
K. B. Cohen, K. Verspoor, H. L. Johnson, C. Roeder, P. V. Ogren, W. A. Baumgartner, Jr., E. White, H. Tipney, and L. Hunter. High-precision biological event extraction with a concept recognizer. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 50–58, 2009.
Google Scholar
T. Cohen, G. K. Whitfield, R. W. Schvaneveldt, K. Mukund, and T. Rindflesch. EpiphaNet: An interactive tool to support biomedical discoveries. Journal of Biomedical Discovery and Collaboration, 5:21–49, 2010.
Google Scholar
N. Collier, C. Nobata, and J.-i. Tsujii. Extracting the names of genes and gene products with a hidden Markov model. In Proceedings of the 18th Conference on Computational Linguistics - Volume 1, pages 201–207, 2000.
Google Scholar
P. Corbett and A. Copestake. Cascaded classifiers for confidencebased chemical named entity recognition. BMC Bioinformatics, 9(Suppl 11):S4, 2008.
Google Scholar
CRAFT: The colorado richly annotated full text corpus. http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
Google Scholar
H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, N. Aswani, I. Roberts, G. Gorrell, A. Funk, A. Roberts, D. Daml janovic, T. Heitz, M. A. Greenwood, H. Saggion, J. Petrak, Y. Li, and W. Peters. Text Processing with GATE (Version 6). GATE, 2011.
Google Scholar
T. Delbecque, P. Jacquemart, and P. Zweigenbaum. Indexing UMLS semantic types for medical question-answering. In R. Engelbrecht, A. Geissbuhler, C. Lovis, and G. Mihalas, editors, Connecting Medical Informatics and Bio-Informatics: Proceedings of MIE2005 - The XIXth International Congress of the European Federation for Medical Informatics, pages 805–810. IOS Press, 2005.
Google Scholar
D. Demner-Fushman, W. W. Chapman, and C. J. McDonald. What can natural language processing do for clinical decision support? Journal of Biomedical Informatics, 42(5):760–772, 2009.
Article Google Scholar
D. Demner-Fushman, B. Few, S. E. Hauser, and G. Thoma. Automatically identifying health outcome information in MEDLINE records. Journal of the American Medical Informatics Association, 13(1):52–60, 2006.
Article Google Scholar
D. Demner-Fushman and J. Lin. Knowledge exraction for clinical question answering: Preliminary results. In Proceedings of the AAAI 2005 Workshop on Question Ansering in Restricted Domains, 2005.
Google Scholar
D. Demner-Fushman and J. Lin. Answer extraction, semantic clustering, and extractive summarization for clinical question answering. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 841–848, 2006.
Google Scholar
D. Demner-Fushman and J. Lin. Answering clinical questions with knowledge-based and statistical techniques. Computational Linguistics, 33(1):63–103, 2007.
Article Google Scholar
D. Demner-Fushman, C. Seckman, C. Fisher, S. E. Hauser, J. Clayton, and G. R.1. Thoma. A prototype system to support evidencebased practice. In AMIA Annual Symposium Proceedings, pages 151–155, 2008.
Google Scholar
S. Dipper, M. Götze, and M. Stede. Simple annotation tools for complex annotation tasks: An evaluation. In Proceedings of the LREC Workshop on XML-Based Richly Annotated Corpora, pages 54–62, 2004.
Google Scholar
eHOST: The extensible human oracle suite of tools. http://code.google.com/p/ehost/.
Google Scholar
N. Elhadad, M.-Y. Kan, J. L. Klavans, and K. R. McKeown. Customization in a unified framework for summarizing medical literature. Artificial Intelligence in Medicine, 33(2):179–198, 2005.
Article Google Scholar
J. W. Ely, J. A. Osheroff, M. H. Ebell, M. L. Chambliss, D. C. Vinson, J. J. Stevermer, and E. A. Pifer. Obstacles to answering doctors’ questions about patient care with evidence: qualitative study. British Medical Journal, 324(7339):710, 2002.
Google Scholar
Electronic medical records and genomics. https://www.mc.vanderbilt.edu/victr/dcc/projects/acc/index.php/Main_Page.
Google Scholar
European bioinformatics institute. http://www.ebi.ac.uk/.
Google Scholar
D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer, and C. Welty. Building Watson: An overview of the DeepQA project. AI Magazine, 31(3):59–79, 2010.
Google Scholar
D. Ferrucci and A. Lally. UIMA: An architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3-4):327–348, 2004.
Article Google Scholar
J. Finkel, S. Dingare, H. Nguyen, M. Nissim, C. Manning, and G. Sinclair. Exploiting context for biomedical entity recognition: From syntax to the web. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 88–91, 2004.
Google Scholar
M. Fiszman, D. Demner-Fushman, H. Kilicoglu, and T. C. Rindflesch. Automatic summarization of MEDLINE citations for evidence-based medical treatment: A topic-oriented evaluation. Journal of Biomedical Informatics, 42(5):801–813, 2009.
Article Google Scholar
K. Franzén, G. Eriksson, F. Olsson, L. Asker, P. Lidén, and J. Cöster. Protein names and how to find them. International Journal of Medical Informatics, 67(1-3):49–61, 2002.
Article Google Scholar
C. Friedman, G. Hripcsak, L. Shagina, and H. Liu. Arepresenting information in patient reports using natural language processing and the extensible markup language. Journal of the American Medical Informatics Association, 6:76–87, 1999.
Article Google Scholar
K. Fukuda, A. Tamura, T. Tsunoda, and T. Takagi. Toward information extraction: Identifying protein names from biological papers. In Pacific Symposium on Biocomputing, pages 707–718, 1998.
Google Scholar
K. Fundel, R. Küffner, and R. Zimmer. RelEx—relation extraction using dependency parse trees. Bioinformatics, 23(3):365–371, 2007.
Article Google Scholar
R. Gaizauskas, G. Demetriou, P. J. Artymiuk, and P. Willett. Protein structures and information extraction from biological texts: The PASTA system. Bioinformatics, 19(1):135–143, 2003.
Article Google Scholar
B. Gu. Recognizing nested named entities in GENIA corpus. In Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis, pages 112–113, 2006.
Google Scholar
J. Hakenberg, S. Bickel, C. Plake, U. Brefeld, H. Zahn, L. Faulstich, U. Leser, and T. Scheffer. Systematic feature evaluation for gene name recognition. BMC Bioinformatics, 6(Suppl 1):S9, 2005.
Google Scholar
J. Hakenberg, C. Plake, and U. Leser. LLL’05 challenge: Genic interaction extraction - identification of language patterns based on alignment and finite state automata. In In Proceedings of the ICML 2005 Workshop on Learning Language in Logic, pages 38–45, 2005.
Google Scholar
W. Hersh. Information Retrieval: A Health and Biomedical Perspective. Health Informatics. Springer, third edition, 2005.
Google Scholar
HighWire press. http://highwire.org/.
Google Scholar
L. Hirschman, M. Colosimo, A. Morgan, and A. Yeh. Overview of BioCreAtIvE task 1B: Normalized gene lists. BMC Bioinformatics, 6(Suppl 1):S11, 2005.
Google Scholar
L. Hirschman, A. A. Morgan, and A. S. Yeh. Rutabaga by any other name: Extracting biological names. Journal of Biomedical Informatics, 35(4):247–259, 2002.
Article Google Scholar
L. Hirschman, A. Yeh, C. Blaschke, and A. Valencia. Overview of BioCreAtIvE: Critical assessment of information extraction for biology. BMC Bioinformatics, 6(Suppl 1):S1, 2005.
Google Scholar
W.-J. Hou and H.-H. Chen. Enhancing performance of protein name recognizers using collocation. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 25–32, 2003.
Google Scholar
D. Hristovski, C. Friedman, T. C. Rindflesch, and B. Peterlin. Exploiting semantic relations for literature-based discovery. In AMIA Anual Symposium Proceedings, pages 349–353, 2006.
Google Scholar
D. Hristovski, B. Peterlin, S. Džeroski, and J. Stare. Literaturebased discovery support system and its application to disease gene identification. In S. Džeroski and L. Todorovski, editors, Computational Discovery of Scientific Knowledge, volume 4660 of Lecture Notes in Computer Science, pages 307–326. Springer Berlin / Heidelberg, 2007.
Google Scholar
D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey. Improving literature-based discovery support by genetic knowledge integration. Studies in Health Technogy and Informatics, 95:68–73, 2003.
Google Scholar
D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey. Using literature-based discovery to identify disease candidate genes. International Journal of Medical Informatics, 74(2-4):289–298, 2005.
Article Google Scholar
D. Hristovski, J. Stare, B. Peterlin, and S. Džeroski. Supporting discovery in medicine by association rule mining in MEDLINE and UMLS. In V. L. Patel, R. Rogers, and R. Haux, editors, Proceedings of the 10th World Congress on Medical Informatics, volume 84/2001 of Studies in Health Technology and Informatics, pages 1344–1348. IOS Press, 2001.
Google Scholar
X. Hu, X. Zhang, I. Yoo, X. Wang, and J. Feng. Mining hidden connections among biomedical concepts from disjoint biomedical literature sets through semantic-based association rule. International Journal of Intelligent Systems, 25(2):207–223, 2010.
Google Scholar
X. Huang, J. Lin, and D. Demner-Fushman. Evaluation of PICO as a knowledge representation for clinical questions. In AMIA Annual Symposium Proceedings, pages 359–363, 2006.
Google Scholar
K. Humphreys, G. Demetriou, and R. Gaizauskas. Two applications of information extraction to biological science yournal articles: Enzyme interactions and protein structures. In Pacific Symposium on Biocomputing, pages 502–513, 2000.
Google Scholar
L. Hunter, Z. Lu, J. Firby, W. Baumgartner, H. Johnson, P. Ogren, and K. B. Cohen. OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-typespecific gene expression. BMC Bioinformatics, 9(1):78, 2008.
Google Scholar
Informatics for integrating biology and the bedside. https://www.i2b2.org/resrcs/hive.html.
Google Scholar
P. Jacqumart and P. Zweigenbaum. Towards a medical questionanswering system: A feasibility study. Studies in Health Technology and Informatics, 95:463–468, 2003.
Google Scholar
R. Jelier, G. Jenster, L. Dorssers, B. Wouters, P. Hendriksen, B. Mons, R. Delwel, and J. Kors. Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation. BMC Bioinformatics, 8(1):14, 2007.
Google Scholar
R. Kabiljo, A. B. Clegg, and A. J. Shepherd. A realistic assessment of methods for extracting gene/protein interactions from free text. BMC Bioinformatics, 10:233, 2008.
Article Google Scholar
J. Kalpathy-Cramer, H. Müler, S. Bedrick, I. Eggel, A. de Herrera, and T. Tsikrika. The CLEF 2011 medical image retrieval and classification tasks. In CLEF 2011 Working Notes, 2011.
Google Scholar
H. Karsten and H. Suominen. Mining of clinical and biomedical text and data. International Journal of Medical Informatics, 78(12):786–787, 2009.
Article Google Scholar
J. Kazama, T. Makino, Y. Ohta, and J. Tsujii. Tuning support vector machines for biomedical named entity recognition. In Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain - Volume 3, pages 1–8, 2002.
Google Scholar
H. Kilicoglu and S. Bergler. Syntactic dependency based heuristics for biological event extraction. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 119–127, 2009.
Google Scholar
J.-D. Kim, T. Ohta, N. Nguyen, S. Pyysalo, R. Bossy, and J. Tsujii. Overview of BioNLP shared task 2011. In Proceedings of the BioNLP Shared Task 2011 Workshop, pages 1–6, 2011.
Google Scholar
J.-D. Kim, T. Ohta, S. Pyysalo, Y. Kano, and J. Tsujii. Overview of BioNLP’09 shared task on event extraction. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 1–9, 2009.
Google Scholar
J.-D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii. GENIA corpus—a semantically annotated corpus for bio-textmining. Bioinformatics, 19(Suppl 1):i180–i182, 2003.
Article Google Scholar
J.-D. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi, and N. Collier. Introduction to the bio-entity recognition task at JNLPBA. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 70–75, 2004.
Google Scholar
S. Kim, J. Yoon, and J. Yang. Kernel approaches for genic interaction extraction. Bioinformatics, 24(1):118–126, 2008. [93] S. Kinoshita, K. B. Cohen, P. Ogren, and L. Hunter. BioCreAtIvE task 1A: Entity identification with a stochastic tagger. BMC Bioinformatics, 6(Suppl 1):S4, 2005.
Google Scholar
J. Kontos, J. Lekakis, I. Malagardi, and J. Peros. Grammars for question answering systems based on intelligent text mining in biomedicine. In Proceedings of the 7th Hellenic Europeoan Conference on Computer Mathematics and its Applications, 2005. [95] J. Kontos, I. Malagardi, and J. Peros. Question answering and rhetoric analysis of biomedical texts in the AROMA system. In Proceedings of the 7th Hellenic Europeoan Conference on Computer Mathematics and its Applications, 2005.
Google Scholar
M. Krallinger, F. Leitner, C. Rodriguez-Penagos, and A. Valencia. Overview of the protein-protein interaction annotation extraction task of BioCreAtIve II. Genome Biology, 9(Suppl 2):S4, 2008.
Google Scholar
M. Krallinger, A. Morgan, L. Smith, F. Leitner, L. Tanabe, J. Wilbur, L. Hirschman, and A. Valencia. Evaluation of textmining systems for biology: Overview of the second BioCreAtIvE community challenge. Genome Biology, 9(Suppl 2):S1, 2008.
Google Scholar
M. Krallinger, A. Valencia, and L. Hirschman. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome biology, 9(Suppl 2):S8, 2008.
Google Scholar
M. Krauthammer and G. Nenadic. Term identification in the biomedical literature. Journal of Biomedical Informatics, 37(6):512–526, 2004.
Article Google Scholar
M. Krauthammer, A. Rzhetsky, P. Morozov, and C. Friedman. Using BLAST for identifying gene and protein names in journal articles. Gene, 259(1-2):245–252, 2000.
Article Google Scholar
R. Leaman and G. Gonzalez. BANNER: An executable survey of advances in biomedical named entity recognition. In Pacific Symposium on Biocomputing, pages 652–663, 2008.
Google Scholar
L. C. Lee, F. Horn, and F. E. Cohen. Automatic extraction of protein point mutations using a graph bigram association. PLoS Computational Biology, 3(2):e16, 2007.
Google Scholar
G. Leech. Adding linguistic annotation. In M. Wynne, editor, Developing Linguistic Corpora: A Guide to Good Practice, pages 17–29. Oxbow Books, 2005.
Google Scholar
U. Leser and J. Hakenberg. What makes a gene name? named entity recognition in the biomedical literature. Briefings in Bioinformatics, 6(4):357–369, 2005.
Article Google Scholar
M. Liberman, M. Mandel, and GlaxoSmithKline Pharmaceuticals R&D. PennBioIE CYP 1.0, 2008.
Google Scholar
M. Liberman, M. Mandel, and P. White. PennBioIE Oncology 1.0, 2008.
Google Scholar
C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out, 2004.
Google Scholar
C.-Y. Lin, G. Cao, J. Gao, and J.-Y. Nie. An information-theoretic approach to automatic evaluation of summaries. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 463–470, 2006.
Google Scholar
J. Lin and D. Demner-Fushman. The role of knowledge in conceptual retrieval: A study in the domain of clinical medicine. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 99–106, 2006.
Google Scholar
R. T. K. Lin, J. Liang-Te Chiu, H.-J. Dai, M.-Y. Day, R. T.-H. Tsai, and W.-L. Hsu. Biological question answering with syntactic and semantic feature matching and an improved mean reciprocal ranking measurement. In Proceedings of the 2008 IEEE International Conference on Information Reuse and Integration, pages 184–189, 2008.
Google Scholar
D. A. Lindberg, B. L. Humphreys, and A. T. McCray. The unified medical language system. Methods of Information in Medicine, 32(4):281–291, 1993.
Google Scholar
X. Ling, J. Jiang, X. He, Q. Mei, C. Zhai, and B. Schatz. Generating gene summaries from biomedical literature: A study of semi-structured summarization. Information Processing & Management, 43(6):1777–1791, 2007.
Article Google Scholar
Y. Lussier, T. Borlawsky, D. Rappaport, Y. Liu, and C. Friedman. PheneGo: Assigning phenotypic context to gene ontology annotations with natural language processing. In Pacific Symposium on Biocomputing, pages 64–75, 2006.
Google Scholar
Y. Lussier, T. Borlawsky, D. Rappaport, Y. Liu, and C. Friedman. PhenoGo: Assigning phenotypic context to Gene Ontology annotations with natural language processing. In Pacific Symposium on Biocomputing, pages 64–75, 2006.
Google Scholar
D. Maynard. D1.2.2.1.3 benchmarking of annotation tools, 2007. http://knowledgeweb.semanticweb.org/semanticportal/deliverables/D1.2.2.1.3.pdf.
Google Scholar
K. R. McKeown, S.-F. Chang, J. Cimino, S. K. Feiner, C. Friedman, L. Gravano, V. Hatzivassiloglou, S. Johnson, D. A. Jordan, J. L. Klavans, A. Kushniruk, V. Patel, and S. Teufel. PERSIVAL, a system for personalized search and summarization over multimedia healthcare information. In Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, pages 331–340, 2001.
Google Scholar
S. Mika and B. Rost. Protein names precisely peeled off free text. Bioinformatics, 20(suppl 1):i241–i247, 2004.
Article Google Scholar
T. Mitsumori, S. Fation, M. Murata, K. Doi, and H. Doi. Gene/protein name recognition based on support vector machine using dictionary as features. BMC Bioinformatics, 6(Suppl 1):S8, 2005.
Google Scholar
M. Miwa, R. Satre, and J.-D. Kim. Event extraction with complex event classification using rich features. Journal of Bioinformatics and Computational Biology, 8(1):131–146, 2010.
Article Google Scholar
M. Miwa, R. Satre, Y. Miyao, and J. Tsujii. Protein-protein interaction extraction by leveraging multiple kernels and parsers. International Journal of Medical Informatics, 78(12):e39–e46, 2009.
Article Google Scholar
Y. Miyao, T. Ohta, K. Masuda, Y. Tsuruoka, K. Yoshida, T. Ninomiya, and J. Tsujii. Semantic retrieval for the accurate identification of relational concepts in massive textbases. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 1017–1024, 2006.
Google Scholar
Y. Miyao, K. Sagae, R. Satre, T. Matsuzaki, and J. Tsujii. Evaluating contributions of natural language parsers to protein-protein interaction extraction. Bioinformatics, 25(3):394–400, 2009.
Article Google Scholar
L. P. Morales, A. D. Esteban, and P. Gervás. Concept-graph based biomedical automatic summarization using ontologies. In Proceedings of the 3rd Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing, pages 53–56, 2008.
Google Scholar
A. Morgan, L. Hirschman, A. Yeh, and M. Colosimo. Gene name extraction using FlyBase resources. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine -Volume 13, pages 1–8, 2003.
Google Scholar
A. A. Morgan, L. Hirschman, M. Colosimo, A. S. Yeh, and J. B. Colombe. Gene name identification and normalization using a model organism database. Journal of Biomedical Informatics, 37(6):396–410, 2004.
Article Google Scholar
A. A. Morgan, Z. Lu, X. Want, A. M. Cohen, J. Fluck, P. Ruch, A. Divoli, K. Fundel, R. Leaman, J. Hakenberg, C. Sun, H.-h. Liu, R. Torres, M. Krauthammer, W. W. Lau, H. Liu, C.-N. Hsu, M. Scheumie, K. B. Cohen, and L. Hirschman. Overview of BioCre-AtIvE II: Gene normalization. Genome Biology, 9(Suppl 2):S3, 2008.
Google Scholar
H. Müller, J. Kalpathy-Cramer, I. Eggel, S. Bedrick, C. E. Charles E. Kahn, Jr., and W. Hersh. Overview of the clef 2010 medical image retrieval track. In Working Notes of CLEF 2010, 2010.
Google Scholar
M. Narayanaswamy, K. E. Ravikumar, and K. Vijay-Shanker. A biological named entity recognizer. In Pacific Symposium on Biocomputing, pages 427–438, 2003.
Google Scholar
National center for biomedical ontology. http://www.bioontology.org/.
Google Scholar
NCBO BioPortal. http://bioportal.bioontology.org/.
Google Scholar
National Center for Biotechnology Information. Entrez Programming Utilities Help, 2010. http://www.ncbi.nlm.nih.gov/books/NBK25501/.
Google Scholar
National centre for text mining. http://www.nactem.ac.uk/.
Google Scholar
C. Nédellec. Learning language in logic - genic interaction extraction challenge. In In Proceedings of the ICML 2005 Workshop on Learning Language in Logic, pages 31–37, 2005.
Google Scholar
Neuroscience information framework. http://neuinfo.org/.
Google Scholar
Y. Niu and G. Hirst. Analysis and semantic classes in medical text for question answering. In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains, 2004.
Google Scholar
Y. Niu, G. Hirst, G. McArthur, and R.-G. P. Answering clinical questions with role identification. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine, pages 73–80, 2003.
Google Scholar
Y. Niu, X. Zhu, and G. Hirst. Using outcome polarity in sentence extraction for medical question-answering. In AMIA Anual Symposium Proceedings, pages 599–603, 2006.
Google Scholar
Y. Niu, X. Zhu, J. Li, and G. Hirst. Analysis of polarity information in medical text. In AMIA Anual Symposium Proceedings, pages 570–574, 2005.
Google Scholar
C. Nobata, N. Collier, and J.-i. Tsujii. Automatic term identification and classification in biology texts. In Proceedings of the Natural Language Pacific Rim Symposium, pages 369–374, 1999.
Google Scholar
P. V. Ogren. Knowtator: A protégé plug-in for annotated corpus construction. In Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 273–275, 2006.
Google Scholar
D. Okanohara, Y. Miyao, Y. Tsuruoka, and J. Tsujii. Improving the scalability of semi-Markov conditional random fields for named entity recognition. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 465–472, 2006.
Google Scholar
F. Olsson, G. Eriksson, K. Franzén, L. Asker, and P. Lidén. Notions of correctness when evaluating protein name taggers. In Proceedings of the 19th International Conference on Computational Linguistics - Volume 1, pages 1–7, 2002.
Google Scholar
Open biological and biomedical ontologies. http://www.obofoundry.org/.
Google Scholar
ORBIT project. http://orbit.nlm.nih.gov/.
Google Scholar
A. Özgür, T. Vu, G. Erkan, and D. R. Radev. Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics, 24(13):i277–i285, 2008.
A. Özgür, Z. Xiang, D. R. Radev, and Y. He. Literature-based discovery of IFN-γ and vaccine-mediated gene interaction networks. Journal of Biomedicine & Biotechnology, page 426479, 2010.
E. Pafilis, S. O’Donoghue, L. Jensen, H. Horn, M. Kuhn, N. Brown, and R. Schneider. Reflect - augmented browsing for the life scientist. Nature Biotechnology, 27:508–510, 2009.
S. Pakhomov. Semi-supervised maximum entropy based approach to acronym and abbreviation normalization in medical texts. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 160–167, 2002.
M. Palakal, J. Bright, T. Sebastian, and S. Hartanto. A comparative study of cells in inflammation, EAE and MS using biomedical literature data mining. Journal of Biomedical Science, 14(1):67–85, 2007.
V. Petri, M. Shimoyama, G. Hayman, J. Smith, M. Tutaj, J. de Pons, M. Dwinell, D. Munzenmaier, S. Twigger, and H. Jacob. The rat genome database pathway portal. Database, 2011.
I. Petrič, T. Urbančič, B. Cestnik, and M. Macedoni-Lukšič. Literature mining method RaJoLink for uncovering relations between biomedical concepts. Journal of Biomedical Informatics, 42(2):219–227, 2009.
Pharmacogenomics knowledge base. http://www.pharmgkb.org/.
H. Poon and L. Vanderwende. Joint inference for knowledge extraction from biomedical literature. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 813–821, 2010.
PubMed central open access subset. http://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/.
S. Pyysalo, A. Airola, J. Heimonen, J. Bjorne, F. Ginter, and T. Salakoski. Comparative analysis of five protein-protein interaction corpora. BMC Bioinformatics, 9(Suppl 3):S6, 2008.
S. Pyysalo, F. Ginter, J. Heimonen, J. Bjorne, J. Boberg, J. Jarvinen, and T. Salakoski. BioInfer: A corpus for information extraction in the biomedical domain. BMC Bioinformatics, 8(1):50, 2007.
L. A. Ramshaw and M. P. Marcus. Text chunking using transformation-based learning. In 3rd ACL SIGDAT Workshop on Very Large Corpora, pages 82–94, 1995.
L. H. Reeve, H. Han, and A. D. Brooks. The use of domain-specific concepts in biomedical text summarization. Information Processing & Management, 43(6):1765–1776, 2007.
W. S. Richardson, M. C. Wilson, J. Nishikawa, and R. S. Hayward. The well-built clinical question: A key to evidence-based decisions. ACP Journal Club, 123(3):A12–A13, 1995.
S. Riedel, H.-W. Chun, T. Takagi, and J. Tsujii. A Markov logic approach to bio-molecular event extraction. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 41–49, 2009.
S. Riedel and A. McCallum. Fast and robust joint models for biomedical event extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1–12, 2011.
F. Rinaldi, J. Dowdall, G. Schneider, and A. Persidis. Answering questions in the genomics domain. In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains, 2005.
F. Rinaldi, K. Kaljurand, and R. Saetre. Terminological resources for text mining over biomedical scientific literature. Artificial Intelligence in Medicine, 52(2):107–114, 2011.
F. Rinaldi, G. Schneider, K. Kaljurand, M. Hess, C. Andronis, O. Konstandi, and A. Persidis. Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach. Artificial Intelligence in Medicine, 39(2):127–136, 2007.
T. C. Rindflesch and M. Fiszman. The interaction of domain knowledge and linguistic structure in natural language processing: Interpreting hypernymic propositions in biomedical text. Journal of Biomedical Informatics, 36(6):462–477, 2003.
T. C. Rindflesch, H. Kilicoglu, M. Fiszman, G. Rosemblat, and D. Shin. Semantic MEDLINE: An advanced information management application for biomedicine. Information Services & Use, 31:15–21, 2011.
B. Rink, S. Harabagiu, and K. Roberts. Automatic extraction of relations between medical concepts in clinical texts. Journal of the American Medical Informatics Association, 18(5):594–600, 2011.
A. Roberts, R. Gaizauskas, and M. Hepple. Extracting clinical relationships from patient narratives. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pages 10–18, 2008.
P. Ruch, C. Boyer, C. Chichester, I. Tbahriti, A. Geissbühler, P. Fabry, J. Gobeill, V. Pillet, D. Rebholz-Schuhmann, C. Lovis, and A.-L. Veuthey. Using argumentation to extract key sentences from biomedical abstracts. International Journal of Medical Informatics, 76(2-3):195–200, 2007.
D. L. Sackett, W. M. C. Rosenberg, J. A. M. Gray, and R. B. Haynes. Evidence based medicine: What it is and what it isn’t. British Medical Journal, 312(7023):71–72, 1996.
M. Saeed, M. Villarroel, A. Reisner, G. Clifford, L. Lehman, G. Moody, T. Heldt, T. Kyaw, B. Moody, and R. Mark. Multiparameter intelligent monitoring in intensive care II (MIMIC-II): A public-access intensive care unit database. Critical Care Medicine, 39(5):952–960, 2011.
J. Šarić, L. J. Jensen, R. Ouzounova, I. Rojas, and P. Bork. Extraction of regulatory gene/protein networks from MEDLINE. Bioinformatics, 22(6):645–650, 2006.
Y. Sasaki, Y. Tsuruoka, J. McNaught, and S. Ananiadou. How to make the most of NE dictionaries in statistical NER. BMC Bioinformatics, 9(Suppl 11):S5, 2008.
K. Seki and J. Mostafa. Discovering implicit associations between genes and hereditary diseases. In Pacific Symposium on Biocomputing, pages 316–327, 2007.
B. Settles. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 104–107, 2004.
B. Settles. ABNER: An open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics, 21(14):3191–3192, 2005.
H. Shatkay, F. Pan, A. Rzhetsky, and W. Wilbur. Multidimensional classification of biomedical text: toward automated, practical provision of high-utility text to diverse users. Bioinformatics, 24(18):2086–2093, 2008.
H. Shatkay, J. W. Wilbur, and A. Rzhetsky. Annotation guidelines, 2005. http://www.ncbi.nlm.nih.gov/CBBresearch/Wilbur/AnnotationGuidelines.pdf.
D. Shen, J. Zhang, G. Zhou, J. Su, and C.-L. Tan. Effective adaptation of a hidden Markov model-based named entity recognizer for biomedical domain. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 49–56, 2003.
Z. Shi, G. Melli, Y. Wang, Y. Liu, B. Gu, M. Kashani, A. Sarkar, and F. Popowich. Question answering summarization of multiple biomedical documents. In Z. Kobti and D. Wu, editors, Advances in Artificial Intelligence, volume 4509 of Lecture Notes in Computer Science, pages 284–295. Springer Berlin / Heidelberg, 2007.
M. S. Simpson, D. Demner-Fushman, and G. R. Thoma. Evaluating the importance of image-related text for ad-hoc and case-based biomedical article retrieval. In AMIA Annual Symposium Proceedings, pages 752–756, 2010.
N. Smalheiser. The Arrowsmith project: 2005 status report. In A. Hoffmann, H. Motoda, and T. Scheffer, editors, Discovery Science, volume 3735 of Lecture Notes in Computer Science, pages 26–43. Springer Berlin / Heidelberg, 2005.
N. Smalheiser, V. Torvik, A. Bischoff-Grethe, L. Burhans, M. Gabriel, R. Homayouni, A. Kashef, M. Martone, G. Perkins, D. Price, A. Talk, and R. West. Collaborative development of the arrowsmith two node search interface designed for laboratory investigators. Journal of Biomedical Discovery and Collaboration, 1(1):8, 2006.
N. Smalheiser, W. Zhou, and V. Torvik. Anne O’Tate: A tool to support user-driven summarization, drill-down and browsing of PubMed search results. Journal of Biomedical Discovery and Collaboration, 3(1):2, 2008.
N. R. Smalheiser and D. R. Swanson. Using Arrowsmith: A computer-assisted approach to formulating and assessing scientific hypotheses. Computer Methods and Programs in Biomedicine, 57(3):149–153, 1998.
N. R. Smalheiser, V. I. Torvik, and W. Zhou. Arrowsmith two-node search interface: A tutorial on finding meaningful links between two disparate sets of articles in MEDLINE. Computer Methods and Programs in Biomedicine, 94(2):190–197, 2009.
L. Smith, L. Tanabe, R. Johnson nee Ando, C.-J. Kuo, I.-F. Chung, C.-N. Hsu, Y.-S. Lin, R. Klinger, C. Friedrich, K. Ganchev, M. Torii, H. Liu, B. Haddow, C. Struble, R. Povinelli, A. Vlachos, W. Baumgartner, L. Hunter, B. Carpenter, R. Tzong-Han Tsai, H.-J. Dai, F. Liu, Y. Chen, C. Sun, S. Katrenko, P. Adriaans, C. Blaschke, R. Torres, M. Neves, P. Nakov, A. Divoli, M. Mana-Lopez, J. Mata, and W. Wilbur. Overview of BioCreAtIve II: Gene mention recognition. Genome Biology, 9(Suppl 2):S2, 2008.
M. Q. Stearns, C. Price, K. A. Spackman, and A. Y. Wang. SNOMED clinical terms: Overview of the development process and project status. In Proceedings of the AMIA Symposium, pages 662–666, 2001.
D. R. Swanson. Fish oil, Raynaud’s syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30(1):7–18, 1986.
D. R. Swanson. Migraine and magnesium: Eleven neglected connections. Perspectives in Biology and Medicine, 31(4):526–557, 1988.
D. R. Swanson. Somatomedin C and arginine: Implicit connections between mutually isolated literatures. Perspectives in Biology and Medicine, 33(2):157–186, 1990.
D. R. Swanson. Complementary structures in disjoint science literatures. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 280–289, 1991.
D. R. Swanson and N. R. Smalheiser. An interactive system for finding complementary literatures: A stimulus to scientific discovery. Artificial Intelligence, 91(2):183–203, 1997.
D. R. Swanson, N. R. Smalheiser, and A. Bookstein. Information discovery from complementary literatures: Categorizing viruses as potential weapons. Journal of the American Society for Information Science and Technology, 52(10):797–812, 2001.
K. Takahashi, A. Koike, and T. Takagi. Question answering system in biomedical domain. In Proceedings of the 15th International Conference on Genome Informatics, pages 161–162, 2004.
K. Takeuchi and N. Collier. Bio-medical entity extraction using support vector machines. Artificial Intelligence in Medicine, 33(2):125–137, 2005.
R. M. Terol, P. Martínez-Barco, and M. Palomar. A knowledge based method for the medical question answering problem. Computers in Biology and Medicine, 37(10):1511–1521, 2007.
P. Thompson, S. Iqbal, J. McNaught, and S. Ananiadou. Construction of an annotated corpus to support biomedical information extraction. BMC Bioinformatics, 10(1):349, 2009.
V. I. Torvik and N. R. Smalheiser. A quantitative model for linking two disparate sets of articles in MEDLINE. Bioinformatics, 23(13):1658–1665, 2007.
TREC-9 filtering track collections. http://trec.nist.gov/data/t9_filtering.html.
TREC genomics track data. http://ir.ohsu.edu/genomics/data.html.
R. Tsai, W.-C. Chou, Y.-S. Su, Y.-C. Lin, C.-L. Sung, H.-J. Dai, I. Yeh, W. Ku, T.-Y. Sung, and W.-L. Hsu. BIOSMILE: A semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features. BMC Bioinformatics, 8(1):325, 2007.
Y. Tsuruoka, M. Miwa, K. Hamamoto, J. Tsujii, and S. Ananiadou. Discovering and visualizing indirect associations between biomedical concepts. Bioinformatics, 27(13):i111–i119, 2011.
Y. Tsuruoka and J. Tsujii. Boosting precision and recall of dictionary-based protein name recognition. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 41–48, 2003.
Y. Tsuruoka and J. Tsujii. Probabilistic term variant generator for biomedical terms. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 167–173, 2003.
Y. Tsuruoka, J. Tsujii, and S. Ananiadou. FACTA: A text search engine for finding associated biomedical concepts. Bioinformatics, 24(21):2559–2560, 2008.
O. Tuason, L. Chen, H. Liu, and C. Friedman. Biological nomenclatures: A source of lexical knowledge and ambiguity. In Pacific Symposium on Biocomputing, pages 238–249, 2004.
H. Turtle and W. B. Croft. Evaluation of an inference network-based retrieval model. ACM Transactions on Information Systems, 9:187–222, 1991.
Orange book: Approved drug products with therapeutic equivalence evaluations. http://www.accessdata.fda.gov/scripts/cder/ob/default.cfm.
Databases, resources & APIs. http://wwwcf2.nlm.nih.gov/nlm_eresources/eresources/search_database.cfm.
University of Pittsburgh NLP repository. http://www.dbmi.pitt.edu/nlpfront.
Y. Usami, H.-C. Cho, N. Okazaki, and J. Tsujii. Automatic acquisition of huge training data for bio-medical named entity recognition. In Proceedings of BioNLP 2011 Workshop, pages 65–73, 2011.
O. Uzuner. Recognizing obesity and comorbidities in sparse data. Journal of the American Medical Informatics Association, 16(5):561–570, 2009.
O. Uzuner, I. Goldstein, Y. Luo, and I. Kohane. Identifying patient smoking status from medical discharge records. Journal of the American Medical Informatics Association, 15(1):14–24, 2008.
O. Uzuner, I. Solti, and E. Cadag. Extracting medication information from clinical text. Journal of the American Medical Informatics Association, 17(5):514–518, 2010.
O. Uzuner, B. R. South, S. Shen, and S. L. DuVall. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association, 18(5):552–556, 2011.
V. Vincze, G. Szarvas, R. Farkas, G. Mora, and J. Csirik. The BioScope corpus: Biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11):S9, 2008.
A. Vlachos and C. Gasperin. Bootstrapping and evaluating named entity recognition in the biomedical domain. In Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pages 138–145, 2006.
T. Wattarujeekrit, P. Shah, and N. Collier. PASBio: Predicate-argument structures for event extraction in molecular biology. BMC Bioinformatics, 5(1):155, 2004.
M. Weeber, H. Klein, L. T. W. de Jong-van den Berg, and R. Vos. Using concepts in literature-based discovery: Simulating Swanson’s Raynaud-fish oil and migraine-magnesium discoveries. Journal of the American Society for Information Science and Technology, 52(7):548–557, 2001.
W. Weiming, D. Hu, M. Feng, and L. Wenyin. Automatic clinical question answering based on UMLS relations. In Third International Conference on Semantics, Knowledge and Grid, pages 495–498, 2007.
J. W. Wilbur, A. Rzhetsky, and H. Shatkay. New directions in biomedical text annotation: Definitions, guidelines and corpus construction. BMC Bioinformatics, 7:356, 2006.
G. Williams, P. Davis, A. Rogers, T. Bieri, P. Ozersky, and J. Spieth. Methods and strategies for gene structure curation in wormbase. Database, 2011.
K. Yamamoto, T. Kudo, A. Konagaya, and Y. Matsumoto. Protein name tagging for biomedical annotation in text. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 65–72, 2003.
J. Yang, A. M. Cohen, and W. Hersh. Automatic summarization of mouse gene information by clustering and sentence extraction from MEDLINE abstracts. In AMIA Annual Symposium Proceedings, pages 831–835, 2007.
A. Yeh, A. Morgan, M. Colosimo, and L. Hirschman. BioCreAtIvE task 1A: Gene mention finding evaluation. BMC Bioinformatics, 6(Suppl 1):S2, 2005.
M. Yetisgen-Yildiz and W. Pratt. Using statistical and knowledge-based approaches for literature-based discovery. Journal of Biomedical Informatics, 39(6):600–611, 2006.
M. Yetisgen-Yildiz and W. Pratt. A new evaluation methodology for literature-based discovery systems. Journal of Biomedical Informatics, 42(4):633–643, 2009.
I. Yoo, X. Hu, and I.-Y. Song. A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method. BMC Bioinformatics, 8(Suppl 9):S4, 2007.
H. Yu, S. Agarwal, M. Johnston, and A. Cohen. Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension. Journal of Biomedical Discovery and Collaboration, 4(1):1, 2009.
H. Yu and Y.-G. Cao. Automatically extracting information needs from ad hoc clinical questions. In AMIA Annual Symposium Proceedings, pages 96–100, 2008.
H. Yu and M. Lee. Accessing bioscience images from abstract sentences. Bioinformatics, 22(14):e547–e556, 2006.
H. Yu, M. Lee, D. Kaufman, J. Ely, J. A. Osheroff, G. Hripcsak, and J. Cimino. Development, implementation, and a cognitive evaluation of a definitional question answering system for physicians. Journal of Biomedical Informatics, 40(3):236–251, 2007.
H. Yu and C. Sable. Being Erlang Shen: Identifying answerable questions. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence on Knowledge and Reasoning for Answering Questions, pages 6–14, 2005.
H. Yu, C. Sable, and H. Zhu. Classifying medical questions based on an evidence taxonomy. In Proceedings of the AAAI 2005 Workshop on Question Answering in Restricted Domains, 2005.
G. Zhou, D. Shen, J. Zhang, J. Su, and S. Tan. Recognition of protein/gene names from text using an ensemble of classifiers. BMC Bioinformatics, 6(Suppl 1):S7, 2005.
P. Zweigenbaum and D. Demner-Fushman. Advanced literature-mining tools. In D. Edwards, J. Stajich, and D. Hansen, editors, Bioinformatics: Tools and Applications, pages 347–380. Springer, 2009.
P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. B. Cohen. Frontiers of biomedical text mining: Current progress. Briefings in Bioinformatics, 8(5):358–375, 2007.

Author information

Authors and Affiliations

Lister Hill National Center for Biomedical Communications United States National Library of Medicine, National Institutes of Health, Bethesda, USA
Matthew S. Simpson & Dina Demner-Fushman

Authors

Matthew S. Simpson
Dina Demner-Fushman

Corresponding author

Correspondence to Matthew S. Simpson.

Editor information

Editors and Affiliations

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, New York, USA
Charu C. Aggarwal
University of Illinois at Urbana-Champaign, Urbana, 61801, Illinois, USA
ChengXiang Zhai

About this chapter

Cite this chapter

Simpson, M.S., Demner-Fushman, D. (2012). Biomedical Text Mining: A Survey of Recent Progress. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_14

DOI: https://doi.org/10.1007/978-1-4614-3223-4_14
Published: 07 January 2012
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-3222-7
Online ISBN: 978-1-4614-3223-4
eBook Packages: Computer Science, Computer Science (R0)
