Abstract
The biomedical community makes extensive use of text mining technology. Over the past several years, enormous progress has been made in developing tools and methods, and the community has witnessed some exciting developments. Although the state of the field is regularly reviewed, the sheer volume of work on biomedical text mining and the rapid pace at which progress continues to be made make a further review a worthwhile, if not necessary, endeavor. This chapter provides a brief overview of the current state of text mining in the biomedical domain. Emphasis is placed on the resources and tools available to biomedical researchers and practitioners, as well as the major text mining tasks of interest to the community. These tasks include the recognition of explicit facts in the biomedical literature, the discovery of previously unknown or implicit facts, document summarization, and question answering. For each topic, the basic challenges and methods are outlined, and recent and influential work is reviewed.
References
A. B. Abacha and P. Zweigenbaum. A hybrid approach for the extraction of semantic relations from MEDLINE abstracts. In A. Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, volume 6609 of Lecture Notes in Computer Science, pages 139–150. Springer Berlin / Heidelberg, 2011.
A. B. Abacha and P. Zweigenbaum. Medical entity recognition: A comparison of semantic and statistical methods. In Proceedings of BioNLP 2011 Workshop, pages 56–64, 2011.
S. Afantenos, V. Karkaletsis, and P. Stamatopoulos. Summarization from medical documents: A survey. Artificial Intelligence in Medicine, 33(2):157–177, 2005.
S. Agarwal and H. Yu. Automatically classifying sentences in fulltext biomedical articles into introduction, methods, results and discussion. Bioinformatics, 25(23):3174–3180, 2009.
S. Agarwal and H. Yu. FigSum: Automatically generating structured text summaries for figures in biomedical literature. In AMIA Annual Symposium Proceedings, pages 6–10, 2009.
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 307–328. American Association for Artificial Intelligence, 1996.
A. Airola, S. Pyysalo, J. Bjorne, T. Pahikkala, F. Ginter, and T. Salakoski. All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning. BMC Bioinformatics, 9(Suppl 11):S2, 2008.
B. Alex, B. Haddow, and C. Grover. Recognising nested named entities in biomedical text. In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, pages 65–72, 2007.
R. B. Altman, C. M. Bergman, J. Blake, C. Blaschke, A. Cohen, F. Gannon, L. Grivell, U. Hahn, W. Hersh, L. Hirschman, L. J. Jensen, M. Krallinger, B. Mons, S. I. O’Donoghue, M. C. Peitsch, D. Rebholz-Schuhmann, H. Shatkay, and A. Valencia. Text mining for biology - the way forward: opinions from leading scientists. Genome Biology, 9(Suppl 2):S7, 2008.
S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, 1990.
S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25(17):3389–3402, 1997.
S. Ananiadou and J. Mcnaught. Text Mining for Biology And Biomedicine. Artech House, Inc., 2005.
S. Ananiadou, S. Pyysalo, J. Tsujii, and D. B. Kell. Event extraction for systems biology by text mining the literature. Trends in Biotechnology, 28(7):381–390, 2010.
A. R. Aronson and F.-M. Lang. An overview of MetaMap: historical perspective and recent advances. Journal of the American Medical Informatics Association, 17(3):229–236, 2010.
R. Artstein and M. Poesio. Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4):555–596, 2008.
M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cheryy, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. Gene ontology: Tool for the unification of biology. Nature Genetics, 25(1):25–29, 2000.
S. J. Athenikos and H. Han. Biomedical question answering: A survey. Computer Methods and Programs in Biomedicine, 99(1):1–24, 2010.
B. Benton, L. Ungar, S. Hill, S. Hennessy, J. Mao, A. Chung, C. E. Leonard, and J. H. Holmes. Identifying potential adverse effects using the web: A new approach to medical hypothesis generation. In Press, 2011.
BioNLP. http://www.bionlp.org/.
J. Björne, F. Ginter, S. Pyysalo, J. Tsujii, and T. Salakoski. Complex event extraction at PubMed scale. Bioinformatics, 26(12):i382–i390, 2010.
J. Björne, J. Heimonen, F. Ginter, A. Airola, T. Pahikkala, and T. Salakoski. Extracting complex biological events with rich graphbased feature sets. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 10–18, 2009.
K. W. Boyack, D. Newman, R. J. Duhon, R. Klavans, M. Patek, J. R. Biberstine, B. Schijvenaars, A. Skupin, N. Ma, and K. Borner. Clustering more than two million biomedical publications: Comparing the accuracies of nine text-based similarity approaches. PLoS ONE, 6(3):e18029, 2011.
M. Bundschus, M. Dejori, M. Stetter, V. Tresp, and H.-P. Kriegel. Extraction of semantic biomedical relations from text using conditional random fields. BMC Bioinformatics, 9(1):207, 2008.
E. Buyko, E. Faessler, J. Wermter, and U. Hahn. Event extraction from trimmed dependency graphs. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 19–27, 2009.
Y. Cai and X. Cheng. Biomedical named entity recognition with tri-training learning. In Proceedings of the 2009 2nd International Conference on Biomedical Engineering and Informatics, pages 1–5, 2009.
CALBC challenge. http://www.calbc.eu/.
Y. Cao, F. Liu, P. Simpson, L. Antieau, A. Bennett, J. J. Cimino, J. Ely, and H. Yu. AskHERMES: An online question answering system for complex clinical questions. Journal of Biomedical Informatics, 44(2):277–288, 2011.
D. T.-H. Chang, Y.-Z. Weng, J.-H. Lin, M.-J. Hwang, and Y.-J. Oyang. Protemot: Prediction of protein binding sites with automatically extracted geometrical templates. Nucleic Acids Research, 34(suppl 2):W303–W309, 2006.
W. W. Chapman and K. B. Cohen. Current issues in biomedical text mining and natural language processing. Journal of Biomedical Informatics, 42(5):757–759, 2009.
E. S. Chen, G. Hripcsak, H. Xu, M. Markatou, and C. Friedman. Automated acquisition of disease-drug knowledge from biomedical and clinical documents: An initial study. Journal of the American Medical Informatics Association, 15(1):87–98, 2008.
H. W. Chun, Y. Tsuruoka, J. D. Kim, R. Shiba, N. Nagata, T. Hishiki, and J. Tsujii. Extraction of gene-disease relations from MEDLINE using domain dictionaries and machine learning. In Pacific Symposium on Biocomputing, pages 4–15, 2006.
A. M. Cohen andW. R. Hersh. A survey of current work in biomedical text mining. Briefings in Bioinformatics, 6(1):57–71, 2005.
K. B. Cohen and L. Hunter. Getting started in text mining. PLoS Computational Biology, 4(1):e20, 2008.
K. B. Cohen, K. Verspoor, H. L. Johnson, C. Roeder, P. V. Ogren, W. A. Baumgartner, Jr., E. White, H. Tipney, and L. Hunter. High-precision biological event extraction with a concept recognizer. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 50–58, 2009.
T. Cohen, G. K. Whitfield, R. W. Schvaneveldt, K. Mukund, and T. Rindflesch. EpiphaNet: An interactive tool to support biomedical discoveries. Journal of Biomedical Discovery and Collaboration, 5:21–49, 2010.
N. Collier, C. Nobata, and J.-i. Tsujii. Extracting the names of genes and gene products with a hidden Markov model. In Proceedings of the 18th Conference on Computational Linguistics - Volume 1, pages 201–207, 2000.
P. Corbett and A. Copestake. Cascaded classifiers for confidencebased chemical named entity recognition. BMC Bioinformatics, 9(Suppl 11):S4, 2008.
CRAFT: The colorado richly annotated full text corpus. http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, N. Aswani, I. Roberts, G. Gorrell, A. Funk, A. Roberts, D. Daml janovic, T. Heitz, M. A. Greenwood, H. Saggion, J. Petrak, Y. Li, and W. Peters. Text Processing with GATE (Version 6). GATE, 2011.
T. Delbecque, P. Jacquemart, and P. Zweigenbaum. Indexing UMLS semantic types for medical question-answering. In R. Engelbrecht, A. Geissbuhler, C. Lovis, and G. Mihalas, editors, Connecting Medical Informatics and Bio-Informatics: Proceedings of MIE2005 - The XIXth International Congress of the European Federation for Medical Informatics, pages 805–810. IOS Press, 2005.
D. Demner-Fushman, W. W. Chapman, and C. J. McDonald. What can natural language processing do for clinical decision support? Journal of Biomedical Informatics, 42(5):760–772, 2009.
D. Demner-Fushman, B. Few, S. E. Hauser, and G. Thoma. Automatically identifying health outcome information in MEDLINE records. Journal of the American Medical Informatics Association, 13(1):52–60, 2006.
D. Demner-Fushman and J. Lin. Knowledge exraction for clinical question answering: Preliminary results. In Proceedings of the AAAI 2005 Workshop on Question Ansering in Restricted Domains, 2005.
D. Demner-Fushman and J. Lin. Answer extraction, semantic clustering, and extractive summarization for clinical question answering. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 841–848, 2006.
D. Demner-Fushman and J. Lin. Answering clinical questions with knowledge-based and statistical techniques. Computational Linguistics, 33(1):63–103, 2007.
D. Demner-Fushman, C. Seckman, C. Fisher, S. E. Hauser, J. Clayton, and G. R.1. Thoma. A prototype system to support evidencebased practice. In AMIA Annual Symposium Proceedings, pages 151–155, 2008.
S. Dipper, M. Götze, and M. Stede. Simple annotation tools for complex annotation tasks: An evaluation. In Proceedings of the LREC Workshop on XML-Based Richly Annotated Corpora, pages 54–62, 2004.
eHOST: The extensible human oracle suite of tools. http://code.google.com/p/ehost/.
N. Elhadad, M.-Y. Kan, J. L. Klavans, and K. R. McKeown. Customization in a unified framework for summarizing medical literature. Artificial Intelligence in Medicine, 33(2):179–198, 2005.
J. W. Ely, J. A. Osheroff, M. H. Ebell, M. L. Chambliss, D. C. Vinson, J. J. Stevermer, and E. A. Pifer. Obstacles to answering doctors’ questions about patient care with evidence: qualitative study. British Medical Journal, 324(7339):710, 2002.
Electronic medical records and genomics. https://www.mc.vanderbilt.edu/victr/dcc/projects/acc/index.php/Main_Page.
European bioinformatics institute. http://www.ebi.ac.uk/.
D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer, and C. Welty. Building Watson: An overview of the DeepQA project. AI Magazine, 31(3):59–79, 2010.
D. Ferrucci and A. Lally. UIMA: An architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3-4):327–348, 2004.
J. Finkel, S. Dingare, H. Nguyen, M. Nissim, C. Manning, and G. Sinclair. Exploiting context for biomedical entity recognition: From syntax to the web. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 88–91, 2004.
M. Fiszman, D. Demner-Fushman, H. Kilicoglu, and T. C. Rindflesch. Automatic summarization of MEDLINE citations for evidence-based medical treatment: A topic-oriented evaluation. Journal of Biomedical Informatics, 42(5):801–813, 2009.
K. Franzén, G. Eriksson, F. Olsson, L. Asker, P. Lidén, and J. Cöster. Protein names and how to find them. International Journal of Medical Informatics, 67(1-3):49–61, 2002.
C. Friedman, G. Hripcsak, L. Shagina, and H. Liu. Arepresenting information in patient reports using natural language processing and the extensible markup language. Journal of the American Medical Informatics Association, 6:76–87, 1999.
K. Fukuda, A. Tamura, T. Tsunoda, and T. Takagi. Toward information extraction: Identifying protein names from biological papers. In Pacific Symposium on Biocomputing, pages 707–718, 1998.
K. Fundel, R. Küffner, and R. Zimmer. RelEx—relation extraction using dependency parse trees. Bioinformatics, 23(3):365–371, 2007.
R. Gaizauskas, G. Demetriou, P. J. Artymiuk, and P. Willett. Protein structures and information extraction from biological texts: The PASTA system. Bioinformatics, 19(1):135–143, 2003.
B. Gu. Recognizing nested named entities in GENIA corpus. In Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis, pages 112–113, 2006.
J. Hakenberg, S. Bickel, C. Plake, U. Brefeld, H. Zahn, L. Faulstich, U. Leser, and T. Scheffer. Systematic feature evaluation for gene name recognition. BMC Bioinformatics, 6(Suppl 1):S9, 2005.
J. Hakenberg, C. Plake, and U. Leser. LLL’05 challenge: Genic interaction extraction - identification of language patterns based on alignment and finite state automata. In In Proceedings of the ICML 2005 Workshop on Learning Language in Logic, pages 38–45, 2005.
W. Hersh. Information Retrieval: A Health and Biomedical Perspective. Health Informatics. Springer, third edition, 2005.
HighWire press. http://highwire.org/.
L. Hirschman, M. Colosimo, A. Morgan, and A. Yeh. Overview of BioCreAtIvE task 1B: Normalized gene lists. BMC Bioinformatics, 6(Suppl 1):S11, 2005.
L. Hirschman, A. A. Morgan, and A. S. Yeh. Rutabaga by any other name: Extracting biological names. Journal of Biomedical Informatics, 35(4):247–259, 2002.
L. Hirschman, A. Yeh, C. Blaschke, and A. Valencia. Overview of BioCreAtIvE: Critical assessment of information extraction for biology. BMC Bioinformatics, 6(Suppl 1):S1, 2005.
W.-J. Hou and H.-H. Chen. Enhancing performance of protein name recognizers using collocation. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 25–32, 2003.
D. Hristovski, C. Friedman, T. C. Rindflesch, and B. Peterlin. Exploiting semantic relations for literature-based discovery. In AMIA Anual Symposium Proceedings, pages 349–353, 2006.
D. Hristovski, B. Peterlin, S. Džeroski, and J. Stare. Literaturebased discovery support system and its application to disease gene identification. In S. Džeroski and L. Todorovski, editors, Computational Discovery of Scientific Knowledge, volume 4660 of Lecture Notes in Computer Science, pages 307–326. Springer Berlin / Heidelberg, 2007.
D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey. Improving literature-based discovery support by genetic knowledge integration. Studies in Health Technogy and Informatics, 95:68–73, 2003.
D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey. Using literature-based discovery to identify disease candidate genes. International Journal of Medical Informatics, 74(2-4):289–298, 2005.
D. Hristovski, J. Stare, B. Peterlin, and S. Džeroski. Supporting discovery in medicine by association rule mining in MEDLINE and UMLS. In V. L. Patel, R. Rogers, and R. Haux, editors, Proceedings of the 10th World Congress on Medical Informatics, volume 84/2001 of Studies in Health Technology and Informatics, pages 1344–1348. IOS Press, 2001.
X. Hu, X. Zhang, I. Yoo, X. Wang, and J. Feng. Mining hidden connections among biomedical concepts from disjoint biomedical literature sets through semantic-based association rule. International Journal of Intelligent Systems, 25(2):207–223, 2010.
X. Huang, J. Lin, and D. Demner-Fushman. Evaluation of PICO as a knowledge representation for clinical questions. In AMIA Annual Symposium Proceedings, pages 359–363, 2006.
K. Humphreys, G. Demetriou, and R. Gaizauskas. Two applications of information extraction to biological science yournal articles: Enzyme interactions and protein structures. In Pacific Symposium on Biocomputing, pages 502–513, 2000.
L. Hunter, Z. Lu, J. Firby, W. Baumgartner, H. Johnson, P. Ogren, and K. B. Cohen. OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-typespecific gene expression. BMC Bioinformatics, 9(1):78, 2008.
Informatics for integrating biology and the bedside. https://www.i2b2.org/resrcs/hive.html.
P. Jacqumart and P. Zweigenbaum. Towards a medical questionanswering system: A feasibility study. Studies in Health Technology and Informatics, 95:463–468, 2003.
R. Jelier, G. Jenster, L. Dorssers, B. Wouters, P. Hendriksen, B. Mons, R. Delwel, and J. Kors. Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation. BMC Bioinformatics, 8(1):14, 2007.
R. Kabiljo, A. B. Clegg, and A. J. Shepherd. A realistic assessment of methods for extracting gene/protein interactions from free text. BMC Bioinformatics, 10:233, 2008.
J. Kalpathy-Cramer, H. Müler, S. Bedrick, I. Eggel, A. de Herrera, and T. Tsikrika. The CLEF 2011 medical image retrieval and classification tasks. In CLEF 2011 Working Notes, 2011.
H. Karsten and H. Suominen. Mining of clinical and biomedical text and data. International Journal of Medical Informatics, 78(12):786–787, 2009.
J. Kazama, T. Makino, Y. Ohta, and J. Tsujii. Tuning support vector machines for biomedical named entity recognition. In Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain - Volume 3, pages 1–8, 2002.
H. Kilicoglu and S. Bergler. Syntactic dependency based heuristics for biological event extraction. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 119–127, 2009.
J.-D. Kim, T. Ohta, N. Nguyen, S. Pyysalo, R. Bossy, and J. Tsujii. Overview of BioNLP shared task 2011. In Proceedings of the BioNLP Shared Task 2011 Workshop, pages 1–6, 2011.
J.-D. Kim, T. Ohta, S. Pyysalo, Y. Kano, and J. Tsujii. Overview of BioNLP’09 shared task on event extraction. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 1–9, 2009.
J.-D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii. GENIA corpus—a semantically annotated corpus for bio-textmining. Bioinformatics, 19(Suppl 1):i180–i182, 2003.
J.-D. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi, and N. Collier. Introduction to the bio-entity recognition task at JNLPBA. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 70–75, 2004.
S. Kim, J. Yoon, and J. Yang. Kernel approaches for genic interaction extraction. Bioinformatics, 24(1):118–126, 2008. [93] S. Kinoshita, K. B. Cohen, P. Ogren, and L. Hunter. BioCreAtIvE task 1A: Entity identification with a stochastic tagger. BMC Bioinformatics, 6(Suppl 1):S4, 2005.
J. Kontos, J. Lekakis, I. Malagardi, and J. Peros. Grammars for question answering systems based on intelligent text mining in biomedicine. In Proceedings of the 7th Hellenic Europeoan Conference on Computer Mathematics and its Applications, 2005. [95] J. Kontos, I. Malagardi, and J. Peros. Question answering and rhetoric analysis of biomedical texts in the AROMA system. In Proceedings of the 7th Hellenic Europeoan Conference on Computer Mathematics and its Applications, 2005.
M. Krallinger, F. Leitner, C. Rodriguez-Penagos, and A. Valencia. Overview of the protein-protein interaction annotation extraction task of BioCreAtIve II. Genome Biology, 9(Suppl 2):S4, 2008.
M. Krallinger, A. Morgan, L. Smith, F. Leitner, L. Tanabe, J. Wilbur, L. Hirschman, and A. Valencia. Evaluation of textmining systems for biology: Overview of the second BioCreAtIvE community challenge. Genome Biology, 9(Suppl 2):S1, 2008.
M. Krallinger, A. Valencia, and L. Hirschman. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome biology, 9(Suppl 2):S8, 2008.
M. Krauthammer and G. Nenadic. Term identification in the biomedical literature. Journal of Biomedical Informatics, 37(6):512–526, 2004.
M. Krauthammer, A. Rzhetsky, P. Morozov, and C. Friedman. Using BLAST for identifying gene and protein names in journal articles. Gene, 259(1-2):245–252, 2000.
R. Leaman and G. Gonzalez. BANNER: An executable survey of advances in biomedical named entity recognition. In Pacific Symposium on Biocomputing, pages 652–663, 2008.
L. C. Lee, F. Horn, and F. E. Cohen. Automatic extraction of protein point mutations using a graph bigram association. PLoS Computational Biology, 3(2):e16, 2007.
G. Leech. Adding linguistic annotation. In M. Wynne, editor, Developing Linguistic Corpora: A Guide to Good Practice, pages 17–29. Oxbow Books, 2005.
U. Leser and J. Hakenberg. What makes a gene name? named entity recognition in the biomedical literature. Briefings in Bioinformatics, 6(4):357–369, 2005.
M. Liberman, M. Mandel, and GlaxoSmithKline Pharmaceuticals R&D. PennBioIE CYP 1.0, 2008.
M. Liberman, M. Mandel, and P. White. PennBioIE Oncology 1.0, 2008.
C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out, 2004.
C.-Y. Lin, G. Cao, J. Gao, and J.-Y. Nie. An information-theoretic approach to automatic evaluation of summaries. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 463–470, 2006.
J. Lin and D. Demner-Fushman. The role of knowledge in conceptual retrieval: A study in the domain of clinical medicine. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 99–106, 2006.
R. T. K. Lin, J. Liang-Te Chiu, H.-J. Dai, M.-Y. Day, R. T.-H. Tsai, and W.-L. Hsu. Biological question answering with syntactic and semantic feature matching and an improved mean reciprocal ranking measurement. In Proceedings of the 2008 IEEE International Conference on Information Reuse and Integration, pages 184–189, 2008.
D. A. Lindberg, B. L. Humphreys, and A. T. McCray. The unified medical language system. Methods of Information in Medicine, 32(4):281–291, 1993.
X. Ling, J. Jiang, X. He, Q. Mei, C. Zhai, and B. Schatz. Generating gene summaries from biomedical literature: A study of semi-structured summarization. Information Processing & Management, 43(6):1777–1791, 2007.
Y. Lussier, T. Borlawsky, D. Rappaport, Y. Liu, and C. Friedman. PheneGo: Assigning phenotypic context to gene ontology annotations with natural language processing. In Pacific Symposium on Biocomputing, pages 64–75, 2006.
Y. Lussier, T. Borlawsky, D. Rappaport, Y. Liu, and C. Friedman. PhenoGo: Assigning phenotypic context to Gene Ontology annotations with natural language processing. In Pacific Symposium on Biocomputing, pages 64–75, 2006.
D. Maynard. D1.2.2.1.3 benchmarking of annotation tools, 2007. http://knowledgeweb.semanticweb.org/semanticportal/deliverables/D1.2.2.1.3.pdf.
K. R. McKeown, S.-F. Chang, J. Cimino, S. K. Feiner, C. Friedman, L. Gravano, V. Hatzivassiloglou, S. Johnson, D. A. Jordan, J. L. Klavans, A. Kushniruk, V. Patel, and S. Teufel. PERSIVAL, a system for personalized search and summarization over multimedia healthcare information. In Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, pages 331–340, 2001.
S. Mika and B. Rost. Protein names precisely peeled off free text. Bioinformatics, 20(suppl 1):i241–i247, 2004.
T. Mitsumori, S. Fation, M. Murata, K. Doi, and H. Doi. Gene/protein name recognition based on support vector machine using dictionary as features. BMC Bioinformatics, 6(Suppl 1):S8, 2005.
M. Miwa, R. Satre, and J.-D. Kim. Event extraction with complex event classification using rich features. Journal of Bioinformatics and Computational Biology, 8(1):131–146, 2010.
M. Miwa, R. Satre, Y. Miyao, and J. Tsujii. Protein-protein interaction extraction by leveraging multiple kernels and parsers. International Journal of Medical Informatics, 78(12):e39–e46, 2009.
Y. Miyao, T. Ohta, K. Masuda, Y. Tsuruoka, K. Yoshida, T. Ninomiya, and J. Tsujii. Semantic retrieval for the accurate identification of relational concepts in massive textbases. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 1017–1024, 2006.
Y. Miyao, K. Sagae, R. Satre, T. Matsuzaki, and J. Tsujii. Evaluating contributions of natural language parsers to protein-protein interaction extraction. Bioinformatics, 25(3):394–400, 2009.
L. P. Morales, A. D. Esteban, and P. Gervás. Concept-graph based biomedical automatic summarization using ontologies. In Proceedings of the 3rd Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing, pages 53–56, 2008.
A. Morgan, L. Hirschman, A. Yeh, and M. Colosimo. Gene name extraction using FlyBase resources. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine -Volume 13, pages 1–8, 2003.
A. A. Morgan, L. Hirschman, M. Colosimo, A. S. Yeh, and J. B. Colombe. Gene name identification and normalization using a model organism database. Journal of Biomedical Informatics, 37(6):396–410, 2004.
A. A. Morgan, Z. Lu, X. Want, A. M. Cohen, J. Fluck, P. Ruch, A. Divoli, K. Fundel, R. Leaman, J. Hakenberg, C. Sun, H.-h. Liu, R. Torres, M. Krauthammer, W. W. Lau, H. Liu, C.-N. Hsu, M. Scheumie, K. B. Cohen, and L. Hirschman. Overview of BioCre-AtIvE II: Gene normalization. Genome Biology, 9(Suppl 2):S3, 2008.
H. Müller, J. Kalpathy-Cramer, I. Eggel, S. Bedrick, C. E. Charles E. Kahn, Jr., and W. Hersh. Overview of the clef 2010 medical image retrieval track. In Working Notes of CLEF 2010, 2010.
M. Narayanaswamy, K. E. Ravikumar, and K. Vijay-Shanker. A biological named entity recognizer. In Pacific Symposium on Biocomputing, pages 427–438, 2003.
National center for biomedical ontology. http://www.bioontology.org/.
NCBO BioPortal. http://bioportal.bioontology.org/.
National Center for Biotechnology Information. Entrez Programming Utilities Help, 2010. http://www.ncbi.nlm.nih.gov/books/NBK25501/.
National centre for text mining. http://www.nactem.ac.uk/.
C. Nédellec. Learning language in logic - genic interaction extraction challenge. In In Proceedings of the ICML 2005 Workshop on Learning Language in Logic, pages 31–37, 2005.
Neuroscience information framework. http://neuinfo.org/.
Y. Niu and G. Hirst. Analysis and semantic classes in medical text for question answering. In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains, 2004.
Y. Niu, G. Hirst, G. McArthur, and R.-G. P. Answering clinical questions with role identification. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine, pages 73–80, 2003.
Y. Niu, X. Zhu, and G. Hirst. Using outcome polarity in sentence extraction for medical question-answering. In AMIA Anual Symposium Proceedings, pages 599–603, 2006.
Y. Niu, X. Zhu, J. Li, and G. Hirst. Analysis of polarity information in medical text. In AMIA Anual Symposium Proceedings, pages 570–574, 2005.
C. Nobata, N. Collier, and J.-i. Tsujii. Automatic term identification and classification in biology texts. In Proceedings of the Natural Language Pacific Rim Symposium, pages 369–374, 1999.
P. V. Ogren. Knowtator: A protégé plug-in for annotated corpus construction. In Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 273–275, 2006.
D. Okanohara, Y. Miyao, Y. Tsuruoka, and J. Tsujii. Improving the scalability of semi-Markov conditional random fields for named entity recognition. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 465–472, 2006.
F. Olsson, G. Eriksson, K. Franzén, L. Asker, and P. Lidén. Notions of correctness when evaluating protein name taggers. In Proceedings of the 19th International Conference on Computational Linguistics - Volume 1, pages 1–7, 2002.
Open biological and biomedical ontologies. http://www.obofoundry.org/.
ORBIT project. http://orbit.nlm.nih.gov/.
A. Özgür, T. Vu, G. Erkan, and D. R. Radev. Identifying genedisease associations using centrality on a literature mined geneinteraction network. Bioinformatics, 24(13):i277–i285, 2008.
A. Özgür, Z. Xiang, D. R. Radev, and Y. He. Literature-based discovery of IFN-γ and vaccine-mediated gene interaction networks. Journal of Biomedicine & Biotechnology, page 426479, 2010.
E. Pafilis, S. O’Donoghue, L. Jensen, H. Horn, M. Kuhn, N. Brown, and R. Schneider. Reflect - augmented browsing for the life scientist. Nature Biotechnology, 27:508–510, 2009.
S. Pakhomov. Semi-supervised maximum entropy based approach to acronym and abbreviation normalization in medical texts. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 160–167, 2002.
M. Palakal, J. Bright, T. Sebastian, and S. Hartanto. A comparative study of cells in inflammation, EAE and MS using biomedical literature data mining. Journal of Biomedical Science, 14(1):67–85, 2007.
V. Petri, M. Shimoyama, G. Hayman, J. Smith, M. Tutaj, J. de Pons, M. Dwinell, D. Munzenmaier, S. Twigger, and H. Jacob. The rat genome database pathway portal. Database, 2011.
I. Petrič, U. Tanja, B. Cestnik, and M. Macedoni-Lukšič. Literature mining method RaJoLink for uncovering relations between biomedical concepts. Journal of Biomedical Informatics, 42(2):219–227, 2009.
Pharmacogenomics knowledge base. http://www.pharmgkb.org/.
H. Poon and L. Vanderwende. Joint inference for knowledge extraction from biomedical literature. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 813–821, 2010.
PubMed central open access subset. http://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/.
S. Pyysalo, A. Airola, J. Heimonen, J. Bjorne, F. Ginter, and T. Salakoski. Comparative analysis of five protein-protein interaction corpora. BMC Bioinformatics, 9(Suppl 3):S6, 2008.
S. Pyysalo, F. Ginter, J. Heimonen, J. Bjorne, J. Boberg, J. Jarvinen, and T. Salakoski. BioInfer: A corpus for information extraction in the biomedical domain. BMC Bioinformatics, 8(1):50, 2007.
L. A. Ramshaw and M. P. Marcus. Text chunking using transformation-based learning. In 3rd ACL SIGDAT Workshop on Very Large Corpora, pages 82–94, 1995.
L. H. Reeve, H. Han, and A. D. Brooks. The use of domainspecific concepts in biomedical text summarization. Information Processing & Management, 43(6):1765–1776, 2007.
W. S. Richardson, M. C. Wilson, J. Nishikawa, and R. S. Hayward. The well-built clinical question: A key to evidence-based decisions. ACP Journal Club, 123(3):A12–A13, 1995.
S. Riedel, H.-W. Chun, T. Takagi, and J. Tsujii. A Markov logic approach to bio-molecular event extraction. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, pages 41–49, 2009.
S. Riedel and A. McCallum. Fast and robust joint models for biomedical event extraction. In Proceedings of the 2011 Conference on Emperical Methods in Natural Language Processing, pages 1–12, 2011.
F. Rinaldi, J. Dowdall, G. Schneider, and A. Persidis. Answering questions in the genomics domain. In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains, 2005.
F. Rinaldi, K. Kaljurand, and R. Saetre. Terminological resources for text mining over biomedical scientific literature. Artificial Intelligence in Medicine, 52(2):107–114, 2011.
F. Rinaldi, G. Schneider, K. Kaljurand, M. Hess, C. Andronis, O. Konstandi, and A. Persidis. Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach. Artificial Intelligence in Medicine, 39(2):127–136, 2007.
T. C. Rindflesch and M. Fiszman. The interaction of domain knowledge and linguistic structure in natural language processing: Interpreting hypernymic propositions in biomedical text. Journal of Biomedical Informatics, 36(6):462–477, 2003.
T. C. Rindflesch, H. Kilicoglu, M. Fiszman, G. Rosemblat, and D. Shin. Semantic MEDLINE: An advanced information management application for biomedicine. Information Services & Use, 31:15–21, 2011.
B. Rink, S. Harabagiu, and K. Roberts. Automatic extraction of relations between medical concepts in clinical texts. Journal of the American Medical Informatics Association, 18(5):594–600, 2011.
A. Roberts, R. Gaizauskas, and M. Hepple. Extracting clinical relationships from patient narratives. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pages 10–18, 2008.
P. Ruch, C. Boyer, C. Chichester, I. Tbahriti, A. Geissbühler, P. Fabry, J. Gobeill, V. Pillet, D. Rebholz-Schuhmann, C. Lovis, and A.-L. Veuthey. Using argumentation to extract key sentences from biomedical abstracts. International Journal of Medical Informatics, 76(2-3):195–200, 2007.
D. L. Sackett, W. M. C. Rosenberg, J. A. M. Gray, and R. B. Haynes. Evidence based medicine: What it is and what it isn’t. British Medical Journal, 312(7023):71–72, 1996.
M. Saeed, M. Villarroel, A. Reisner, G. Clifford, L. Lehman, G. Moody, T. Heldt, T. Kyaw, B. Moody, and R. Mark. Multiparameter intelligent monitoring in intensive care II (MIMIC-II): A public-access intensive care unit database. Critical Care Medicine, 39(5):952–960, 2011.
J. Šarić, L. J. Jensen, R. Ouzounova, I. Rojas, and P. Bork. Extraction of regulatory gene/protein networks from MEDLINE. Bioinformatics, 22(6):645–650, 2006.
Y. Sasaki, Y. Tsuruoka, J. McNaught, and S. Ananiadou. How to make the most of NE dictionaries in statistical NER. BMC Bioinformatics, 9(Suppl 11):S5, 2008.
K. Seki and J. Mostafa. Discovering implicit associations between genes and hereditary diseases. In Pacific Symposium on Biocomputing, pages 316–327, 2007.
B. Settles. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 104–107, 2004.
B. Settles. ABNER: An open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics, 21(14):3191–3192, 2005.
H. Shatkay, F. Pan, A. Rzhetsky, and W. Wilbur. Multidimensional classification of biomedical text: toward automated, practical provision of high-utility text to diverse users. Bioinformatics, 24(18):2086–2093, 2008.
H. Shatkay, J. W. Wilbur, and A. Rzhetsky. Annotation guidelines, 2005. http://www.ncbi.nlm.nih.gov/CBBresearch/Wilbur/AnnotationGuidelines.pdf.
D. Shen, J. Zhang, G. Zhou, J. Su, and C.-L. Tan. Effective adaptation of a hidden Markov model-based named entity recognizer for biomedical domain. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 49–56, 2003.
Z. Shi, G. Melli, Y. Wang, Y. Liu, B. Gu, M. Kashani, A. Sarkar, and F. Popowich. Question answering summarization of multiple biomedical documents. In Z. Kobti and D. Wu, editors, Advances in Artificial Intelligence, volume 4509 of Lecture Notes in Computer Science, pages 284–295. Springer Berlin / Heidelberg, 2007.
M. S. Simpson, D. Demner-Fushman, and G. R. Thoma. Evaluating the importance of image-related text for ad-hoc and case-based biomedical article retrieval. In AMIA Annual Symposium Proceedings, pages 752–756, 2010.
N. Smalheiser. The Arrowsmith project: 2005 status report. In A. Hoffmann, H. Motoda, and T. Scheffer, editors, Discovery Science, volume 3735 of Lecture Notes in Computer Science, pages 26–43. Springer Berlin / Heidelberg, 2005.
N. Smalheiser, V. Torvik, A. Bischoff-Grethe, L. Burhans, M. Gabriel, R. Homayouni, A. Kashef, M. Martone, G. Perkins, D. Price, A. Talk, and R. West. Collaborative development of the Arrowsmith two node search interface designed for laboratory investigators. Journal of Biomedical Discovery and Collaboration, 1(1):8, 2006.
N. Smalheiser, W. Zhou, and V. Torvik. Anne O’Tate: A tool to support user-driven summarization, drill-down and browsing of PubMed search results. Journal of Biomedical Discovery and Collaboration, 3(1):2, 2008.
N. R. Smalheiser and D. R. Swanson. Using Arrowsmith: A computer-assisted approach to formulating and assessing scientific hypotheses. Computer Methods and Programs in Biomedicine, 57(3):149–153, 1998.
N. R. Smalheiser, V. I. Torvik, and W. Zhou. Arrowsmith two-node search interface: A tutorial on finding meaningful links between two disparate sets of articles in MEDLINE. Computer Methods and Programs in Biomedicine, 94(2):190–197, 2009.
L. Smith, L. Tanabe, R. Johnson nee Ando, C.-J. Kuo, I.-F. Chung, C.-N. Hsu, Y.-S. Lin, R. Klinger, C. Friedrich, K. Ganchev, M. Torii, H. Liu, B. Haddow, C. Struble, R. Povinelli, A. Vlachos, W. Baumgartner, L. Hunter, B. Carpenter, R. Tzong-Han Tsai, H.-J. Dai, F. Liu, Y. Chen, C. Sun, S. Katrenko, P. Adriaans, C. Blaschke, R. Torres, M. Neves, P. Nakov, A. Divoli, M. Mana-Lopez, J. Mata, and W. Wilbur. Overview of BioCreAtIve II: Gene mention recognition. Genome Biology, 9(Suppl 2):S2, 2008.
M. Q. Stearns, C. Price, K. A. Spackman, and A. Y. Wang. SNOMED clinical terms: Overview of the development process and project status. In Proceedings of the AMIA Symposium, pages 662–666, 2001.
D. R. Swanson. Fish oil, Raynaud’s syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30(1):7–18, 1986.
D. R. Swanson. Migraine and magnesium: Eleven neglected connections. Perspectives in Biology and Medicine, 31(4):526–557, 1988.
D. R. Swanson. Somatomedin C and arginine: Implicit connections between mutually isolated literatures. Perspectives in Biology and Medicine, 33(2):157–186, 1990.
D. R. Swanson. Complementary structures in disjoint science literatures. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 280–289, 1991.
D. R. Swanson and N. R. Smalheiser. An interactive system for finding complementary literatures: A stimulus to scientific discovery. Artificial Intelligence, 91(2):183–203, 1997.
D. R. Swanson, N. R. Smalheiser, and A. Bookstein. Information discovery from complementary literatures: Categorizing viruses as potential weapons. Journal of the American Society for Information Science and Technology, 52(10):797–812, 2001.
K. Takahashi, A. Koike, and T. Takagi. Question answering system in biomedical domain. In Proceedings of the 15th International Conference on Genome Informatics, pages 161–162, 2004.
K. Takeuchi and N. Collier. Bio-medical entity extraction using support vector machines. Artificial Intelligence in Medicine, 33(2):125–137, 2005.
R. M. Terol, P. Martínez-Barco, and M. Palomar. A knowledge based method for the medical question answering problem. Computers in Biology and Medicine, 37(10):1511–1521, 2007.
P. Thompson, S. Iqbal, J. McNaught, and S. Ananiadou. Construction of an annotated corpus to support biomedical information extraction. BMC Bioinformatics, 10(1):349, 2009.
V. I. Torvik and N. R. Smalheiser. A quantitative model for linking two disparate sets of articles in MEDLINE. Bioinformatics, 23(13):1658–1665, 2007.
TREC-9 filtering track collections. http://trec.nist.gov/data/t9_filtering.html.
TREC genomics track data. http://ir.ohsu.edu/genomics/data.html.
R. Tsai, W.-C. Chou, Y.-S. Su, Y.-C. Lin, C.-L. Sung, H.-J. Dai, I. Yeh, W. Ku, T.-Y. Sung, and W.-L. Hsu. BIOSMILE: A semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features. BMC Bioinformatics, 8(1):325, 2007.
Y. Tsuruoka, M. Miwa, K. Hamamoto, J. Tsujii, and S. Ananiadou. Discovering and visualizing indirect associations between biomedical concepts. Bioinformatics, 27(13):i111–i119, 2011.
Y. Tsuruoka and J. Tsujii. Boosting precision and recall of dictionary-based protein name recognition. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 41–48, 2003.
Y. Tsuruoka and J. Tsujii. Probabilistic term variant generator for biomedical terms. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 167–173, 2003.
Y. Tsuruoka, J. Tsujii, and S. Ananiadou. FACTA: A text search engine for finding associated biomedical concepts. Bioinformatics, 24(21):2559–2560, 2008.
O. Tuason, L. Chen, H. Liu, and C. Friedman. Biological nomenclatures: A source of lexical knowledge and ambiguity. In Pacific Symposium on Biocomputing, pages 238–249, 2004.
H. Turtle and W. B. Croft. Evaluation of an inference network-based retrieval model. ACM Transactions on Information Systems, 9:187–222, 1991.
Orange book: Approved drug products with therapeutic equivalence evaluations. http://www.accessdata.fda.gov/scripts/cder/ob/default.cfm.
Databases, resources & APIs. http://wwwcf2.nlm.nih.gov/nlm_eresources/eresources/search_database.cfm.
University of Pittsburgh NLP repository. http://www.dbmi.pitt.edu/nlpfront.
Y. Usami, H.-C. Cho, N. Okazaki, and J. Tsujii. Automatic acquisition of huge training data for bio-medical named entity recognition. In Proceedings of BioNLP 2011 Workshop, pages 65–73, 2011.
O. Uzuner. Recognizing obesity and comorbidities in sparse data. Journal of the American Medical Informatics Association, 16(5):561–570, 2009.
O. Uzuner, I. Goldstein, Y. Luo, and I. Kohane. Identifying patient smoking status from medical discharge records. Journal of the American Medical Informatics Association, 15(1):14–24, 2008.
O. Uzuner, I. Solti, and E. Cadag. Extracting medication information from clinical text. Journal of the American Medical Informatics Association, 17(5):514–518, 2010.
O. Uzuner, B. R. South, S. Shen, and S. L. DuVall. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association, 18(5):552–556, 2011.
V. Vincze, G. Szarvas, R. Farkas, G. Mora, and J. Csirik. The BioScope corpus: Biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11):S9, 2008.
A. Vlachos and C. Gasperin. Bootstrapping and evaluating named entity recognition in the biomedical domain. In Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pages 138–145, 2006.
T. Wattarujeekrit, P. Shah, and N. Collier. PASBio: Predicate-argument structures for event extraction in molecular biology. BMC Bioinformatics, 5(1):155, 2004.
M. Weeber, H. Klein, L. T. W. de Jong-van den Berg, and R. Vos. Using concepts in literature-based discovery: Simulating Swanson’s Raynaud-fish oil and migraine-magnesium discoveries. Journal of the American Society for Information Science and Technology, 52(7):548–557, 2001.
W. Weiming, D. Hu, M. Feng, and L. Wenyin. Automatic clinical question answering based on UMLS relations. In Third International Conference on Semantics, Knowledge and Grid, pages 495–498, 2007.
J. W. Wilbur, A. Rzhetsky, and H. Shatkay. New directions in biomedical text annotation: Definitions, guidelines and corpus construction. BMC Bioinformatics, 7:356, 2006.
G. Williams, P. Davis, A. Rogers, T. Bieri, P. Ozersky, and J. Spieth. Methods and strategies for gene structure curation in WormBase. Database, 2011.
K. Yamamoto, T. Kudo, A. Konagaya, and Y. Matsumoto. Protein name tagging for biomedical annotation in text. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 65–72, 2003.
J. Yang, A. M. Cohen, and W. Hersh. Automatic summarization of mouse gene information by clustering and sentence extraction from MEDLINE abstracts. In AMIA Annual Symposium Proceedings, pages 831–835, 2007.
A. Yeh, A. Morgan, M. Colosimo, and L. Hirschman. BioCreAtIvE task 1A: Gene mention finding evaluation. BMC Bioinformatics, 6(Suppl 1):S2, 2005.
M. Yetisgen-Yildiz and W. Pratt. Using statistical and knowledge-based approaches for literature-based discovery. Journal of Biomedical Informatics, 39(6):600–611, 2006.
M. Yetisgen-Yildiz and W. Pratt. A new evaluation methodology for literature-based discovery systems. Journal of Biomedical Informatics, 42(4):633–643, 2009.
I. Yoo, X. Hu, and I.-Y. Song. A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method. BMC Bioinformatics, 8(Suppl 9):S4, 2007.
H. Yu, S. Agarwal, M. Johnston, and A. Cohen. Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension. Journal of Biomedical Discovery and Collaboration, 4(1):1, 2009.
H. Yu and Y.-G. Cao. Automatically extracting information needs from ad hoc clinical questions. In AMIA Annual Symposium Proceedings, pages 96–100, 2008.
H. Yu and M. Lee. Accessing bioscience images from abstract sentences. Bioinformatics, 22(14):e547–e556, 2006.
H. Yu, M. Lee, D. Kaufman, J. Ely, J. A. Osheroff, G. Hripcsak, and J. Cimino. Development, implementation, and a cognitive evaluation of a definitional question answering system for physicians. Journal of Biomedical Informatics, 40(3):236–251, 2007.
H. Yu and C. Sable. Being Erlang Shen: Identifying answerable questions. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence on Knowledge and Reasoning for Answering Questions, pages 6–14, 2005.
H. Yu, C. Sable, and H. Zhu. Classifying medical questions based on an evidence taxonomy. In Proceedings of the AAAI 2005 Workshop on Question Answering in Restricted Domains, 2005.
G. Zhou, D. Shen, J. Zhang, J. Su, and S. Tan. Recognition of protein/gene names from text using an ensemble of classifiers. BMC Bioinformatics, 6(Suppl 1):S7, 2005.
P. Zweigenbaum and D. Demner-Fushman. Advanced literature-mining tools. In D. Edwards, J. Stajich, and D. Hansen, editors, Bioinformatics: Tools and Applications, pages 347–380. Springer, 2009.
P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. B. Cohen. Frontiers of biomedical text mining: Current progress. Briefings in Bioinformatics, 8(5):358–375, 2007.
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Simpson, M.S., Demner-Fushman, D. (2012). Biomedical Text Mining: A Survey of Recent Progress. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_14
DOI: https://doi.org/10.1007/978-1-4614-3223-4_14
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-3222-7
Online ISBN: 978-1-4614-3223-4
eBook Packages: Computer Science, Computer Science (R0)