Biomedical Literature Mining

Zhang, Chaolin; Zhang, Michael Q.

doi:10.1007/978-0-387-84870-9_10

Abstract

A major hurdle in large-scale genomic studies is incorporating existing knowledge from the published literature. This task is usually carried out by human experts, but manual curation is labor-intensive and makes it difficult to keep the knowledge up to date. Biomedical literature mining offers a potential solution by automatically extracting and integrating useful information from the literature, which can lead to new discoveries.

Author information

Authors and Affiliations

Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724
Chaolin Zhang
Department of Biomedical Engineering, State University of New York at Stony Brook, NY 11794
Michael Q. Zhang

Editor information

Editors and Affiliations

Roskamp Institute, 2040 Whitfield Avenue, Sarasota, FL 34243
Venkatarajan S. Mathura
Biomed-Informatics, 17A Main Road, Irulan Chandai Annex, Pondicherry 607 402, India
Pandjassarame Kangueane

About this chapter

Cite this chapter

Zhang, C., Zhang, M.Q. (2009). Biomedical Literature Mining. In: Mathura, V.S., Kangueane, P. (eds) Bioinformatics: A Concept-Based Introduction. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-84870-9_10

DOI: https://doi.org/10.1007/978-0-387-84870-9_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-84869-3
Online ISBN: 978-0-387-84870-9
eBook Packages: Biomedical and Life Sciences (R0)
