Abstract

A major hurdle in large-scale genomic studies is incorporating existing knowledge from the published literature. This is done by human experts, but manual curation is labor-intensive and makes it difficult to keep knowledge up to date. Biomedical literature mining offers a potential solution by automatically extracting and integrating useful information from the literature, which can lead to new discoveries.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Zhang, C., Zhang, M.Q. (2009). Biomedical Literature Mining. In: Mathura, V.S., Kangueane, P. (eds) Bioinformatics: A Concept-Based Introduction. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-84870-9_10
