Abstract
A major hurdle in large-scale genomic studies is incorporating existing knowledge from the published literature. This is typically done by human experts, but manual curation is labor-intensive and makes it difficult to keep knowledge up to date. Biomedical literature mining offers a potential solution: it extracts and integrates useful information from the literature automatically, which can lead to new discoveries.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Zhang, C., Zhang, M.Q. (2009). Biomedical Literature Mining. In: Mathura, V.S., Kangueane, P. (eds) Bioinformatics: A Concept-Based Introduction. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-84870-9_10
DOI: https://doi.org/10.1007/978-0-387-84870-9_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-84869-3
Online ISBN: 978-0-387-84870-9
eBook Packages: Biomedical and Life Sciences (R0)