Abstract
High-throughput Omics platforms measure the expression of thousands of genes or proteins simultaneously, producing volumes of data that are daunting to interpret by inspection. A major challenge in using such data is to translate the resulting lists of genes or proteins into a better understanding of the underlying biological phenomena. We describe approaches to identify biological concepts, in the form of Medical Subject Headings (MeSH terms) extracted from MEDLINE, that are significantly overrepresented within an identified gene set relative to the terms associated with the overall collection of genes on the underlying Omics platform. The method's principal strength is its ability to simultaneously depict similarities that may exist at the level of biological structure, molecular function, physiology, genetics, and clinically manifest disease, just as a single published article about a gene of interest may report findings in several of these same dimensions.
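As an illustration of what "overrepresented relative to the platform" can mean in practice, the following minimal sketch scores each MeSH term with a one-sided hypergeometric test. This is a common choice for enrichment analysis and is an assumption here: the abstract does not state which statistic the chapter uses. The function names, the input layout (a dictionary mapping each platform gene to its set of MeSH terms), and any gene or term labels passed in are hypothetical, and multiple-testing correction is deliberately left out.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N,
    of which K are annotated with the term of interest."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def mesh_enrichment(gene_set, platform_annotations):
    """Rank MeSH terms by over-representation in gene_set.

    platform_annotations: dict mapping every gene on the platform
    (the background) to a set of MeSH terms. Hypothetical layout.
    Returns a list of (term, hits_in_set, hits_on_platform, p_value),
    sorted by ascending p-value. No multiple-testing correction is applied.
    """
    background = set(platform_annotations)
    selected = set(gene_set) & background  # ignore genes not on the platform
    N, n = len(background), len(selected)

    # Invert the annotation table: term -> genes carrying that term.
    term_to_genes = {}
    for gene, terms in platform_annotations.items():
        for term in terms:
            term_to_genes.setdefault(term, set()).add(gene)

    results = []
    for term, genes in term_to_genes.items():
        k = len(genes & selected)
        if k == 0:
            continue  # a term absent from the gene set cannot be enriched
        results.append((term, k, len(genes), hypergeom_sf(k, N, len(genes), n)))

    results.sort(key=lambda row: row[3])
    return results
```

Using the platform's own gene list as the background, rather than the whole genome, mirrors the comparison the abstract describes; in a real analysis the resulting p-values would still need adjustment for the number of terms tested.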
Acknowledgements
The author would like to thank Craig Volker, Pankaj Agarwal, Liwen Liu, Tom White, Dilip Rajagopalan, William Reisdorf, Karen Kabnick, and David Searls for their contributions to the development of this approach.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Kumar, V. (2011). Omics and Literature Mining. In: Mayer, B. (ed.) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_21
DOI: https://doi.org/10.1007/978-1-61779-027-0_21
Publisher Name: Humana Press
Print ISBN: 978-1-61779-026-3
Online ISBN: 978-1-61779-027-0
eBook Packages: Springer Protocols