Omics and Literature Mining

Kumar, Vinod

doi:10.1007/978-1-61779-027-0_21

Part of the book series: Methods in Molecular Biology (MIMB, volume 719)

Abstract

Measuring the simultaneous expression values of thousands of genes or proteins on high-throughput Omics platforms creates a large amount of data whose interpretation by inspection can be a daunting task. A major challenge in using such data is to translate these lists of genes/proteins into a better understanding of the underlying biological phenomena. We describe approaches to identify biological concepts, in the form of Medical Subject Headings (MeSH terms) extracted from MEDLINE, that are significantly overrepresented within the identified gene set relative to those associated with the overall collection of genes on the underlying Omics platform. The method’s principal strength is its ability to simultaneously depict similarities that may exist at the level of biological structure, molecular function, physiology, genetics, and clinically manifest diseases, just as a single published article about a gene of interest may report findings within several of these same dimensions.
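
The over-representation idea in the abstract can be illustrated with a small sketch. The abstract does not say which statistical test the chapter uses, so the one-sided hypergeometric test below is an assumption, chosen because it is a common choice for this kind of enrichment analysis. The function names (`hypergeom_upper_tail`, `enriched_terms`), the gene identifiers, and the gene-to-MeSH annotations are all hypothetical toy values, not data from the chapter.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): probability of at least k annotated genes in a sample of n
    drawn without replacement from N genes, K of which carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def enriched_terms(gene_set, platform_genes, gene_to_terms):
    """Rank MeSH terms by over-representation in gene_set relative to all
    genes on the platform (assumed test: one-sided hypergeometric)."""
    platform = set(platform_genes)
    selected = set(gene_set) & platform  # only genes measured on the platform count
    N, n = len(platform), len(selected)

    def count_terms(genes):
        counts = {}
        for gene in genes:
            for term in set(gene_to_terms.get(gene, ())):
                counts[term] = counts.get(term, 0) + 1
        return counts

    platform_counts = count_terms(platform)
    selected_counts = count_terms(selected)
    rows = [
        (term, k, platform_counts[term],
         hypergeom_upper_tail(k, platform_counts[term], n, N))
        for term, k in selected_counts.items()
    ]
    return sorted(rows, key=lambda row: row[3])  # smallest p-value first


# Hypothetical toy data: ten platform genes with made-up MeSH annotations.
platform_genes = [f"g{i}" for i in range(1, 11)]
gene_to_terms = {
    "g1": {"Obesity", "Apoptosis"},
    "g2": {"Obesity"},
    "g3": {"Obesity"},
    "g4": {"Apoptosis"},
    "g5": {"Apoptosis"},
    "g6": {"Apoptosis"},
    "g7": {"Apoptosis"},
    "g8": {"Apoptosis"},
}
gene_set = ["g1", "g2", "g3"]

ranked = enriched_terms(gene_set, platform_genes, gene_to_terms)
for term, k, K, p in ranked:
    print(f"{term}: {k} of {K} annotated platform genes in the set, p = {p:.4f}")
```

In this toy example, "Obesity" ranks first because all three platform genes annotated with it fall inside the gene set, whereas "Apoptosis" is spread across the platform. A real analysis would involve thousands of terms, so the raw p-values would need a multiple-testing correction before any term is called significant.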

Acknowledgements

The author would like to thank Craig Volker, Pankaj Agarwal, Liwen Liu, Tom White, Dilip Rajagopalan, William Reisdorf, Karen Kabnick, and David Searls for their contributions to the development of this approach.

Author information

Authors and Affiliations

Computational Biology, Quantitative Sciences, GlaxoSmithKline, King of Prussia, PA, USA
Vinod Kumar

Corresponding author

Correspondence to Vinod Kumar.

Editor information

Editors and Affiliations

emergentec biodevelopment GmbH, Gersthofer Strasse 29-31, Vienna, 1180, Austria
Bernd Mayer

Cite this protocol

Kumar, V. (2011). Omics and Literature Mining. In: Mayer, B. (eds) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_21

DOI: https://doi.org/10.1007/978-1-61779-027-0_21
Published: 29 January 2011
Publisher Name: Humana Press
Print ISBN: 978-1-61779-026-3
Online ISBN: 978-1-61779-027-0
eBook Packages: Springer Protocols
