Omics and Literature Mining

  • Protocol
Bioinformatics for Omics Data

Part of the book series: Methods in Molecular Biology (MIMB, volume 719)

Abstract

The simultaneous measurement of expression values for thousands of genes or proteins on high-throughput Omics platforms produces large volumes of data that are daunting to interpret by inspection alone. A major challenge in using such data is translating the resulting lists of genes or proteins into a better understanding of the underlying biological phenomena. We describe approaches that identify biological concepts, in the form of Medical Subject Headings (MeSH terms) extracted from MEDLINE, that are significantly overrepresented within the identified gene set relative to those associated with the overall collection of genes on the underlying Omics platform. The method’s principal strength is its ability to simultaneously depict similarities that may exist at the level of biological structure, molecular function, physiology, genetics, and clinically manifest diseases, just as a single published article about a gene of interest may report findings within several of these same dimensions.
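
The overrepresentation idea in the abstract can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test, which is a common choice for this kind of comparison but is not confirmed here as the chapter's own statistic. The function names, the `gene_to_mesh` mapping, and the toy gene identifiers are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N
    platform genes, K of which are annotated with the MeSH term."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enriched_mesh_terms(gene_set, background, gene_to_mesh, alpha=0.05):
    """Return (term, hits_in_set, hits_on_platform, p_value) tuples for MeSH
    terms overrepresented in gene_set relative to the platform background.

    Assumption: gene_to_mesh maps each gene to an iterable of MeSH terms
    (for example, derived from MEDLINE records linked to that gene).
    """
    background = set(background)
    gene_set = set(gene_set) & background  # ignore genes not on the platform
    N, n = len(background), len(gene_set)

    platform_counts = {}
    set_counts = {}
    for gene in background:
        for term in set(gene_to_mesh.get(gene, ())):
            platform_counts[term] = platform_counts.get(term, 0) + 1
            if gene in gene_set:
                set_counts[term] = set_counts.get(term, 0) + 1

    results = []
    for term, k in set_counts.items():
        p = hypergeom_upper_tail(k, n, platform_counts[term], N)
        if p <= alpha:  # raw p-value threshold, uncorrected
            results.append((term, k, platform_counts[term], p))
    return sorted(results, key=lambda r: r[3])


# Toy data (hypothetical): ten platform genes, three of them in the gene set.
platform = [f"g{i}" for i in range(10)]
selected = ["g0", "g1", "g2"]
annotations = {g: ["Apoptosis"] for g in platform[3:]}
annotations["g0"] = ["Obesity", "Apoptosis"]
annotations["g1"] = ["Obesity"]
annotations["g2"] = ["Obesity"]

for term, k, K, p in enriched_mesh_terms(selected, platform, annotations):
    print(f"{term}: {k} of {len(selected)} selected genes, {K} on platform, p = {p:.4f}")
```

In the toy data, "Obesity" annotates only the three selected genes, whereas "Apoptosis" is spread across most of the platform, so only the former passes the threshold. A real analysis would test many terms at once and would therefore need a correction for multiple comparisons.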

Acknowledgements

The author would like to thank Craig Volker, Pankaj Agarwal, Liwen Liu, Tom White, Dilip Rajagopalan, William Reisdorf, Karen Kabnick, and David Searls for their contribution towards the development of this approach.

Author information

Corresponding author

Correspondence to Vinod Kumar.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Kumar, V. (2011). Omics and Literature Mining. In: Mayer, B. (eds) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_21

  • DOI: https://doi.org/10.1007/978-1-61779-027-0_21

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-026-3

  • Online ISBN: 978-1-61779-027-0

  • eBook Packages: Springer Protocols
