
Extracting Information for Meaningful Function Inference through Text-Mining

Chapter in: Discovering Biomolecular Mechanisms with Computational Biology

Abstract

Text-mining, which includes natural language processing, is an emerging technology in computational biology. It enables the automated extraction of relevant biological knowledge from large volumes of scientific literature. We present several systems that address different facets of biological text-mining, with applications to transcription control, metabolic pathways, and cross-species comparison of bacteria. We demonstrate how this technology can help biologists and medical scientists infer the function of biological entities efficiently, saving considerable time and paving the way for more focused and detailed follow-up research.
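The abstract does not describe the extraction algorithms themselves. As a rough illustration of what automated extraction can mean at its very simplest, the Python sketch below counts sentence-level co-occurrences between gene names and function keywords. This is not the authors' method. The dictionaries `GENES` and `FUNCTION_TERMS`, the naive sentence splitter, the placeholder gene names `geneA`/`geneB`, and the toy input sentences are all hypothetical. Real systems add named-entity recognition, synonym handling, and deeper linguistic parsing.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical dictionaries for illustration only; a real system would use
# curated gene lexicons (with synonyms) and an ontology of function terms.
GENES = {"geneA", "geneB"}
FUNCTION_TERMS = {"motility", "flagellum", "pilus"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Return alphanumeric tokens (hyphens kept), dropping punctuation."""
    return re.findall(r"[A-Za-z0-9-]+", sentence)


def cooccurrences(text: str) -> Counter:
    """Count (gene, function term) pairs that appear in the same sentence."""
    counts: Counter = Counter()
    for sentence in split_sentences(text):
        words = tokenize(sentence)
        # Gene names are matched case-sensitively; function terms are not.
        genes = GENES.intersection(words)
        terms = FUNCTION_TERMS.intersection(w.lower() for w in words)
        for pair in product(sorted(genes), sorted(terms)):
            counts[pair] += 1  # each sentence contributes at most once per pair
    return counts


if __name__ == "__main__":
    # Invented toy sentences, not taken from real literature.
    toy_text = (
        "geneA is required for motility. geneB encodes a pilus subunit. "
        "Loss of geneA abolishes the flagellum."
    )
    for (gene, term), n in cooccurrences(toy_text).most_common():
        print(f"{gene}\t{term}\t{n}")
```

Pairs that recur across many abstracts could then be flagged as candidate gene-function associations for an expert to review. Co-occurrence alone does not establish a relationship, which is why practical systems layer further linguistic analysis on top of counts like these.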

Copyright information

© 2006 Landes Bioscience and Springer Science+Business Media

About this chapter

Cite this chapter

Pan, H. et al. (2006). Extracting Information for Meaningful Function Inference through Text-Mining. In: Discovering Biomolecular Mechanisms with Computational Biology. Molecular Biology Intelligence Unit. Springer, Boston, MA. https://doi.org/10.1007/0-387-36747-0_5
