Abstract
Text mining, which includes natural language processing, is an emerging technology in computational biology. It enables the automated extraction of portions of the relevant biological knowledge from large volumes of scientific documents. We present several systems that cover different facets of text-mining biological information, with applications in transcription control, metabolic pathways, and bacterial cross-species comparison. We demonstrate how this technology can help biologists and medical scientists infer the function of biological entities and save considerable time, paving the way for more focused and detailed follow-up research.
Copyright information
© 2006 Landes Bioscience and Springer Science+Business Media
About this chapter
Cite this chapter
Pan, H. et al. (2006). Extracting Information for Meaningful Function Inference through Text-Mining. In: Discovering Biomolecular Mechanisms with Computational Biology. Molecular Biology Intelligence Unit. Springer, Boston, MA. https://doi.org/10.1007/0-387-36747-0_5
DOI: https://doi.org/10.1007/0-387-36747-0_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34527-7
Online ISBN: 978-0-387-36747-7
eBook Packages: Biomedical and Life Sciences (R0)