Extracting Information for Meaningful Function Inference through Text-Mining

Pan, Hong; Zuo, Li; Kanagasabai, Rajaraman; Zhang, Zhuo; Choudhary, Vidhu; Mohanty, Bijayalaxmi; Tan, Sin Lam; Krishnan, S. P. T.; Veladandi, Pardha Sarathi; Meka, Archana; Choy, Weng Keong; Swarup, Sanjay; Bajic, Vladimir B.

doi:10.1007/0-387-36747-0_5

Part of the book series: Molecular Biology Intelligence Unit (MBIU)

Abstract

One of the emerging technologies in computational biology is text-mining, which includes natural language processing. This technology enables automated extraction of portions of the relevant biological knowledge from large volumes of scientific documents. We present several systems that cover different facets of text-mining biological information, with applications in transcription control, metabolic pathways, and bacterial cross-species comparison. We demonstrate how this technology can efficiently help biologists and medical scientists infer the function of biological entities and save considerable time, paving the way for more focused and detailed follow-up research.

Author information

Authors and Affiliations

Knowledge Extraction Lab, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Hong Pan, Li Zuo, Rajaraman Kanagasabai, Zhuo Zhang, Vidhu Choudhary, Bijayalaxmi Mohanty, Sin Lam Tan, S. P. T. Krishnan & Vladimir B. Bajic
Department of Biological Sciences, National University of Singapore, Singapore
Pardha Sarathi Veladandi, Archana Meka, Weng Keong Choy & Sanjay Swarup

About this chapter

Cite this chapter

Pan, H. et al. (2006). Extracting Information for Meaningful Function Inference through Text-Mining. In: Discovering Biomolecular Mechanisms with Computational Biology. Molecular Biology Intelligence Unit. Springer, Boston, MA. https://doi.org/10.1007/0-387-36747-0_5

DOI: https://doi.org/10.1007/0-387-36747-0_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34527-7
Online ISBN: 978-0-387-36747-7
eBook Packages: Biomedical and Life Sciences (R0)
