Enhancing Automatic Construction of Gene Subnetworks by Integrating Multiple Sources of Information

Journal of Signal Processing Systems

Abstract

We present an approach to extracting information from textual documents of biological knowledge and demonstrate how cellular gene pathways may be inferred from it. Natural language processing techniques are used to represent the title and abstract fields of publications and to derive gene similarity vectors, which are then subjected to cluster analysis. Gene interactions are derived by parsing sentences in the abstracts to infer causal relationships. We show how high-throughput transcriptome data may then be used to enhance the construction of gene pathways from information derived from text. Subnetworks constructed by integrating information automatically derived from the literature with gene expression data are validated by comparison with biological processes defined in the Gene Ontology (GO) database. We find that precision increases in 58% of the clusters when enhanced in this manner, while a decrease in precision is observed in a relatively small number of clusters. These results are compared with those of similar attempts at the same problem and appear to be better in terms of the precision of network construction. We also show an example of a subnetwork found by this analysis that overlaps a known gene pathway in the KEGG and MIPS databases.
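
As a rough illustration of the text-similarity step described above, the sketch below builds one term vector per gene from its pooled title and abstract text and compares genes by cosine similarity. The abstract does not state the weighting scheme or tokenisation used, so TF-IDF weighting, whitespace tokenisation, the function names, and the toy gene names and texts are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each gene to a sparse TF-IDF vector built from its pooled text.

    `docs` maps a gene name to a string of concatenated titles/abstracts.
    Whitespace tokenisation is an assumption; the paper's preprocessing
    is not described in the abstract.
    """
    tokenized = {gene: text.lower().split() for gene, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of gene documents containing each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for gene, tokens in tokenized.items():
        tf = Counter(tokens)
        # Raw term count times log inverse document frequency.
        vectors[gene] = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: gene names and texts are invented for the example.
docs = {
    "GENE_A": "proteasome subunit degrades ubiquitinated protein",
    "GENE_B": "proteasome regulatory subunit binds ubiquitinated protein",
    "GENE_C": "ribosomal protein required for translation",
}
vectors = tfidf_vectors(docs)
sim_ab = cosine(vectors["GENE_A"], vectors["GENE_B"])
sim_ac = cosine(vectors["GENE_A"], vectors["GENE_C"])
```

In this toy corpus the only term shared by GENE_A and GENE_C occurs in every document, so its weight is zero and the two genes come out as dissimilar, whereas GENE_A and GENE_B share several informative terms. A pairwise similarity matrix of this kind could then be passed to any standard clustering routine.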

Notes

  1. http://www.ncbi.nlm.nih.gov/entrez

  2. http://www.tartarus.org/~martin/PorterStemmer/

  3. http://rana.lbl.gov/EisenData.html

  4. http://mips.gsf.de/genre/proj/yeast/

  5. http://www.graphviz.org/

  6. http://www.geneontology.org

  7. Redrawn to look similar to http://www.genome.jp/kegg/pathway/sce/sce03050.html

Acknowledgement

We are grateful to the Language Technology Group at the University of Edinburgh (LT CHUNK), AT&T Labs-Research (Graphviz), and SGD (GoTermFinder) for making software available in the public domain. SS was funded by the Royal Thai Government.

Author information

Corresponding author

Correspondence to Sujimarn Suwannaroj.

About this article

Cite this article

Suwannaroj, S., Niranjan, M. Enhancing Automatic Construction of Gene Subnetworks by Integrating Multiple Sources of Information. J Sign Process Syst Sign Image 50, 331–340 (2008). https://doi.org/10.1007/s11265-007-0148-4

  • DOI: https://doi.org/10.1007/s11265-007-0148-4
