A Complex Bio-networks of the Function Profile of Genes
This paper presents a novel model of concept representation using a multilevel geometric structure called Latent Semantic Networks. Given a set of documents, the associations among terms that frequently co-occur in any of the documents naturally define a geometric complex, which can then be decomposed into connected components at various levels.
This hierarchical model of knowledge representation was validated in the functional profiling of genes. Our approach outperformed traditional vector-based document clustering by exploiting the geometric form of the frequent itemsets generated by association rules. The biological profile of a gene is a complex of concepts that can be decomposed into primitive concepts, on the basis of which the relevant literature can be clustered at an adequate "resolution" of context. The hierarchical representation can be validated against tree-based biomedical ontological frameworks, which have been applied for years and have recently been enriched by the online availability of the Unified Medical Language System (UMLS) and the Gene Ontology (GO).
The model and the clustering are demonstrated on the GeneRIF (Gene Reference into Function) document set for the NOD2 gene. Our geometric model is suitable for representing biological information, where hierarchical concepts of differing complexity can be explored interactively according to the context of application and the needs of researchers. An online clustering search engine, for both general-purpose and biomedical use, that organizes search results from Google or PubMed has been constructed based on this methodology (http://ginni.bme.ntu.edu.tw). The hierarchical presentation of clustering results and the interactive graphical display of the contents of each cluster show the merits of our approach.
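The construction described above (frequent co-occurring term sets forming a complex that is then split into connected components) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each document is reduced to a set of terms, that a frequent itemset is treated as a simplex, and that a "level" corresponds to the minimum number of terms two maximal simplices must share to be linked. The function names, the brute-force enumeration, and the toy documents are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support, max_size=3):
    """Term sets that co-occur in at least `min_support` documents.

    Brute-force enumeration, adequate only for toy vocabularies.
    """
    vocab = sorted(set().union(*docs))
    frequent = []
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            support = sum(1 for d in docs if d.issuperset(combo))
            if support >= min_support:
                frequent.append(frozenset(combo))
    return frequent

def maximal_simplices(itemsets):
    """Keep only itemsets that are not a proper subset of another itemset."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]

def components(simplices, shared=1):
    """Group simplices linked by chains that share at least `shared` terms.

    `shared` plays the role of the assumed "level": a higher value demands
    a larger common face and therefore yields finer components.
    """
    parent = list(range(len(simplices)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(simplices)), 2):
        if len(simplices[i] & simplices[j]) >= shared:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(simplices):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())

# Toy documents invented for illustration (not actual GeneRIF entries).
docs = [
    {"nod2", "crohn", "mutation"},
    {"nod2", "crohn", "mutation"},
    {"nod2", "nfkb", "activation"},
    {"nod2", "nfkb", "activation"},
    {"nod2", "crohn"},
]

maximal = maximal_simplices(frequent_itemsets(docs, min_support=2))
coarse = components(maximal, shared=1)  # simplices linked through any common term
fine = components(maximal, shared=2)    # simplices must share at least two terms
```

On this toy input the two maximal simplices share only the term `nod2`, so they fall into a single component at the coarse level and separate into two candidate primitive concepts at the finer level. Documents could then be assigned to whichever component's terms they contain.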
Keywords: Gene Ontology, Association Rule, NOD2 Gene, Document Cluster, Primitive Concept