Mutual Information Based Extrinsic Similarity for Microarray Analysis

  • Duygu Ucar
  • Fatih Altiparmak
  • Hakan Ferhatosmanoglu
  • Srinivasan Parthasarathy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5462)


Genes responding similarly to changing conditions are believed to be functionally related. Identification of such functional relations is crucial for annotation of unknown genes as well as the exploration of the underlying regulatory program. Gene expression profiling experiments provide noisy datasets about how cells respond to different experimental conditions. One way of analyzing these datasets is the identification of gene groups with similar expression patterns. A prevailing technique to find gene pairs with correlated expression profiles is to use linear measures like Pearson’s correlation coefficient or Euclidean distance. Similar genes are later compiled into a co-expression network to explore the system-level functionality of genes. However, the noise inherent in microarray datasets reduces the sensitivity of these measures and produces many spurious pairs with no real biological relevance. In this paper, we explore an extrinsic way of calculating similarity of two genes based on their relations with other genes. We show that ‘similar’ pairs identified by extrinsic measures overlap better with known biological annotations available in the Gene Ontology database. Our results also indicate that extrinsic measures are useful in enhancing the quality of co-expression networks and their functional subnetworks.
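The contrast drawn above between a direct (intrinsic) measure and an extrinsic one can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so the sketch makes a loud assumption: it scores two genes by the Jaccard overlap of their top-k neighborhood lists, where neighbors are ranked by Pearson correlation. This is a stand-in for illustration only, not the authors' mutual-information-based measure. The gene names, the toy profiles, and the helper functions `pearson`, `neighborhood`, and `extrinsic_similarity` are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Intrinsic measure: Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation is undefined, treated as 0 here
    return cov / (sx * sy)


def neighborhood(profiles, gene, k, exclude):
    """Top-k genes most correlated with `gene`, skipping the genes in `exclude`."""
    others = [g for g in profiles if g not in exclude]
    others.sort(key=lambda g: pearson(profiles[gene], profiles[g]), reverse=True)
    return set(others[:k])


def extrinsic_similarity(profiles, a, b, k=2):
    """ASSUMPTION (not the paper's measure): Jaccard overlap of the two genes'
    top-k neighborhood lists, computed over all genes other than a and b."""
    na = neighborhood(profiles, a, k, exclude={a, b})
    nb = neighborhood(profiles, b, k, exclude={a, b})
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Hypothetical toy expression profiles (rows: genes, columns: conditions).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [0.9, 2.2, 3.1, 3.9],
    "g4": [4.0, 3.0, 2.0, 1.0],
    "g5": [2.0, 2.0, 2.1, 1.9],
}

if __name__ == "__main__":
    print("intrinsic g1 vs g2:", pearson(profiles["g1"], profiles["g2"]))
    print("extrinsic g1 vs g2:", extrinsic_similarity(profiles, "g1", "g2", k=1))
    print("extrinsic g1 vs g4:", extrinsic_similarity(profiles, "g1", "g4", k=1))
```

In this toy data, g1 and g2 share the same nearest neighbor (g3), so their extrinsic score is high, while g1 and g4 have disjoint neighborhoods. The point of the design is that the extrinsic score depends on how each gene relates to the rest of the dataset, not only on the two profiles themselves.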


Keywords: Gene Ontology, Mutual Information, Gene Pair, Semantic Similarity, Neighborhood List





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Duygu Ucar (1)
  • Fatih Altiparmak (2)
  • Hakan Ferhatosmanoglu (1)
  • Srinivasan Parthasarathy (1)

  1. Department of Computer Science and Engineering, The Ohio State University, Columbus, USA
  2. ASELSAN A.S., Radar, EW, and Intelligence Systems Division, Turkey
