Skip to main content

Integration of Gene Expression and Ontology for Clustering Functionally Similar Genes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10313)

Abstract

Clustering functionally similar genes helps in understanding the mechanism of a biological pathway. It also provides information about genes whose biological roles were previously unknown. The clustering of genes depends heavily on the similarity or dissimilarity criterion used. Microarray gene expression data are usually used to cluster genes. However, microarray data may contain noise that can lead to undesired results, so incorporating gene ontology (GO) information may improve the clustering solutions. In this regard, an integrated dissimilarity measure is introduced for grouping functionally similar genes. It combines the city block distance and a GO-based semantic dissimilarity. The city block distance measures the distance between two gene expression vectors, while the GO-based semantic dissimilarity incorporates biological knowledge. The importance of the integrated dissimilarity measure is shown by incorporating it into different c-means clustering algorithms, including rough-fuzzy clustering algorithms. This work shows that the integrated dissimilarity measure increases the functional similarity of gene clusters compared with methods based on only one of the two types of dissimilarity measure. It is also observed that, with the new dissimilarity measure, the rough-fuzzy clustering algorithm performs better than the other c-means clustering algorithms.
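The abstract does not say how the two terms are combined or which GO semantic measure is used. The sketch below fills those gaps with assumptions, so it is an illustration rather than the paper's method:

- The combination is assumed to be a weighted sum. The city block term is rescaled to [0, 1], and the GO term is one minus a precomputed semantic similarity already normalized to [0, 1], for example a Resnik- or Lin-style score.
- The weight `w` and the helper names are hypothetical.
- The membership step applies the standard fuzzy c-means update to dissimilarities from a few medoid genes. This is a simplified stand-in, not the rough-fuzzy algorithm evaluated in the paper.
- All numbers are made up.

```python
import numpy as np


def city_block_matrix(X):
    """Pairwise city block (L1) distances between rows of X (genes x samples)."""
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)


def integrated_dissimilarity(X, go_sim, w=0.5):
    """Hypothetical integrated measure: w * scaled L1 + (1 - w) * (1 - GO similarity).

    X      : (n_genes, n_samples) expression matrix
    go_sim : (n_genes, n_genes) GO semantic similarity, assumed to lie in [0, 1]
    w      : assumed weight on the expression term
    """
    d_expr = city_block_matrix(X)
    max_d = d_expr.max()
    if max_d > 0:
        d_expr = d_expr / max_d  # rescale to [0, 1] so both terms are comparable
    d_go = 1.0 - go_sim          # turn similarity into dissimilarity
    return w * d_expr + (1.0 - w) * d_go


def fuzzy_memberships(D, medoids, m=2.0, eps=1e-12):
    """Standard fuzzy c-means membership update, computed from each gene's
    dissimilarity to a set of medoid genes (a fuzzy c-medoids-style stand-in)."""
    d = D[:, medoids] + eps                                    # (n_genes, c)
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)                             # rows sum to 1


# Toy usage with made-up data: two co-expressed, GO-related gene pairs.
X = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, 2.9],
              [5.0, 1.0, 0.0],
              [5.2, 0.9, 0.1]])
go_sim = np.array([[1.0, 0.8, 0.1, 0.2],
                   [0.8, 1.0, 0.2, 0.1],
                   [0.1, 0.2, 1.0, 0.9],
                   [0.2, 0.1, 0.9, 1.0]])
D = integrated_dissimilarity(X, go_sim, w=0.5)
U = fuzzy_memberships(D, medoids=[0, 2])
print(U.argmax(axis=1))  # expected: [0 0 1 1]
```

Setting `w=1.0` reduces the sketch to expression-only clustering, and `w=0.0` reduces it to GO-only clustering. These are the two single-measure baselines the abstract compares against.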

Acknowledgements

The author thanks Dr. Pradipta Maji of the Indian Statistical Institute, Kolkata, India, for his valuable suggestions.

Author information

Corresponding author

Correspondence to Sushmita Paul.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Paul, S. (2017). Integration of Gene Expression and Ontology for Clustering Functionally Similar Genes. In: Polkowski, L., et al. Rough Sets. IJCRS 2017. Lecture Notes in Computer Science, vol 10313. Springer, Cham. https://doi.org/10.1007/978-3-319-60837-2_47

  • DOI: https://doi.org/10.1007/978-3-319-60837-2_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60836-5

  • Online ISBN: 978-3-319-60837-2

  • eBook Packages: Computer Science, Computer Science (R0)