Abstract
Clustering functionally similar genes helps in understanding the mechanism of a biological pathway. It also provides information about genes whose biological roles were previously unknown. The outcome of gene clustering depends strongly on the similarity or dissimilarity criterion used. Genes are usually clustered using microarray gene expression data; however, microarray data may contain noise that can lead to undesired results. Incorporating gene ontology information may therefore improve the clustering solutions. In this regard, an integrated dissimilarity measure is introduced for grouping functionally similar genes. It combines the city block distance and a gene ontology based semantic dissimilarity: the city block distance measures the distance between two gene expression vectors, while the gene ontology based semantic dissimilarity incorporates biological knowledge. The importance of the integrated dissimilarity measure is demonstrated by incorporating it into different c-means clustering algorithms, including rough-fuzzy clustering algorithms. This work shows that incorporating the integrated dissimilarity measure increases the functional similarity of gene clusters compared with methods based on either type of dissimilarity measure alone. It is also observed that, with the new dissimilarity measure, the rough-fuzzy clustering algorithm performs better than the other c-means clustering algorithms.
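The abstract does not specify how the two components are combined or how the gene ontology term enters the prototype update of a c-means algorithm. The Python sketch below illustrates the general idea under several assumptions that are not taken from the paper: (1) the integrated measure is a convex combination, with a hypothetical weight `w_expr`, of a max-normalised city block distance and `1 - GO similarity`; (2) the GO semantic similarity matrix `go_sim` is precomputed with values in [0, 1], for example from an information-content measure such as Resnik's or Lin's; (3) because a GO dissimilarity is defined only between genes, cluster prototypes are medoids rather than centroids. The clustering loop is a hypothetical rough-fuzzy c-medoids variant. It is loosely modelled on the rough-fuzzy c-means idea of splitting each cluster into a crisp lower approximation and a fuzzy boundary (threshold `delta`, weight `w_low`), and it is not the paper's exact algorithm. The toy data are invented.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def city_block(x: Sequence[float], y: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two expression profiles."""
    if len(x) != len(y):
        raise ValueError("expression vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def integrated_dissimilarity(expr: Sequence[Sequence[float]], go_sim: Matrix,
                             w_expr: float = 0.5) -> Matrix:
    """Assumed form: w_expr * (city block / max city block) + (1 - w_expr) * (1 - GO similarity)."""
    n = len(expr)
    cb = [[city_block(expr[i], expr[j]) for j in range(n)] for i in range(n)]
    max_cb = max(max(row) for row in cb) or 1.0  # guard against all-identical profiles
    return [[w_expr * cb[i][j] / max_cb + (1.0 - w_expr) * (1.0 - go_sim[i][j])
             for j in range(n)] for i in range(n)]


def maximin_init(D: Matrix, c: int) -> List[int]:
    """Hypothetical seeding: start at gene 0, then repeatedly add the gene farthest from all chosen medoids."""
    medoids = [0]
    while len(medoids) < c:
        medoids.append(max(range(len(D)), key=lambda q: min(D[q][v] for v in medoids)))
    return medoids


def memberships(D: Matrix, medoids: List[int], m: float) -> Matrix:
    """Fuzzy c-means membership formula evaluated on gene-to-medoid dissimilarities."""
    U = []
    for k in range(len(D)):
        d = [D[k][v] for v in medoids]
        if min(d) == 0.0:  # gene coincides with a medoid: give it crisp membership
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            U.append([h / sum(hits) for h in hits])
        else:
            U.append([1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1.0)) for j in range(len(d)))
                      for i in range(len(d))])
    return U


def rough_fuzzy_c_medoids(D: Matrix, c: int, m: float = 2.0, delta: float = 0.2,
                          w_low: float = 0.7, max_iter: int = 100
                          ) -> Tuple[List[int], Matrix, List[List[int]], List[List[int]]]:
    if not 2 <= c <= len(D) or max_iter < 1:
        raise ValueError("need 2 <= c <= number of genes and max_iter >= 1")
    medoids = maximin_init(D, c)
    for _ in range(max_iter):
        U = memberships(D, medoids, m)
        lower: List[List[int]] = [[] for _ in range(c)]
        boundary: List[List[int]] = [[] for _ in range(c)]
        for k in range(len(D)):
            order = sorted(range(c), key=lambda i: U[k][i], reverse=True)
            best = order[0]
            if U[k][best] - U[k][order[1]] > delta:
                lower[best].append(k)  # clear winner: lower approximation of one cluster
            else:
                for i in order:  # ambiguous gene: boundary of every near-best cluster
                    if U[k][best] - U[k][i] <= delta:
                        boundary[i].append(k)

        def cost(q: int, i: int) -> float:
            a = sum(D[q][k] for k in lower[i])                    # crisp core term
            b = sum(U[k][i] ** m * D[q][k] for k in boundary[i])  # fuzzy boundary term
            if lower[i] and boundary[i]:
                return w_low * a + (1.0 - w_low) * b
            return a if lower[i] else b

        new = [min(range(len(D)), key=lambda q: cost(q, i)) for i in range(c)]
        if new == medoids:
            break
        medoids = new
    return medoids, U, lower, boundary


# Invented toy data: genes 0-2 and 3-5 form two groups in both expression and GO space.
expr = [[1.0, 2.0, 3.0], [1.1, 2.0, 3.0], [1.0, 2.1, 3.1],
        [5.0, 1.0, 0.0], [5.1, 1.0, 0.1], [5.0, 0.9, 0.0]]
groups = [0, 0, 0, 1, 1, 1]
go_sim = [[1.0 if i == j else (0.9 if groups[i] == groups[j] else 0.1)
           for j in range(6)] for i in range(6)]

D = integrated_dissimilarity(expr, go_sim, w_expr=0.5)
medoids, U, lower, boundary = rough_fuzzy_c_medoids(D, c=2)
print("medoids:", medoids)
print("lower approximations:", lower)
print("boundary regions:", boundary)
```

Setting `w_expr=1.0` reduces the sketch to a purely expression-based (normalised city block) dissimilarity, and `w_expr=0.0` reduces it to a purely GO-based one. These two extremes correspond to the single-measure baselines that the abstract compares against the integrated measure.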
Acknowledgements
The author thanks Dr. Pradipta Maji of the Indian Statistical Institute, Kolkata, India, for his valuable suggestions.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Paul, S. (2017). Integration of Gene Expression and Ontology for Clustering Functionally Similar Genes. In: Polkowski, L., et al. Rough Sets. IJCRS 2017. Lecture Notes in Computer Science, vol 10313. Springer, Cham. https://doi.org/10.1007/978-3-319-60837-2_47
DOI: https://doi.org/10.1007/978-3-319-60837-2_47
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60836-5
Online ISBN: 978-3-319-60837-2
eBook Packages: Computer Science, Computer Science (R0)