Integration of Gene Expression and Ontology for Clustering Functionally Similar Genes

Paul, Sushmita

doi:10.1007/978-3-319-60837-2_47

Sushmita Paul

Conference paper
First Online: 22 June 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10313)

Abstract

Clustering functionally similar genes helps in understanding the mechanism of a biological pathway. It also provides information about genes whose biological importance was previously unknown. The quality of gene clustering depends heavily on the similarity or dissimilarity criterion. Usually, microarray gene expression data are used to cluster genes. However, microarray data may contain noise that leads to undesired results, so incorporating gene ontology information may improve the clustering solutions. In this regard, an integrated dissimilarity measure is introduced for grouping functionally similar genes. It comprises the city block distance and a gene ontology based semantic dissimilarity. While the city block distance is used to compute the distance between two gene expression vectors, the gene ontology based semantic dissimilarity measure is used to incorporate biological knowledge. The importance of the integrated dissimilarity measure is shown by incorporating it into different c-means clustering algorithms, including rough-fuzzy clustering algorithms. This work shows that incorporating the integrated dissimilarity measure increases the functional similarity of the resulting gene clusters compared with methods based on either type of dissimilarity measure alone. It is also observed that the rough-fuzzy clustering algorithm performs better with the new dissimilarity measure than the other c-means clustering algorithms considered.
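The abstract does not state how the two components are combined, so the following is only a minimal sketch of one plausible form: a convex combination of a normalized city block distance and a gene ontology based dissimilarity. The function names, the `weight` parameter, the normalization by `max_city_block`, and the use of `1 - go_similarity` as the semantic dissimilarity are all assumptions made for illustration, not the paper's definition.

```python
from typing import Sequence


def city_block(x: Sequence[float], y: Sequence[float]) -> float:
    """City block (Manhattan) distance between two expression vectors."""
    if len(x) != len(y):
        raise ValueError("expression vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def integrated_dissimilarity(
    x: Sequence[float],
    y: Sequence[float],
    go_similarity: float,
    max_city_block: float,
    weight: float = 0.5,
) -> float:
    """Hypothetical integrated dissimilarity between two genes.

    Assumptions (not taken from the paper):
      * go_similarity is a precomputed semantic similarity in [0, 1],
        so 1 - go_similarity serves as the semantic dissimilarity;
      * the city block distance is scaled into [0, 1] by max_city_block,
        e.g. the largest pairwise distance in the data set;
      * the two terms are mixed by a convex weight in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if not 0.0 <= go_similarity <= 1.0:
        raise ValueError("go_similarity must lie in [0, 1]")
    if max_city_block <= 0:
        raise ValueError("max_city_block must be positive")

    # Expression term: normalized city block distance, capped at 1.
    expression_term = min(city_block(x, y) / max_city_block, 1.0)
    # Ontology term: dissimilarity derived from the semantic similarity.
    ontology_term = 1.0 - go_similarity
    return weight * expression_term + (1.0 - weight) * ontology_term


if __name__ == "__main__":
    gene_a = [1.0, 2.0, 3.0]
    gene_b = [2.0, 0.0, 3.0]
    print(integrated_dissimilarity(gene_a, gene_b, go_similarity=0.8,
                                   max_city_block=6.0))
```

Under these assumptions, setting `weight` to 1 reduces the measure to the expression-only distance and setting it to 0 reduces it to the ontology-only dissimilarity, which mirrors the comparison against "either type of dissimilarity measure" described in the abstract. Such a function could then replace the distance computation inside a c-means style algorithm.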

Acknowledgements

The author thanks Dr. Pradipta Maji of the Indian Statistical Institute, Kolkata, India, for his valuable suggestions.

Author information

Authors and Affiliations

Department of Bioscience and Bioengineering, Indian Institute of Technology Jodhpur, Jodhpur, India
Sushmita Paul

Corresponding author

Correspondence to Sushmita Paul.

Editor information

Editors and Affiliations

Polish-Japanese Academy of Information Technology, Warsaw, Poland
Lech Polkowski
University of Regina, Regina, SK, Canada
Yiyu Yao
University of Warmia and Mazury, Olsztyn, Poland
Piotr Artiemjew
University of Milano-Bicocca, Milano, Italy
Davide Ciucci
Southwest Jiaotong University, Chengdu, China
Dun Liu
Warsaw University, Warszawa, Poland
Dominik Ślęzak
Silesian University, Sosnowiec, Poland
Beata Zielosko

Cite this paper

Paul, S. (2017). Integration of Gene Expression and Ontology for Clustering Functionally Similar Genes. In: Polkowski, L., et al. Rough Sets. IJCRS 2017. Lecture Notes in Computer Science, vol 10313. Springer, Cham. https://doi.org/10.1007/978-3-319-60837-2_47

DOI: https://doi.org/10.1007/978-3-319-60837-2_47
Published: 22 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60836-5
Online ISBN: 978-3-319-60837-2
eBook Packages: Computer Science, Computer Science (R0)
