Application of the Bi-CoPaM Method to Five Escherichia Coli Datasets Generated under Various Biological Conditions
The increasing volume of high-throughput biological datasets is encouraging the information engineering and machine learning research communities to design and apply novel, specialised methods that tackle the problems specific to such datasets. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method addresses the problem of examining multiple gene expression microarray datasets to identify the subsets of genes that are consistently co-expressed across them. Its clustering results better reflect two biological facts: most of the genes in any cell are expected to be irrelevant to the specific context at hand, and many genes might participate in multiple processes. This is achieved by clustering the given set of genes while allowing each gene one of three outcomes: to be exclusively assigned to a single cluster, to be simultaneously assigned to multiple clusters, or not to be assigned to any cluster. In this study, we expand the scope of the Bi-CoPaM method by applying it, for the first time, to bacterial datasets, namely a set of five Escherichia coli datasets generated under different biological conditions, in order to identify the subsets of genes that are consistently co-expressed, i.e. well correlated with each other. We identify two clusters with such consistent co-expression; interestingly, the two clusters are themselves consistently negatively correlated with each other. The first cluster is enriched with genes participating in protein synthesis and DNA repair, while the second is enriched with genes involved in transport. Consequently, we draw biological hypotheses that relate some genes whose biological processes are currently unknown to the processes in which they may participate. These hypotheses can serve as pilots for focused future gene discovery studies.
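The three assignment outcomes described above (one cluster, several clusters, or none) can be illustrated with a small sketch. This is not the published Bi-CoPaM implementation. It assumes that several clusterings of the same genes have already been aligned so that their clusters correspond, that they are averaged into a consensus partition matrix, and that a single fixed threshold is a reasonable stand-in for the binarisation step. The function names, the threshold values, and the toy data are all hypothetical.

```python
import numpy as np


def consensus_partition_matrix(partitions):
    """Average several K x N membership matrices (clusters x genes).

    Assumes the clusters are already aligned across partitions, i.e. row k
    refers to the same cluster in every matrix.
    """
    return np.mean(np.stack(partitions), axis=0)


def binarise(cpm, threshold):
    """Illustrative rule: gene j joins cluster k if cpm[k, j] >= threshold."""
    return cpm >= threshold


# Three hypothetical clusterings of 4 genes into 2 clusters.
partitions = [
    np.array([[1, 1, 0, 0], [0, 0, 1, 1]]),
    np.array([[1, 0, 0, 1], [0, 1, 1, 0]]),
    np.array([[1, 1, 0, 0], [0, 0, 1, 1]]),
]
cpm = consensus_partition_matrix(partitions)

# Count, for each gene, how many clusters it belongs to at each threshold.
for threshold in (0.3, 0.6, 0.9):
    memberships = binarise(cpm, threshold).sum(axis=0)
    print(threshold, memberships.tolist())
```

With these toy values, the loose threshold places the two genes on which the clusterings disagree in both clusters, the middle threshold assigns every gene to exactly one cluster, and the strict threshold leaves the two disputed genes unassigned. Tightening the threshold is therefore one simple way to keep only the genes that cluster together consistently.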
Keywords: Genome-wide analysis; Consistent co-expression; Bi-CoPaM; Escherichia coli; Multiple datasets analysis
This article summarises independent research funded by the National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. A.K. Nandi would like to thank TEKES for their award of the Finland Distinguished Professorship.
No conflict of interest declared.