Application of the Bi-CoPaM Method to Five Escherichia Coli Datasets Generated under Various Biological Conditions

Abstract

The growing volume of high-throughput biological data is prompting the information engineering and machine learning research communities to design and apply novel, specialised methods that tackle the problems specific to such datasets. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method tackles the problem of scrutinising multiple gene expression microarray datasets to identify the subsets of genes that are consistently co-expressed across them. Its clustering results better reflect two biological facts: most of the genes in any cell are expected to be irrelevant to the specific context at hand, and many genes may participate in multiple processes. It achieves this by clustering the given set of genes while allowing each gene one of three outcomes: exclusive assignment to a single cluster, simultaneous assignment to multiple clusters, or no assignment to any cluster. In this study, we expand the scope of the Bi-CoPaM method by applying it, for the first time, to bacterial datasets, namely a set of five Escherichia coli datasets generated under different biological conditions, in order to identify the subsets of genes that are consistently co-expressed, i.e. well correlated with each other. We identify two clusters with such consistent co-expression; interestingly, the two clusters are themselves consistently negatively correlated with each other. The first cluster is enriched with genes participating in protein synthesis and DNA repair, while the second is enriched with transport genes. From these results, we draw biological hypotheses that link some genes whose biological processes are currently unknown to the processes in which they potentially participate. These hypotheses can serve as pilots for focused future gene discovery studies.
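The three assignment outcomes described in the abstract can be illustrated with a minimal sketch. It assumes that several hard clusterings of the same genes have already been relabelled so that cluster k means the same thing in each, that the consensus matrix is their element-wise average, and that binarisation is a simple value threshold. The function names, the toy partitions and the threshold rule are all illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List

Matrix = List[List[float]]  # rows = clusters, columns = genes


def consensus_matrix(partitions: List[Matrix]) -> Matrix:
    """Average aligned binary partition matrices element-wise (assumed scheme)."""
    n = len(partitions)
    n_clusters = len(partitions[0])
    n_genes = len(partitions[0][0])
    return [
        [sum(p[k][g] for p in partitions) / n for g in range(n_genes)]
        for k in range(n_clusters)
    ]


def binarise(consensus: Matrix, threshold: float) -> List[List[int]]:
    """Assign a gene to every cluster whose fuzzy membership reaches the threshold.

    This value-threshold rule is an assumption chosen for illustration.
    """
    return [[1 if m >= threshold else 0 for m in row] for row in consensus]


def outcomes(binary: List[List[int]], genes: List[str]) -> Dict[str, str]:
    """Label each gene by how many clusters it ended up in."""
    labels = {}
    for g, name in enumerate(genes):
        count = sum(row[g] for row in binary)
        labels[name] = (
            "unassigned" if count == 0 else "exclusive" if count == 1 else "multiple"
        )
    return labels


# Hypothetical data: four aligned partitions of three genes into two clusters.
genes = ["geneA", "geneB", "geneC"]
partitions = [
    [[1, 1, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 0, 1], [0, 1, 0]],
]

consensus = consensus_matrix(partitions)
loose = outcomes(binarise(consensus, 0.5), genes)
tight = outcomes(binarise(consensus, 0.75), genes)
print(consensus)
print(loose)
print(tight)
```

In this toy case, geneB is split evenly between the two clusters. A loose threshold assigns it to both, whereas a tight threshold leaves it unassigned, which shows how a single tuning parameter can move a gene between the multiple-assignment and no-assignment outcomes.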

Acknowledgments

This article summarises independent research funded by the National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. A.K. Nandi would like to thank TEKES for its award of the Finland Distinguished Professorship.

Disclosure Declaration

No conflict of interest declared.

Author information

Corresponding author

Correspondence to Asoke K. Nandi.

About this article

Cite this article

Abu-Jamous, B., Fa, R., Roberts, D.J. et al. Application of the Bi-CoPaM Method to Five Escherichia Coli Datasets Generated under Various Biological Conditions. J Sign Process Syst 79, 159–166 (2015). https://doi.org/10.1007/s11265-014-0919-7

  • DOI: https://doi.org/10.1007/s11265-014-0919-7

Keywords

  • Genome-wide analysis
  • Consistent co-expression
  • Bi-CoPaM
  • Escherichia coli
  • Multiple datasets analysis