Virulence factor prediction in Streptococcus pyogenes using classification and clustering based on microarray data

  • Genomics, transcriptomics, proteomics
Applied Microbiology and Biotechnology

Abstract

Publicly available genomic data, such as gene expression (microarray) data, contain biologically interesting information that can be extracted computationally. To narrow down the large space of possible wet-lab experiments, global high-throughput data and existing knowledge should first be used to infer biological knowledge and formulate biological hypotheses. Here, based on microarray data, we propose using clustering and classification methods that have become very popular and are implemented in freely available software to predict which proteins encoded by genes of the pathogen Streptococcus pyogenes participate in virulence mechanisms. The confidence of each prediction is based on the classification errors obtained for known genes and on the prediction being repeated by more than three methods. Special emphasis is placed on the nonlinear kernel classification methods used. We propose a list of interesting candidate genes that could be virulence factors or that participate in the virulence process of S. pyogenes. Because these candidates behave similarly to known virulence factors, biological validation should start with this list.
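The confidence criterion described above (agreement of more than three methods, weighed against each method's errors on genes of known status) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the method names, gene identifiers, and labels are hypothetical, and "more than three methods" is encoded as a threshold of at least four votes.

```python
from typing import Dict, List, Set


def consensus_candidates(predictions: Dict[str, Set[str]],
                         min_methods: int = 4) -> List[str]:
    """Return, sorted, the genes flagged as virulence-associated by at
    least `min_methods` methods (4 means "more than three")."""
    votes: Dict[str, int] = {}
    for flagged in predictions.values():
        for gene in flagged:
            votes[gene] = votes.get(gene, 0) + 1
    return sorted(g for g, n in votes.items() if n >= min_methods)


def error_rate(flagged: Set[str], known_virulent: Set[str],
               known_other: Set[str]) -> float:
    """Fraction of genes with known status that one method mislabels:
    a known virulence gene it misses, or a known non-virulence gene it
    flags. Assumes at least one gene of known status."""
    labelled = known_virulent | known_other
    wrong = sum(1 for g in labelled
                if (g in flagged) != (g in known_virulent))
    return wrong / len(labelled)


if __name__ == "__main__":
    # Hypothetical per-method output; names loosely echo method families
    # in the reference list (SVMs, k-means, PAM, regularized discriminant
    # analysis). None of these sets come from the paper.
    predictions = {
        "svm_rbf": {"geneA", "geneB", "geneC"},
        "svm_poly": {"geneA", "geneB"},
        "kmeans": {"geneA", "geneB", "geneD"},
        "pam": {"geneA", "geneD"},
        "rda": {"geneA", "geneB"},
    }
    known_virulent = {"geneA", "geneC"}   # hypothetical labels
    known_other = {"geneD", "geneE"}
    print(consensus_candidates(predictions))  # ['geneA', 'geneB']
    for method, flagged in predictions.items():
        print(method, error_rate(flagged, known_virulent, known_other))
```

Requiring agreement across several methods trades sensitivity for robustness: a gene flagged by only one method, like geneC here, is dropped even though that method (svm_rbf) makes no errors on the labelled genes.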

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. For other interpretations of the values αᵢ, please refer to Schebesch and Stecking (2005).
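The coefficients αᵢ in the footnote are the dual coefficients of a kernel support vector machine. In the standard soft-margin formulation (Schölkopf and Smola 2002), an expression profile x is classified by the sign of f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b, with 0 ≤ αᵢ ≤ C. Training points with αᵢ = 0 do not affect f, points with 0 < αᵢ < C lie on the margin, and points with αᵢ = C lie inside the margin or are misclassified. The sketch below assumes a Gaussian (RBF) kernel and uses toy coefficients chosen by hand, not values fitted to the paper's data; the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(u: Vector, v: Vector, gamma: float = 0.5) -> float:
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def decision_value(x: Vector, support: Sequence[Vector],
                   alphas: Sequence[float], labels: Sequence[int],
                   bias: float, gamma: float = 0.5) -> float:
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; the sign of f is the class."""
    return sum(a * y * rbf_kernel(xi, x, gamma)
               for xi, a, y in zip(support, alphas, labels)) + bias


def alpha_role(alpha: float, C: float, tol: float = 1e-8) -> str:
    """Standard soft-margin reading of a dual coefficient 0 <= alpha <= C."""
    if alpha <= tol:
        return "not a support vector"
    if alpha >= C - tol:
        return "bounded (inside margin or misclassified)"
    return "free (on the margin)"


if __name__ == "__main__":
    # Toy values only: two support vectors with sum(alpha_i * y_i) == 0.
    support = [(0.0, 0.0), (1.0, 1.0)]
    alphas = [1.0, 1.0]
    labels = [+1, -1]
    # 1 - exp(-1), roughly 0.632, so the point is assigned class +1
    print(decision_value((0.0, 0.0), support, alphas, labels, bias=0.0))
    print([alpha_role(a, C=1.0) for a in (0.0, 0.4, 1.0)])
```

Because both toy coefficients equal C = 1.0, both training points would count as bounded support vectors under this reading.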

References

  • Bisno A, Brito M, Collins CM (2003) Molecular basis of group A streptococcal virulence. Lancet Infect Dis 3:191–200

  • Bleakley K, Biau G, Vert JP (2007) Supervised reconstruction of biological networks with local models. Bioinformatics 23:i57–i65

  • Clarke B, Fokoué E, Zhang H (2009) Principles and theory for data mining and machine learning. Springer, New York

  • Cox KH, Ruiz-Bustos E, Courtney HS, Dale JB, Pence MA, Nizet V, Aziz RK, Gerling I, Price SM, Hasty DL (2009) Inactivation of DltA modulates virulence factor expression in Streptococcus pyogenes. PLoS One 4(4):e5366. doi:10.1371/journal.pone.0005366

  • Friedman JH (1989) Regularized discriminant analysis. J Am Stat Assoc 84:165–175

  • Hartigan JA, Wong MA (1979) A k-means clustering algorithm. Appl Stat 28:100–108

  • Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York

  • Kohonen T (2000) Self-organizing maps, 3rd edn. Springer-Verlag, Berlin

  • Leiva-Valdebenito S, Torres-Avilés F (2010) A review of the most common partition algorithms in cluster analysis: a comparative study. Rev Colomb Estad 33(2):321–339

  • López-Kleine L, Monnet V, Pechoux C, Trubuil A (2008) Role of bacterial peptidase F inferred by statistical analysis and further experimental validation. HFSP J 2:29–41

  • López-Kleine L, Ospina L, Molano N (2012) Using multivariate methods to infer knowledge from genomic data. Int J Bioinform Res Appl (in press)

  • Qi Y, Klein-Seetharaman J, Bar-Joseph Z (2005) Random forest similarity for protein–protein interaction prediction from multiple sources. Pac Symp Biocomput 10:531–542

  • R Development Core Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/

  • Sagar V, Kumar R, Ganguly NK, Chakraborti A (2008) Comparative analysis of emm type pattern of group A Streptococcus throat and skin isolates from India and their association with closely related SIC, a streptococcal virulence factor. BMC Microbiol 8:150

  • Schebesch KB, Stecking R (2005) Support vector machines for classifying and describing credit applicants: detecting typical and critical regions. J Oper Res Soc 56(9):1082–1088

  • Schölkopf B, Smola A (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. The MIT Press, Cambridge

  • Shelburne SA, Keith D, Horstmann N, Sumby P, Davenport MT, Graviss EA, Brennan RG, Musser JM (2008) A direct link between carbohydrate utilization and virulence in the major human pathogen group A Streptococcus. PNAS 105(5):1698–1703

  • Virtaneva K, Porcella SF, Graham MR, Ireland RM, Johnson CA, Ricklefs SM, Babar I, Parkins LD, Romero RA, Corn GJ, Gardner DJ, Bailey JR, Parnell MJ, Musser JM (2005) Longitudinal analysis of the group A Streptococcus transcriptome in experimental pharyngitis in cynomolgus macaques. PNAS 102:9014–9019

  • Werhli AV, Husmeier D (2007) Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge. Stat Appl Genet Mol Biol 6:15

  • Yamanishi Y, Vert JP, Nakaya A, Kanehisa M (2003) Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis. Bioinformatics 19:i323–i330

Acknowledgments

This work was partially financed by the Fundación para el avance de la ciencia of the Colombian Banco de la República.

Author information

Corresponding author

Correspondence to Liliana López-Kleine.

Electronic supplementary material

  • ESM 1 (XLSX, 12 kB)
  • ESM 2 (XLSX, 188 kB)

About this article

Cite this article

López-Kleine, L., Torres-Avilés, F., Tejedor, F.H. et al. Virulence factor prediction in Streptococcus pyogenes using classification and clustering based on microarray data. Appl Microbiol Biotechnol 93, 2091–2098 (2012). https://doi.org/10.1007/s00253-012-3917-3

  • DOI: https://doi.org/10.1007/s00253-012-3917-3