Virulence factor prediction in Streptococcus pyogenes using classification and clustering based on microarray data

López-Kleine, Liliana; Torres-Avilés, Francisco; Tejedor, Fabio H.; Gordillo, Luz A.

doi:10.1007/s00253-012-3917-3


Genomics, transcriptomics, proteomics
Published: 04 February 2012

Applied Microbiology and Biotechnology, Volume 93, pages 2091–2098 (2012)


Abstract

Publicly available genomic data, for example gene expression data from microarrays, contain valuable biological information. To narrow down the very large number of possible wet-lab experiments, global high-throughput data and existing knowledge should first be used to infer biological knowledge and formulate biological hypotheses. Here, based on microarray data, we propose using clustering and classification methods that have become very popular and are implemented in freely available software to predict which proteins encoded by genes of the pathogen Streptococcus pyogenes participate in virulence mechanisms. Confidence in the predictions is based on the classification errors obtained for genes of known function and on repeated prediction by more than three methods. Special emphasis is placed on the nonlinear kernel classification methods used. We propose a list of interesting candidates that could be virulence factors or participate in the virulence process of S. pyogenes. Biological validation should start with this list of candidates, as they behave similarly to known virulence factors.
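The confidence criterion described in the abstract (judge each method by its classification error on genes of known status, then require agreement of more than three methods) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: every method is assumed to have already produced a 0/1 label per gene (1 = virulence-related), a clustering method is assumed to contribute a label by some rule such as the majority status of the known genes in its cluster, and the threshold `max_error`, the method names `m1` to `m5`, the gene ids and the toy data are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Labels = Dict[str, int]  # gene id -> 1 (virulence-related) or 0 (other)


def error_rate(predicted: Labels, truth: Labels) -> float:
    """Fraction of known genes whose predicted label differs from the true one."""
    wrong = sum(1 for gene, label in truth.items() if predicted[gene] != label)
    return wrong / len(truth)


def consensus_candidates(
    known_preds: Dict[str, Labels],
    unknown_preds: Dict[str, Labels],
    truth: Labels,
    max_error: float = 0.2,  # hypothetical reliability threshold
    min_votes: int = 4,      # "more than three methods"
) -> List[Tuple[str, int]]:
    """Genes flagged as virulence-related by at least `min_votes` reliable methods."""
    # Keep only methods whose error on the known genes is acceptable.
    reliable = [m for m, preds in known_preds.items()
                if error_rate(preds, truth) <= max_error]
    # Count, for each unclassified gene, how many reliable methods flag it.
    votes: Counter = Counter()
    for method in reliable:
        for gene, label in unknown_preds[method].items():
            if label == 1:
                votes[gene] += 1
    hits = [(gene, n) for gene, n in votes.items() if n >= min_votes]
    # Most-supported candidates first, ties broken by gene id.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Toy, made-up data: 4 genes of known status and 3 unclassified genes.
TRUTH: Labels = {"k1": 1, "k2": 1, "k3": 0, "k4": 0}
KNOWN_PREDS: Dict[str, Labels] = {m: dict(TRUTH) for m in ("m1", "m2", "m3", "m4")}
KNOWN_PREDS["m5"] = {"k1": 0, "k2": 1, "k3": 1, "k4": 0}  # 50% error on known genes
UNKNOWN_PREDS: Dict[str, Labels] = {
    "m1": {"u1": 1, "u2": 1, "u3": 0},
    "m2": {"u1": 1, "u2": 1, "u3": 0},
    "m3": {"u1": 1, "u2": 0, "u3": 0},
    "m4": {"u1": 1, "u2": 1, "u3": 1},
    "m5": {"u1": 1, "u2": 1, "u3": 1},
}

if __name__ == "__main__":
    print(consensus_candidates(KNOWN_PREDS, UNKNOWN_PREDS, TRUTH))  # [('u1', 4)]
```

With these toy labels, method m5 misclassifies half of the known genes and is discarded, so u2 collects only three votes and is not reported; relaxing `max_error` to 1.0 would let m5 vote and add u2 to the list.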


Notes

For other interpretations of the values α_i, please refer to Schebesch and Stecking (2005).
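In the standard kernel support vector machine (Schölkopf and Smola 2002), the α_i are the dual coefficients attached to the training genes. The textbook soft-margin formulation below is given only as a reminder and is assumed, not reproduced from the article: y_i ∈ {−1, +1} is the class of training gene x_i, k is the kernel, C the cost parameter and b the offset.

```latex
g(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i y_i \, k(\mathbf{x}_i, \mathbf{x}) + b,
\qquad \hat{y}(\mathbf{x}) = \operatorname{sign}\big(g(\mathbf{x})\big),
\qquad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0

\alpha_i = 0 \;\Rightarrow\; y_i \, g(\mathbf{x}_i) \ge 1, \qquad
0 < \alpha_i < C \;\Rightarrow\; y_i \, g(\mathbf{x}_i) = 1, \qquad
\alpha_i = C \;\Rightarrow\; y_i \, g(\mathbf{x}_i) \le 1
```

Read this way, genes with α_i = 0 lie clearly on their side of the margin, genes with 0 < α_i < C sit on the margin, and genes with α_i = C lie inside the margin or are misclassified, which is the kind of distinction between typical and critical cases that Schebesch and Stecking (2005) exploit.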

References

Bisno A, Brito M, Collins CM (2003) Molecular basis of group A streptococcal virulence. Lancet Infect Dis 3:191–200
Bleakley K, Biau G, Vert JP (2007) Supervised reconstruction of biological networks with local models. Bioinformatics 23:i57–i65
Clarke B, Fokoué E, Zhang H (2009) Principles and theory for data mining and machine learning. Springer, New York
Cox KH, Ruiz-Bustos E, Courtney HS, Dale JB, Pence MA, Nizet V, Aziz RK, Gerling I, Price SM, Hasty DL (2009) Inactivation of DltA modulates virulence factor expression in Streptococcus pyogenes. PLoS One 4(4):e5366. doi:10.1371/journal.pone.0005366
Friedman JH (1989) Regularized discriminant analysis. JASA 84:165–175
Hartigan JA, Wong MA (1979) A k-means clustering algorithm. Appl Stat 28:100–108
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
Kohonen T (2000) Self-organizing maps, 3rd edn. Springer-Verlag, Berlin
Leiva-Valdebenito S, Torres-Avilés F (2010) A review of the most common partition algorithms in cluster analysis: a comparative study. Rev Colomb Estad 33(2):321–339
López-Kleine L, Monnet V, Pechoux C, Trubuil A (2008) Role of bacterial peptidase F inferred by statistical analysis and further experimental validation. HFSP J 2:29–41
López-Kleine L, Ospina L, Molano N (2012) Using multivariate methods to infer knowledge from genomic data. Int J Bioinform Res Appl (in press)
Qi Y, Klein-Seetharaman J, Bar-Joseph Z (2005) Random forest similarity for protein–protein interaction prediction from multiple sources. Pac Symp Biocomput 10:531–542
R Development Core Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/
Sagar V, Kumar R, Ganguly NK, Chakraborti A (2008) Comparative analysis of emm type pattern of group A Streptococcus throat and skin isolates from India and their association with closely related SIC, a streptococcal virulence factor. BMC Microbiol 8:150
Schebesch B, Stecking R (2005) Support vector machines for classifying and describing credit applicants: detecting typical and critical regions. J Oper Res Soc 56(9):1082–1088
Schölkopf B, Smola A (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. The MIT Press, Cambridge
Shelburne SA, Keith D, Horstmann N, Sumby P, Davenport MT, Graviss EA, Brennan RG, Musser JM (2008) A direct link between carbohydrate utilization and virulence in the major human pathogen group A Streptococcus. PNAS 105(5):1698–1703
Virtaneva K, Porcella SF, Graham MR, Ireland RM, Johnson CA, Ricklefs SM, Babar I, Parkins LD, Romero RA, Corn GJ, Gardner DJ, Bailey JR, Parnell MJ, Musser JM (2005) Longitudinal analysis of the group A Streptococcus transcriptome in experimental pharyngitis in cynomolgus macaques. PNAS 102:9014–9019
Werhli AV, Husmeier D (2007) Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge. Stat Appl Genet Mol Biol 6:15
Yamanishi Y, Vert JP, Nakaya A, Kanehisa M (2003) Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis. Bioinformatics 19:i323–i330


Acknowledgments

This work was partially financed by the Fundación para el avance de la ciencia of the Colombian Banco de la República.

Author information

Authors and Affiliations

Departamento de Estadística, Universidad Nacional de Colombia, Bogotá, Colombia
Liliana López-Kleine, Fabio H. Tejedor & Luz A. Gordillo
Departamento de Matemática y Ciencia de la Computación, Universidad de Santiago de Chile, Santiago, 8330111, Chile
Francisco Torres-Avilés


Corresponding author

Correspondence to Liliana López-Kleine.

Electronic supplementary material


ESM 1

(XLSX 12 kb)

ESM 2

(XLSX 188 kb)


About this article

Cite this article

López-Kleine, L., Torres-Avilés, F., Tejedor, F.H. et al. Virulence factor prediction in Streptococcus pyogenes using classification and clustering based on microarray data. Appl Microbiol Biotechnol 93, 2091–2098 (2012). https://doi.org/10.1007/s00253-012-3917-3


Received: 28 September 2011
Revised: 12 January 2012
Accepted: 19 January 2012
Published: 04 February 2012
Issue Date: March 2012
DOI: https://doi.org/10.1007/s00253-012-3917-3
