Clustering Metagenome Short Reads Using Weighted Proteins
This paper proposes a new knowledge-based method for clustering metagenome short reads. The method incorporates biological knowledge into the clustering process by means of a list of proteins associated with each read. These proteins are chosen from a reference proteome database according to their similarity to the given read, as evaluated by BLAST. We introduce a scoring function for weighting the resulting proteins and use the weighted proteins to cluster the reads. The resulting clustering algorithm selects the number of clusters automatically and generates possibly overlapping clusters of reads. Experiments on real-life benchmark datasets show that the method effectively reduces the size of a metagenome dataset while maintaining high accuracy of organism content.
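The abstract does not spell out the scoring function or the clustering procedure, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes each read comes with a list of (protein, BLAST bit score) hits, weights a protein by the sum of its bit scores (a hypothetical stand-in for the paper's scoring function), and greedily selects proteins until every read with at least one hit is covered. The function name `cluster_reads_by_proteins` and the input layout are illustrative, not taken from the paper.

```python
from collections import defaultdict


def cluster_reads_by_proteins(hits):
    """Group reads around a small set of weighted proteins.

    hits maps a read id to a list of (protein id, bit score) pairs.
    Returns a dict mapping each selected protein to the set of reads that hit it.
    """
    weight = defaultdict(float)   # assumed score: summed bit scores per protein
    members = defaultdict(set)    # reads that hit each protein
    for read, protein_hits in hits.items():
        for protein, score in protein_hits:
            weight[protein] += score
            members[protein].add(read)

    # Only reads with at least one hit can be placed in a cluster.
    uncovered = {read for read, protein_hits in hits.items() if protein_hits}
    clusters = {}
    while uncovered:
        # Greedy choice: most newly covered reads, ties broken by weight, then id.
        best = max(
            members,
            key=lambda p: (len(members[p] & uncovered), weight[p], p),
        )
        clusters[best] = set(members[best])  # clusters may share reads
        uncovered -= members[best]
    return clusters


# Toy input: r1 hits two proteins, r4 has no hit and stays unclustered.
example_hits = {
    "r1": [("P1", 50.0), ("P2", 40.0)],
    "r2": [("P1", 60.0)],
    "r3": [("P2", 30.0), ("P3", 20.0)],
    "r4": [],
}
example_clusters = cluster_reads_by_proteins(example_hits)
```

In this sketch the number of clusters is not fixed in advance: it is simply the number of proteins the greedy loop selects. Because a cluster keeps every read that hit its protein, a read such as `r1` can appear in more than one cluster, which mirrors the overlapping clusters described in the abstract.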