Effect of Hundreds Sequenced Genomes on the Classification of Human Papillomaviruses
The classification of the hundreds of papillomaviruses (PVs) remains a major issue in virology, disease diagnosis, and therapy. Since 2003, PVs have been classified into three levels of hierarchical clusters according to their similarity and their position in the phylogenetic tree, using the DNA sequence of the L1 gene. As the number of sequenced genomes increases, the boundaries of the clusters at the different levels might overlap and the topology of the associated tree could change, thus preventing a unique and coherent classification. Here, we studied the classification of the 560 currently available human PVs (HPVs) with respect to the criteria established 10 years ago as well as novel ones. The results show that the current taxonomic identification fits the monophyly criterion for the L1 gene, but that sequence similarities violate the established boundaries used to classify PVs. Finally, we argue that replacing L1 gene similarity with whole-genome similarity would reduce the overlap between clusters and provide a better classification.
Keywords: Gene Tree, Pairwise Similarity, Taxonomic Tree, Cohesion Index, Genome Tree
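The overlap problem described in the abstract can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and that each one carries a cluster label. The function names, the toy sequences, and the labels are hypothetical and are not taken from the study, and no classification threshold is assumed. The sketch computes pairwise percent identity, splits the values into within-cluster and between-cluster pairs, and reports whether the two ranges overlap.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns in which both sequences carry a gap.
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    matches = sum(1 for x, y in cols if x == y)
    return 100.0 * matches / len(cols)


def similarity_ranges(seqs: dict, labels: dict):
    """Split pairwise identities into within-cluster and between-cluster lists."""
    within, between = [], []
    for i, j in combinations(seqs, 2):
        s = pairwise_identity(seqs[i], seqs[j])
        (within if labels[i] == labels[j] else between).append(s)
    return within, between


def ranges_overlap(within, between) -> bool:
    """True if some between-cluster pair is at least as similar as the
    least similar within-cluster pair, i.e. no clean boundary exists."""
    return bool(within) and bool(between) and min(within) <= max(between)


# Hypothetical toy alignment: two clusters, "A" and "B".
toy_seqs = {
    "A1": "ACGTACGTAC",
    "A2": "ACGTACGTAA",
    "B1": "TTGTACGAAC",
    "B2": "TTGTACGAAA",
}
toy_labels = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

toy_within, toy_between = similarity_ranges(toy_seqs, toy_labels)
toy_overlap = ranges_overlap(toy_within, toy_between)
```

In this toy case the within-cluster identities stay above every between-cluster identity, so a single boundary separates the two clusters. When the ranges overlap, no similarity threshold can reproduce the labels, which is the situation the abstract reports for L1-based similarity.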