Human Papillomavirus Risk Type Classification from Protein Sequences Using Support Vector Machines

  • Sun Kim
  • Byoung-Tak Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3907)


Infection by the human papillomavirus (HPV) is associated with the development of cervical cancer. HPV types can be classified as high- or low-risk according to their malignant potential, and detecting the risk type is important both for understanding the underlying mechanisms and for diagnosing potential patients. In this paper, we classify HPV protein sequences using support vector machines. A string kernel is introduced to discriminate between HPV protein sequences; the kernel emphasizes pairs of amino acids separated by a given distance. In the experiments, our approach is compared with previous methods in terms of accuracy and F1-score, and it shows better performance. Prediction results for unknown HPV types are also presented.
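The abstract does not give the kernel's definition, so the following is only a minimal sketch of one way a distance-based amino acid pair kernel could look, not the authors' formulation. It assumes that each sequence is mapped to counts of ordered residue pairs separated by a fixed distance `d`, and that the kernel is the inner product of those count vectors. The function names, the parameter `d`, and the label `"high"` are all hypothetical. Plain-Python versions of the two evaluation measures named above (accuracy and F1-score) are included for completeness.

```python
from collections import Counter


def distance_pair_counts(seq: str, d: int) -> Counter:
    """Count ordered residue pairs (seq[i], seq[i + d]) that lie d positions apart.

    Assumed feature map for illustration; the paper's exact kernel may differ.
    """
    return Counter((seq[i], seq[i + d]) for i in range(len(seq) - d))


def distance_pair_kernel(x: str, y: str, d: int) -> int:
    """Inner product of the two sequences' distance-d pair-count vectors."""
    cx, cy = distance_pair_counts(x, d), distance_pair_counts(y, d)
    return sum(n * cy[pair] for pair, n in cx.items())


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive="high") -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A Gram matrix built from `distance_pair_kernel` over all training sequences could then be passed to any SVM implementation that accepts precomputed kernels.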


Support Vector Machine, Cervical Cancer, Support Vector Machine Method, Risk Type, Amino Acid Pair
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sun Kim (1)
  • Byoung-Tak Zhang (1)
  1. Biointelligence Laboratory, School of Computer Science and Engineering, Seoul National University, Seoul, South Korea
