A New Image Data Set and Benchmark for Cervical Dysplasia Classification Evaluation

  • Tao Xu
  • Cheng Xin
  • L. Rodney Long
  • Sameer Antani
  • Zhiyun Xue
  • Edward Kim
  • Xiaolei Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9352)


Cervical cancer is one of the most common types of cancer in women worldwide. Most deaths from cervical cancer occur in less developed areas of the world. In this work, we introduce a new image dataset, along with ground-truth diagnoses, for evaluating image-based cervical disease classification algorithms. We collect a large number of cervigram images from a database provided by the US National Cancer Institute. From these images, we extract three types of complementary image features: a Pyramid histogram in L*A*B* color space (PLAB), a Pyramid Histogram of Oriented Gradients (PHOG), and a Pyramid histogram of Local Binary Patterns (PLBP). PLAB captures color information, PHOG encodes edge and gradient information, and PLBP extracts texture information. Using these features, we run seven classic machine-learning algorithms to differentiate images of high-risk patient visits from those of low-risk patient visits. Extensive experiments are conducted on both balanced and imbalanced subsets of the data to compare the seven classifiers. These results can serve as a baseline for future research in cervical dysplasia classification using images. The image-based classifiers also outperform several other screening tests on the same datasets.
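
The pyramid features named above share one idea: a histogram is computed over the whole image and then over progressively finer grids of cells, and all cell histograms are concatenated. The following is a minimal sketch of that general idea for a single-channel image, not the authors' implementation. The function names, the number of pyramid levels, the bin count, and the per-cell normalization are all assumptions chosen for illustration; the abstract does not specify them.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # rows of pixel values, assumed in [0, max_value)


def cell_histogram(image: Image, top: int, left: int, bottom: int, right: int,
                   n_bins: int, max_value: int = 256) -> List[float]:
    """L1-normalized histogram of the pixel values inside one rectangular cell."""
    counts = [0] * n_bins
    for row in image[top:bottom]:
        for value in row[left:right]:
            # Map the pixel value to a bin index, clamping at the last bin.
            counts[min(int(value) * n_bins // max_value, n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def pyramid_histogram(image: Image, levels: int = 3, n_bins: int = 8,
                      max_value: int = 256) -> List[float]:
    """Concatenate cell histograms over 1x1, 2x2, 4x4, ... grids (hypothetical settings)."""
    height, width = len(image), len(image[0])
    feature: List[float] = []
    for level in range(levels):
        cells = 2 ** level  # the grid is cells x cells at this level
        for i in range(cells):
            for j in range(cells):
                top, bottom = i * height // cells, (i + 1) * height // cells
                left, right = j * width // cells, (j + 1) * width // cells
                feature.extend(
                    cell_histogram(image, top, left, bottom, right, n_bins, max_value)
                )
    return feature


# Toy 8x8 "image" standing in for one channel of a cervigram.
toy_image = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
toy_feature = pyramid_histogram(toy_image)
```

With three levels and eight bins, the feature has (1 + 4 + 16) × 8 = 168 entries. Under the same scheme, a PLAB-style descriptor would histogram each L*A*B* channel, while PHOG- and PLBP-style descriptors would histogram gradient orientations and local binary pattern codes in place of raw pixel values.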





Copyright information

© Springer International Publishing Switzerland 2015

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License, which permits any noncommercial use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Tao Xu (1), corresponding author
  • Cheng Xin (1)
  • L. Rodney Long (2)
  • Sameer Antani (2)
  • Zhiyun Xue (2)
  • Edward Kim (3)
  • Xiaolei Huang (1)

  1. Computer Science & Engineering Department, Lehigh University, Bethlehem, USA
  2. Communications Engineering Branch, NLM, Bethesda, USA
  3. Computing Sciences Department, Villanova University, Villanova, USA
