ROC operating point selection for classification of imbalanced data with application to computer-aided polyp detection in CT colonography

Song, Bowen; Zhang, Guopeng; Zhu, Wei; Liang, Zhengrong

doi:10.1007/s11548-013-0913-8


Original Article
Published: 25 June 2013

Volume 9, pages 79–89, (2014)

International Journal of Computer Assisted Radiology and Surgery



Abstract

Purpose

Computer-aided detection and diagnosis (CAD) of colonic polyps routinely faces the challenge of classifying imbalanced data. This paper proposes three new operating point selection strategies, based on the receiver operating characteristic (ROC) curve, to address the problem.

Methods

Classifiers perform poorly on imbalanced data largely because the best decision threshold shifts with the degree of class imbalance. To address this threshold-shifting issue, three operating point selection strategies (shortest distance, harmonic mean and anti-harmonic mean) are proposed and their performance is investigated.
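The idea of picking an operating point from the ROC curve, rather than using a fixed default threshold, can be illustrated with a minimal sketch. The abstract does not give formulas for the three strategies, so the definitions below are assumptions: "shortest distance" is taken as the ROC point closest to the ideal corner (sensitivity = 1, specificity = 1), and "harmonic mean" as the point maximizing the harmonic mean of sensitivity and specificity. The anti-harmonic mean is left out because its definition cannot be recovered from the abstract. The function names and toy scores are hypothetical.

```python
import numpy as np

def roc_points(labels, scores):
    """Sensitivity and specificity at every distinct score used as a threshold."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.unique(scores)
    # A candidate is called positive when its score is >= the threshold.
    sens = np.array([(scores[labels] >= t).mean() for t in thresholds])
    spec = np.array([(scores[~labels] < t).mean() for t in thresholds])
    return thresholds, sens, spec

def shortest_distance_point(sens, spec):
    """Assumed definition: index of the ROC point nearest to (sens=1, spec=1)."""
    return int(np.argmin(np.hypot(1.0 - sens, 1.0 - spec)))

def harmonic_mean_point(sens, spec):
    """Assumed definition: index maximizing 2*sens*spec / (sens + spec)."""
    denom = sens + spec
    hm = 2.0 * sens * spec / np.where(denom > 0, denom, 1.0)
    return int(np.argmax(hm))

# Toy imbalanced example (hypothetical scores): 2 positives, 8 negatives.
# All positive scores fall below 0.5, so a fixed 0.5 threshold misses both.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.30, 0.45, 0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.35]

thresholds, sens, spec = roc_points(labels, scores)
i_dist = shortest_distance_point(sens, spec)
i_hm = harmonic_mean_point(sens, spec)
print("shortest distance:", thresholds[i_dist], sens[i_dist], spec[i_dist])
print("harmonic mean:    ", thresholds[i_hm], sens[i_hm], spec[i_hm])
```

In this toy case both criteria move the threshold well below 0.5, trading a small loss of specificity for a large gain in sensitivity, which is the behavior the abstract attributes to the proposed strategies.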

Results

Experiments were conducted on a class-imbalanced database containing 64 polyps among 786 polyp candidates. Support vector machines (SVM) and random forests (RFs) were employed as base classifiers. Two imbalance-correction techniques, cost-sensitive learning and down-sampling of the training data, were applied to SVM and RFs, and their performance was compared with that of the proposed strategies. Compared with the original thresholding method, which gave 0.488 sensitivity and 0.986 specificity for RFs and 0.526 sensitivity and 0.977 specificity for SVM, the proposed strategies achieved more balanced results: around 0.89 sensitivity and 0.92 specificity for RFs, and 0.88 sensitivity and 0.90 specificity for SVM. Their performance remained at the same level regardless of whether the other correction methods were used.
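To put the reported figures on a common scale, the short sketch below computes the class prevalence of the database and two balance summaries for each reported operating point: the distance to the ideal ROC corner and the harmonic mean of sensitivity and specificity. These two summaries are assumed readings of the "shortest distance" and "harmonic mean" criteria, not formulas taken from the paper; the input numbers are those quoted in the Results paragraph.

```python
import math

# Database size quoted in the Results paragraph.
polyps, candidates = 64, 786
prevalence = polyps / candidates  # fraction of candidates that are true polyps

# (sensitivity, specificity) pairs quoted in the Results paragraph.
reported = {
    "RF, original threshold": (0.488, 0.986),
    "SVM, original threshold": (0.526, 0.977),
    "RF, proposed strategies": (0.89, 0.92),
    "SVM, proposed strategies": (0.88, 0.90),
}

def distance_to_ideal(sens, spec):
    """Euclidean distance from the operating point to (sens=1, spec=1)."""
    return math.hypot(1.0 - sens, 1.0 - spec)

def harmonic_mean(sens, spec):
    """Harmonic mean of sensitivity and specificity."""
    return 2.0 * sens * spec / (sens + spec)

summary = {
    name: (distance_to_ideal(s, p), harmonic_mean(s, p))
    for name, (s, p) in reported.items()
}

print(f"prevalence: {prevalence:.3f}")
for name, (dist, hm) in summary.items():
    print(f"{name:26s} distance={dist:.3f} harmonic mean={hm:.3f}")
```

Under both summaries the proposed operating points sit much closer to the ideal corner than the original thresholds do, which is one way to read the phrase "more balanced results".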

Conclusions

The gain from the proposed strategies is noticeable: sensitivity improved from about 0.5 to around 0.88 for RFs and 0.89 for SVM while retaining a relatively high specificity, i.e., 0.92 for RFs and 0.90 for SVM. The strategies were adaptive and robust across different levels of data imbalance. This indicates a feasible solution to the threshold-shifting problem, yielding favorable sensitivity and specificity in CAD of polyps from imbalanced data.



Acknowledgments

This work was supported in part by the NIH/NCI under Grants #CA082402 and #CA143111.

Conflict of Interest

Bowen Song, Guopeng Zhang, Wei Zhu and Zhengrong Liang declare that they have no conflict of interest.

Author information

Authors and Affiliations

Department of Radiology, Stony Brook University, Stony Brook, NY, 11790, USA
Bowen Song & Zhengrong Liang
Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, 11790, USA
Bowen Song & Wei Zhu
Department of Biomedical Engineering, Fourth Military Medical University, Xi’an, 710032, Shaanxi, China
Guopeng Zhang


Corresponding author

Correspondence to Zhengrong Liang.


About this article

Cite this article

Song, B., Zhang, G., Zhu, W. et al. ROC operating point selection for classification of imbalanced data with application to computer-aided polyp detection in CT colonography. Int J CARS 9, 79–89 (2014). https://doi.org/10.1007/s11548-013-0913-8


Received: 31 December 2012
Accepted: 10 June 2013
Published: 25 June 2013
Issue Date: January 2014
DOI: https://doi.org/10.1007/s11548-013-0913-8
