Abstract
In this work, we propose a machine learning strategy for classification on imbalanced data. We use lung image data from the Lung Image Database Consortium (LIDC), because it is a good example of an imbalanced dataset. The dataset is sufficiently large, containing 4,532 nodules extracted from CT images. For the experiments we use 55 low-level nodule image features together with the radiologists' ratings. The work proceeds in two stages: (1) data-level learning and (2) algorithm-level learning. In the first stage, we balance the dataset before classification using a resampling approach. In the second stage, we use an ensemble of classifiers to predict the lung nodule rating. The ensemble is built from a library of diverse classifier models: Bagged Decision Trees, naïve Bayes, Boosted Decision Trees, and Support Vector Machine (SVM). A stacking algorithm combines the models in the library into a higher-level ensemble. We evaluate the model on five metrics: accuracy, precision, recall, F-score, and the Kappa statistic. Results show that our method yields substantially improved scores, which we attribute to refining at both the data level and the algorithm level.
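The data-level stage and the evaluation metrics named above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the abstract does not say which resampling method is used, so plain random oversampling of the minority classes stands in for it, and macro averaging over classes is assumed for precision, recall, and F-score. The function names `random_oversample`, `macro_scores`, and `cohen_kappa` are hypothetical, and the classifier ensemble itself is not shown.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest one (assumed resampling scheme)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        out_x.extend(rows + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def macro_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, F-score), the last three
    macro-averaged over classes (averaging scheme is an assumption)."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f_scores = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_scores.append(f)
    k = len(classes)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f_scores) / k


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between y_true and y_pred corrected for
    the agreement expected by chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is a convention of this sketch
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy data only: eight nodules rated 1 and two rated 2.
    toy_x = [[i] for i in range(10)]
    toy_y = [1] * 8 + [2] * 2
    balanced_x, balanced_y = random_oversample(toy_x, toy_y)
    print(Counter(balanced_y))
    print(macro_scores([1, 1, 2, 2], [1, 1, 2, 1]))
    print(cohen_kappa([1, 1, 2, 2], [1, 1, 2, 1]))
```

Kappa is useful alongside accuracy on imbalanced data because a classifier that always predicts the majority rating can score high accuracy while its kappa stays at zero.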
Copyright information
© 2013 Springer India
About this paper
Cite this paper
Kumar, V., Rao, A., Hemanthakumar, G. (2013). Stacked Classifier Model with Prior Resampling for Lung Nodule Rating Prediction. In: Swamy, P., Guru, D. (eds) Multimedia Processing, Communication and Computing Applications. Lecture Notes in Electrical Engineering, vol 213. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1143-3_11
DOI: https://doi.org/10.1007/978-81-322-1143-3_11
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1142-6
Online ISBN: 978-81-322-1143-3
eBook Packages: Engineering, Engineering (R0)