A Search for the Best Data Mining Method to Predict Melanoma
Our main objective was to decrease the error rate in diagnosing melanoma, a very dangerous skin cancer. Since diagnosticians routinely use the so-called ABCD formula for melanoma prediction, our main concern was to improve that formula. In our search for the best coefficients of the ABCD formula we used two different discretization methods, agglomerative and divisive, both based on cluster analysis. In our experiments we used the data mining system LERS (Learning from Examples based on Rough Sets). As a result of more than 30,000 experiments, two optimal ABCD formulas were found, one using the agglomerative method and the other using the divisive method. These formulas were evaluated using statistical methods. Our final conclusion is that using an appropriate discretization method matters more than modifying the ABCD formula. Also, the divisive method of discretization is better than the agglomerative one. Finally, diagnosis of melanoma that ignores the results of the ABCD formula is much worse: its error rate is significantly greater than with any form of the ABCD formula.
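To make "coefficients of the ABCD formula" concrete, the following is a minimal sketch of how a score of this kind is computed. It assumes the form commonly given in the dermatology literature, a weighted sum of Asymmetry, Border, number of Colors, and number of Diverse structures with default weights 1.3, 0.1, 0.5, and 0.5. Neither these weights nor the optimized ones found in the experiments are stated in this abstract, and the names `AbcdCoefficients` and `total_dermatoscopy_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbcdCoefficients:
    """Weights of an ABCD-style score.

    The defaults are the commonly cited values from the dermatology
    literature (an assumption; they are not given in this abstract).
    """
    asymmetry: float = 1.3
    border: float = 0.1
    color: float = 0.5
    diversity: float = 0.5


def total_dermatoscopy_score(
    asymmetry: int,
    border: int,
    n_colors: int,
    n_structures: int,
    coeffs: AbcdCoefficients = AbcdCoefficients(),
) -> float:
    """Return the weighted sum of the four ABCD attributes.

    Simplification: colors and structures enter as counts, each
    sharing a single weight.
    """
    return (
        coeffs.asymmetry * asymmetry
        + coeffs.border * border
        + coeffs.color * n_colors
        + coeffs.diversity * n_structures
    )


if __name__ == "__main__":
    # Score under the default weights.
    print(total_dermatoscopy_score(2, 8, 6, 5))
    # Score under a hypothetical alternative weighting; a coefficient
    # search would evaluate many such candidates.
    candidate = AbcdCoefficients(asymmetry=1.0, border=0.2)
    print(total_dermatoscopy_score(2, 8, 6, 5, candidate))
```

Keeping the weights in a separate object means that a search over coefficients only has to swap in a different `AbcdCoefficients` instance while the scoring function stays unchanged.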
Keywords: rough set theory, data mining, melanoma prediction, ABCD formula, discretization