Abstract
This paper uses data mining to build classification models of breast cancer survival, with the aim of providing a reference for treatment decisions concerning women diagnosed with breast cancer in Taiwan. We studied patients with breast cancer at a hospital in central Taiwan and obtained 1,340 data records. We used a support vector machine (SVM), logistic regression, and a C5.0 decision tree to construct classification models of patient survival, and validated the models with 10-fold cross-validation. The models achieved an average accuracy of more than 90 %, and the SVM performed best of the three methods. These results indicate that all three methods classify the survival of women diagnosed with breast cancer with high accuracy and could serve as a reference when building a medical decision-making framework.
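The 10-fold cross-validation procedure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, only a hand-written logistic regression stands in for the three classifiers, and all function names (`k_fold_indices`, `train_logistic`, `cross_validate`) and hyperparameters are assumptions made for illustration.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression weights (bias first) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    """Return class 1 when the linear score is non-negative, else class 0."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if z >= 0 else 0


def cross_validate(X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return k accuracies."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        w = train_logistic([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Synthetic two-class data (an assumption; the study's patient records are not reproduced here).
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    label = rng.randint(0, 1)
    center = 2.0 if label else -2.0
    X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
    y.append(label)

scores = cross_validate(X, y, k=10)
mean_accuracy = sum(scores) / len(scores)
print(f"mean 10-fold accuracy: {mean_accuracy:.3f}")
```

Each record is used for testing exactly once, so the mean of the ten fold accuracies estimates performance on unseen records. Comparing several classifiers, as the paper does, would amount to repeating `cross_validate` with a different training function on the same folds.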
Acknowledgments
This research was performed under the auspices of Taiwan’s National Science Council (NSC 99-2221-E-224-033-MY2).
Additional information
This article is part of the Topical Collection on Patient Facing Systems
About this article
Cite this article
Chao, CM., Yu, YW., Cheng, BW. et al. Construction the Model on the Breast Cancer Survival Analysis Use Support Vector Machine, Logistic Regression and Decision Tree. J Med Syst 38, 106 (2014). https://doi.org/10.1007/s10916-014-0106-1
DOI: https://doi.org/10.1007/s10916-014-0106-1