Journal of Medical Systems, 38:106

Construction the Model on the Breast Cancer Survival Analysis Use Support Vector Machine, Logistic Regression and Decision Tree

  • Cheng-Min Chao
  • Ya-Wen Yu (corresponding author)
  • Bor-Wen Cheng
  • Yao-Lung Kuo
Part of the following topical collection: Patient Facing Systems


This paper uses data mining techniques to classify breast cancer survival patterns and to offer a treatment decision-making reference regarding the survival of women diagnosed with breast cancer in Taiwan. We studied patients with breast cancer at a specific hospital in central Taiwan and obtained 1,340 data sets. We employed a support vector machine (SVM), logistic regression, and a C5.0 decision tree to construct classification models of breast cancer patients’ survival, and used 10-fold cross-validation to evaluate the models. The models yielded an average accuracy of more than 90 %, and the SVM performed best of the three methods for classifying survival. These results indicate that all three methods achieve high accuracy in predicting the survival of women diagnosed with breast cancer and could serve as a reference when building a medical decision-making framework.
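The 10-fold cross-validation procedure mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it covers only logistic regression (fitted by plain gradient descent), and the data are synthetic two-feature points generated by a hypothetical `make_synthetic` helper, since the hospital records are not available here. All function names, the learning rate, and the epoch count are assumptions chosen for the example. An SVM or a decision tree would slot into the same fold loop in place of `fit_logistic` and `predict`.

```python
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression with batch gradient descent (illustrative settings)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, x):
    """Return class 1 when the predicted probability is at least 0.5."""
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def cross_val_accuracy(X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return per-fold accuracy."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def make_synthetic(n=200, seed=1):
    """Hypothetical stand-in data: two Gaussian clusters, one per class."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.5 if label else -1.5
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


X, y = make_synthetic()
fold_scores = cross_val_accuracy(X, y, k=10)
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(f"mean 10-fold accuracy: {mean_accuracy:.3f}")
```

Averaging the per-fold scores, as in the last lines, gives the kind of "average accuracy" figure the abstract reports. The number printed here reflects only the synthetic data and says nothing about the study's results.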


Keywords: Breast cancer; Support vector machine; Logistic regression; C5.0 decision tree; 10-fold cross-validation



This research was performed under the auspices of Taiwan’s National Science Council (NSC 99-2221-E-224-033-MY2).



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Cheng-Min Chao (1)
  • Ya-Wen Yu (2, corresponding author)
  • Bor-Wen Cheng (2)
  • Yao-Lung Kuo (2)
  1. Department of Business Administration, National Taichung University of Science and Technology, Taichung, Taiwan
  2. National Yunlin University of Science and Technology, Douliu, Taiwan
