Deterministic Classifiers Accuracy Optimization for Cancer Microarray Data

  • Vânia Rodrigues
  • Sérgio Deusdado
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1005)


The objective of this study was to improve classification accuracy on cancer microarray gene expression data using a collection of machine learning algorithms available in WEKA. State-of-the-art deterministic classification methods, namely Kernel Logistic Regression, Support Vector Machine, Stochastic Gradient Descent and Logistic Model Trees, were applied to publicly available cancer microarray datasets. The aim was to discover regularities that provide insight into the characterization of each cancer type and improve diagnostic correctness. The implemented models were evaluated with 10-fold cross-validation and parameterized to enhance accuracy, and they reached accuracies above 90%. Moreover, despite the variety of methodologies, no statistically significant differences were found between them at a significance level of 0.05. Together with the accuracies above 90%, this indicates that all the selected methods are effective for this type of analysis.
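The abstract reports accuracies obtained with 10-fold cross-validation and a pairwise significance check at the 0.05 level, but does not spell out either procedure. The Python sketch below shows one way such an evaluation can be set up. It rests on several assumptions that are not taken from the paper: stratified fold assignment; two simplified linear models fitted by plain SGD (log loss, approximating logistic regression, and hinge loss, approximating a linear SVM) as stand-ins for the WEKA classifiers; synthetic two-class data in place of a microarray dataset; and the Nadeau-Bengio corrected resampled paired t-test, which is one common choice for comparing classifiers over cross-validation folds. The abstract does not say which test the authors used. All function names (`stratified_folds`, `train_sgd`, `cv_accuracies`, `corrected_t`, and so on) are hypothetical.

```python
import math
import random
from statistics import mean, variance


def make_toy_data(n_per_class=30, n_features=5, shift=1.5, seed=42):
    """Hypothetical synthetic two-class data standing in for an expression matrix."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            # Each feature is Gaussian, centred at -shift/2 or +shift/2 by class.
            X.append([rng.gauss(label * shift / 2, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)  # deal indices round-robin
        offset += len(idx)  # keep overall fold sizes balanced across classes
    return folds


def train_sgd(X, y, loss, epochs=50, lr=0.05, lam=1e-3, seed=0):
    """L2-regularised linear model fitted by plain SGD; labels must be -1/+1."""
    if loss not in ("hinge", "log"):
        raise ValueError("loss must be 'hinge' or 'log'")
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if loss == "hinge":  # gradient of max(0, 1 - margin): linear-SVM objective
                g = -y[i] if margin < 1 else 0.0
            else:  # gradient of log(1 + exp(-margin)); exp clipped to avoid overflow
                g = -y[i] / (1.0 + math.exp(min(margin, 50.0)))
            w = [wj - lr * (g * xj + lam * wj) for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def cv_accuracies(X, y, loss, k=10):
    """Per-fold test accuracy from stratified k-fold cross-validation."""
    accs = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = train_sgd([X[i] for i in train_idx], [y[i] for i in train_idx], loss)
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return accs


def corrected_t(acc_a, acc_b, n_train, n_test):
    """Nadeau-Bengio corrected resampled t statistic on paired fold accuracies."""
    d = [a - b for a, b in zip(acc_a, acc_b)]
    m, var = mean(d), variance(d)
    if var == 0.0:
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / math.sqrt((1.0 / len(d) + n_test / n_train) * var)


if __name__ == "__main__":
    X, y = make_toy_data()
    k, n = 10, len(y)
    acc_log = cv_accuracies(X, y, "log", k)
    acc_hinge = cv_accuracies(X, y, "hinge", k)
    t = corrected_t(acc_log, acc_hinge, n_train=n - n / k, n_test=n / k)
    T_CRIT = 2.262  # two-sided 5% critical value of Student's t, df = k - 1 = 9
    print(f"log loss: {mean(acc_log):.3f}  hinge loss: {mean(acc_hinge):.3f}  t = {t:.3f}")
    print("significant at 0.05" if abs(t) > T_CRIT else "no significant difference at 0.05")
```

The correction term `n_test / n_train` inflates the variance to account for the overlap between training sets across folds, which the plain paired t-test ignores. The critical value 2.262 is specific to 9 degrees of freedom, so it has to change if `k` changes. Real microarray data has thousands of features, so this pure-Python loop is only meant to make the protocol concrete, not to replace WEKA's implementations.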


Classification, Cancer, Microarray, Data mining, Machine learning


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. USAL – Universidad de Salamanca, Salamanca, Spain
  2. CIMO – Centro de Investigação de Montanha, Instituto Politécnico de Bragança, Bragança, Portugal
