Progress in Artificial Intelligence, Volume 8, Issue 1, pp 63–71

Gene selection and disease prediction from gene expression data using a two-stage hetero-associative memory

  • Laura Cleofas-Sánchez
  • J. Salvador Sánchez
  • Vicente García
Regular Paper


In general, gene expression microarrays contain a vast number of genes but very few samples, which poses a critical challenge for disease prediction and diagnosis. This paper develops a two-stage algorithm that integrates feature selection and prediction by extending a type of hetero-associative neural network. In the first stage, the algorithm builds the associative memory; in the second stage, it selects the most relevant genes. To illustrate the applicability and efficiency of the proposed method, we use four gene expression microarray databases and compare its classification performance against that of other well-known classifiers built on the whole (original) feature (gene) space. The experimental results show that the two-stage hetero-associative memory is quite competitive with standard classification models in terms of overall accuracy, sensitivity and specificity. It also produces a significant decrease in computational effort and an increase in the biological interpretability of microarrays, because worthless (irrelevant and/or redundant) genes are discarded.
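
The abstract does not give the memory's learning rule or the gene-ranking criterion, so the following Python sketch is illustrative only and is not the authors' algorithm. It assumes a correlation-matrix (linear-associator-style) hetero-associative memory built on mean-translated patterns with one-hot class codes (first stage), a hypothetical ranking that keeps the genes whose memory weights differ most across classes (second stage), and arg-max recall. The function names `fit_memory`, `select_genes`, `predict` and `sensitivity_specificity` are invented for this example; the last one shows how the sensitivity and specificity mentioned above are computed for a binary task.

```python
import numpy as np


def fit_memory(X, y, n_classes):
    """Stage 1 (sketch): build a memory M linking class codes to patterns.

    Assumptions: patterns are translated by the training mean, classes are
    coded one-hot, and M is the sum of outer products e_y (x - mean)^T.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)                  # translation vector, one value per gene
    Y = np.eye(n_classes)[np.asarray(y)]   # one-hot class codes, shape (n, c)
    M = Y.T @ (X - mean)                   # shape (c, g): sum of outer products
    return M, mean


def select_genes(M, k):
    """Stage 2 (hypothetical criterion): keep the k genes whose memory
    weights differ most across classes."""
    spread = M.max(axis=0) - M.min(axis=0)
    return np.argsort(-spread, kind="stable")[:k]


def predict(M, mean, X, genes=None):
    """Recall: score each class by the inner product between its memory row
    and the translated pattern, then return the arg max."""
    Xt = np.asarray(X, dtype=float) - mean
    if genes is not None:
        # Each column of M depends only on its own gene, so slicing M is
        # equivalent to rebuilding the memory on the selected genes.
        M, Xt = M[:, genes], Xt[:, genes]
    return np.argmax(Xt @ M.T, axis=1)


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos, neg = y_true == positive, y_true != positive
    tp = np.sum(pos & (y_pred == positive))
    tn = np.sum(neg & (y_pred != positive))
    sens = tp / pos.sum() if pos.any() else float("nan")
    spec = tn / neg.sum() if neg.any() else float("nan")
    return float(sens), float(spec)


# Toy data (invented): genes 0 and 1 separate the classes, genes 2 and 3 do not.
X = np.array([[5, 1, 0, 2], [6, 0, 1, 2], [1, 5, 1, 2], [0, 6, 0, 2]], dtype=float)
y = [0, 0, 1, 1]
M, mean = fit_memory(X, y, n_classes=2)
genes = select_genes(M, k=2)        # keeps genes 0 and 1
pred = predict(M, mean, X, genes)   # predicted classes 0, 0, 1, 1
print(sensitivity_specificity(y, pred))  # (1.0, 1.0)
```

Because each column of the memory depends only on its own gene in this sketch, discarding genes after the first stage gives the same memory as rebuilding it on the selected genes, which illustrates how a selection stage can reduce computational cost without retraining.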


Keywords: Associative memory · Gene selection · Disease prediction · Gene expression microarray



This study was partially supported by the Valencian Council of Education, Research, Culture and Sport [PROMETEOII/2014/062], the Mexican PRODEP [DSA/103.5/15/7004], and the Spanish Ministry of Economy, Industry and Competitiveness under Grant [TIN2013-46522-P].



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. National Institute of Genomic Medicine, Ciudad de México, Mexico
  2. Department of Computer Languages and Systems, Institute of New Imaging Technologies, Universitat Jaume I, Castelló de la Plana, Spain
  3. Multidisciplinary University Division, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez, Mexico
