Journal of Heuristics, Volume 22, Issue 2, pp 199–220

A fast meta-heuristic approach for the \((\alpha ,\beta )\)-\(k\)-feature set problem

  • Mateus Rocha de Paula
  • Regina Berretta
  • Pablo Moscato

Abstract

The feature selection problem aims to choose a subset of a given set of features that best represents the whole in a particular aspect, preserving the original semantics of the variables for the given samples and classes. In 2004, a new approach to feature selection was proposed, based on an NP-complete combinatorial optimisation problem called the (\(\alpha ,\beta \))-\(k\)-feature set problem. Although effective in many practical cases, which made the approach an important feature selection tool, the only existing solution method, proposed in the original paper, was found not to work well for several instances. Our work aims to fill this gap in the literature by quickly obtaining high-quality solutions for the instances that the existing approach cannot solve. This work proposes a heuristic based on the greedy randomised adaptive search procedure (GRASP) and tabu search to address this problem, together with benchmark instances to evaluate its performance. The computational results show that our method obtains high-quality solutions for both the real and the proposed artificial instances, and requires only a fraction of the computational resources needed by the state-of-the-art exact and heuristic approaches, which use mixed integer programming models.
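Since the abstract names the approach only at a high level, the following is a minimal sketch (in Python) of how a GRASP construction phase combined with a tabu-search improvement phase could be organised for a set multi-cover view of the problem, in which each selected feature "covers" sample pairs and each pair must be covered a prescribed number of times (\(\alpha\) for pairs from different classes, \(\beta\) for pairs from the same class). All data structures and parameter names here (coverage, demand, rcl_ratio, tabu_tenure) are illustrative assumptions, not the authors' actual implementation.

# Illustrative GRASP + tabu search sketch for a set multi-cover style
# feature selection problem; all names and defaults are assumptions.
import random

def feasible(sol, coverage, demand):
    """True if every pair p is covered at least demand[p] times by sol."""
    hit = {}
    for f in sol:
        for p in coverage[f]:
            hit[p] = hit.get(p, 0) + 1
    return all(hit.get(p, 0) >= d for p, d in demand.items())

def construct(coverage, demand, rcl_ratio):
    """Greedy randomised construction: add features drawn from a restricted
    candidate list (RCL) of high-gain features until all demands are met."""
    residual, chosen = dict(demand), set()
    while any(v > 0 for v in residual.values()):
        gains = {f: sum(1 for p in cov if residual.get(p, 0) > 0)
                 for f, cov in coverage.items() if f not in chosen}
        gmax = max(gains.values(), default=0)
        if gmax == 0:                       # no feature helps: stop (infeasible)
            break
        rcl = [f for f, g in gains.items() if g >= (1 - rcl_ratio) * gmax]
        f = random.choice(rcl)
        chosen.add(f)
        for p in coverage[f]:
            if residual.get(p, 0) > 0:
                residual[p] -= 1
    return chosen

def tabu_search(sol, coverage, demand, tenure, iters):
    """Drop a redundant feature when possible, otherwise do a feasibility-
    preserving 1-1 swap; recently moved features are tabu for `tenure` steps."""
    sol, best, tabu = set(sol), set(sol), {}
    for it in range(iters):
        move = None
        for f in sorted(sol):               # prefer pure removals
            if tabu.get(f, -1) < it and feasible(sol - {f}, coverage, demand):
                move = ("drop", f, None)
                break
        if move is None:                    # otherwise a sideways swap
            outside = [f for f in coverage if f not in sol and tabu.get(f, -1) < it]
            random.shuffle(outside)
            for f_out in sorted(sol):
                for f_in in outside:
                    if feasible((sol - {f_out}) | {f_in}, coverage, demand):
                        move = ("swap", f_out, f_in)
                        break
                if move:
                    break
        if move is None:
            break
        kind, f_out, f_in = move
        sol.discard(f_out)
        tabu[f_out] = it + tenure
        if kind == "swap":
            sol.add(f_in)
            tabu[f_in] = it + tenure
        if len(sol) < len(best):
            best = set(sol)
    return best

def grasp(coverage, demand, iters=50, rcl_ratio=0.3, tabu_tenure=7, ls_iters=200):
    """coverage[f]: pair ids covered by feature f; demand[p]: required cover
    of pair p (alpha or beta).  Returns the smallest feasible set found."""
    best = None
    for _ in range(iters):
        sol = tabu_search(construct(coverage, demand, rcl_ratio),
                          coverage, demand, tabu_tenure, ls_iters)
        if feasible(sol, coverage, demand) and (best is None or len(sol) < len(best)):
            best = sol
    return best

The repeated randomised constructions diversify the starting points, while the tabu list keeps the local search from cycling between equivalent feature swaps; both are standard ingredients of GRASP and tabu search rather than details taken from the paper.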

Keywords

\((\alpha , \beta )\)-\(k\) feature set problem · GRASP · Tabu search · Metaheuristics · Set multi-cover problem


Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Mateus Rocha de Paula (1)
  • Regina Berretta (2)
  • Pablo Moscato (2)
  1. School of Mathematical and Physical Sciences, The University of Newcastle, Callaghan, Australia
  2. School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, Australia
