A VNS-Based Heuristic for Feature Selection in Data Mining

  • A. Mucherino
  • L. Liberti
Part of the Studies in Computational Intelligence book series (SCI, volume 434)

Abstract

Selecting the features that describe the samples in a data set is a typical problem in data mining. A crucial issue is to select a maximal set of pertinent features, because limited knowledge of the problem under study often leads one to consider features that do not describe the corresponding samples well. The concept of consistent biclustering of a data set was introduced to identify such a maximal set. The problem can be modeled as a 0–1 linear fractional program, which is NP-hard. We reformulate this optimization problem as a bilevel program, and we prove that solutions to the original problem can be found by solving the reformulated problem. We also propose a heuristic for solving the bilevel program, based on the Variable Neighborhood Search (VNS) metaheuristic. Computational experiments show that the proposed heuristic outperforms previously proposed heuristics for feature selection by consistent biclustering.
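As an illustration of the metaheuristic named in the abstract, the following is a minimal sketch of a generic VNS loop over 0–1 feature-selection vectors. It is not the authors' bilevel heuristic: the function names (`vns_select`, `local_search`, `toy_score`), the parameters (`k_max`, `max_iters`), and the toy objective are all assumptions made for this example. The sketch only shows the usual VNS pattern of shaking in the k-th neighborhood, running a local search, and resetting k after an improvement.

```python
import random


def local_search(x, score):
    """Single-flip local search: keep any flip that strictly improves score."""
    x = x[:]
    val = score(x)
    improved = True
    while improved:
        improved = False
        for j in range(len(x)):
            x[j] = 1 - x[j]          # try flipping feature j
            new_val = score(x)
            if new_val > val:
                val = new_val        # keep the improving flip
                improved = True
            else:
                x[j] = 1 - x[j]      # undo the flip
    return x, val


def vns_select(n_features, score, k_max=3, max_iters=200, seed=0):
    """Generic VNS over 0-1 vectors; maximizes the user-supplied score(x)."""
    rng = random.Random(seed)
    x = [1] * n_features             # start with every feature selected
    best = score(x)
    for _ in range(max_iters):
        k = 1
        while k <= k_max:
            # Shaking: flip k randomly chosen entries of the incumbent.
            y = x[:]
            for j in rng.sample(range(n_features), min(k, n_features)):
                y[j] = 1 - y[j]
            y, val = local_search(y, score)
            if val > best:
                x, best = y, val     # move and restart from the first neighborhood
                k = 1
            else:
                k += 1               # try a larger neighborhood
    return x, best


def toy_score(x, noisy=(1, 4)):
    """Hypothetical objective: reward selected features, penalize noisy ones."""
    return sum(x) - 3 * sum(x[j] for j in noisy)


if __name__ == "__main__":
    selection, value = vns_select(6, toy_score)
    print(selection, value)
```

In this toy setting the objective is separable, so the local search alone already reaches the optimum. On a harder objective, such as one that checks the consistency of a biclustering, the shaking step is what lets the search leave local optima.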

Keywords

Ovarian Cancer, Data Mining, Feature Selection, Computational Experiment, Variable Neighborhood Search
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. IRISA, University of Rennes, Rennes, France
  2. LIX, École Polytechnique, Palaiseau, France
