Learning and Feature Selection Using the Set Covering Machine with Data-Dependent Rays on Gene Expression Profiles

  • Hans A. Kestler
  • Wolfgang Lindner
  • André Müller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4087)


Abstract

Microarray technologies are increasingly used in the biological and medical sciences for high-throughput analyses of genetic information at the genome, transcriptome and proteome levels. Differentiating between cancerous and benign processes in the body often poses a difficult diagnostic problem in the clinical setting, while being of major importance for the treatment of patients. In this situation, feature reduction techniques that reduce the dimensionality of the data are essential for building predictive tools based on classification. We extend the set covering machine of Marchand and Shawe-Taylor to data-dependent rays in order to achieve feature reduction and a direct interpretation of the resulting conjunctions of intervals on individual genes. We give bounds on the generalization error as a function of the amount of data compression and the number of training errors achieved during training. In experiments with artificial data and a real-world data set of gene expression profiles from the pancreas, we show the utility of the approach and its applicability to microarray data classification.
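The classifier described in the abstract, a conjunction of data-dependent rays (single-gene threshold rules) chosen greedily in set-cover fashion, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the ray representation, the `penalty` heuristic for trading covered negatives against positive-side errors, and all function names are assumptions made for this sketch.

```python
# Sketch of a Set Covering Machine (SCM) learning a conjunction of
# data-dependent rays, in the spirit of Marchand & Shawe-Taylor (2002).
# Thresholds are taken from the training data itself ("data-dependent"),
# so each chosen ray names one gene and one observed expression cutoff.

def candidate_rays(X):
    """Enumerate rays (feature, threshold, direction); thresholds come
    from the observed values of each feature in the training set."""
    rays = []
    for i in range(len(X[0])):
        for t in sorted({x[i] for x in X}):
            rays.append((i, t, +1))  # fires when x[i] >= t
            rays.append((i, t, -1))  # fires when x[i] <= t
    return rays

def ray_output(ray, x):
    i, t, d = ray
    return x[i] >= t if d > 0 else x[i] <= t

def scm_train(X, y, penalty=1.0, max_rays=4):
    """Greedy set cover: each chosen ray should reject ('cover') many
    still-uncovered negatives while keeping positives firing, since the
    final classifier is the conjunction of all chosen rays."""
    negatives = {j for j, lab in enumerate(y) if lab == 0}
    positives = {j for j, lab in enumerate(y) if lab == 1}
    rays, chosen = candidate_rays(X), []
    while negatives and len(chosen) < max_rays:
        def usefulness(r):
            covered = sum(1 for j in negatives if not ray_output(r, X[j]))
            errors = sum(1 for j in positives if not ray_output(r, X[j]))
            return covered - penalty * errors  # assumed trade-off heuristic
        best = max(rays, key=usefulness)
        covered = {j for j in negatives if not ray_output(best, X[j])}
        if not covered:  # no remaining ray rejects any uncovered negative
            break
        chosen.append(best)
        negatives -= covered
    return chosen

def scm_predict(rays, x):
    # The conjunction: predict positive only if every ray fires.
    return int(all(ray_output(r, x) for r in rays))
```

Because `max_rays` caps the size of the conjunction, the learned classifier depends on only a few genes; this sparsity is both the feature selection mechanism and the data compression that the generalization bound in the paper is stated in terms of.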


Keywords: Feature Selection · Data Compression · Boolean Variable · Generalization Error · Compression Scheme


References

  1. Haussler, D.: Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artificial Intelligence 36, 177–221 (1988)
  2. Marchand, M., Shawe-Taylor, J.: The set covering machine. Journal of Machine Learning Research 3, 723–746 (2002)
  3. Littlestone, N., Warmuth, M.: Relating data compression and learnability. Technical report, University of California, Santa Cruz (1986)
  4. Marchand, M., Shah, M., Shawe-Taylor, J., Sokolova, M.: The set covering machine with data-dependent half-spaces. In: Proceedings of the Twentieth International Conference on Machine Learning (ICML), pp. 520–527 (2003)
  5. Marchand, M., Shah, M.: PAC-Bayes learning of conjunctions and classification of gene-expression data. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems, vol. 17, pp. 881–888. MIT Press, Cambridge (2005)
  6. McAllester, D.: Some PAC-Bayesian theorems. Machine Learning 37, 355–363 (1999)
  7. Garey, M., Johnson, D.: Computers and Intractability – A Guide to the Theory of NP-Completeness. Freeman and Company, New York (1979)
  8. Kearns, M., Vazirani, U.: An Introduction to Computational Learning Theory. MIT Press, Cambridge (1994)
  9. Vapnik, V.: Statistical Learning Theory. Wiley, Chichester (1998)
  10. Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM 36, 929–965 (1989)
  11. Shawe-Taylor, J., Bartlett, P.L., Williamson, R.C., Anthony, M.: Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory 44, 1926–1940 (1998)
  12. Floyd, S., Warmuth, M.: Sample compression, learnability, and the Vapnik–Chervonenkis dimension. Machine Learning 21, 269–304 (1995)
  13. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2006). ISBN 3-900051-07-0
  14. Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., Weingessel, A.: e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package version 1.5-13 (2006)
  15. Therneau, T.M. (R port by Ripley, B.): rpart: Recursive Partitioning. R package version 3.1-27 (2005)
  16. Buchholz, M., Kestler, H.A., Bauer, A., Bock, W., Rau, B., Leder, G., Kratzer, W., Bommer, M., Scarpa, A., Schilling, M., Adler, G., Hoheisel, J., Gress, T.: Specialized DNA arrays for the differentiation of pancreatic tumors. Clin. Cancer Res. 11, 8048–8054 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hans A. Kestler (1, 2)
  • Wolfgang Lindner (3)
  • André Müller (2)

  1. Neural Information Processing, University of Ulm, Ulm, Germany
  2. Internal Medicine I, University Hospital Ulm, Ulm, Germany
  3. Theoretical Computer Science, University of Ulm, Ulm, Germany