Three Transductive Set Covering Machines

Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

We propose three transductive versions of the set covering machine with data-dependent rays for classification in the molecular high-throughput setting. Because they use both labeled and unlabeled samples, these transductive classifiers can extract information from the unlabeled samples as well as from the labeled ones. The three transductive set covering machines rely on modified criteria for selecting their ensemble members: via counting arguments, we incorporate the unlabeled information into the selection of the base classifiers. One of the three methods uniformly increased classification accuracy; the other two showed mixed behaviour across the data sets. We show that observing only the order of the unlabeled samples, not their distances, is sufficient to increase classification accuracy, which makes these approaches useful even when very little information is available.
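The abstract describes the transductive variants only as modified selection criteria on top of a set covering machine with data-dependent rays. The base (inductive) learner can be sketched as follows. This sketch assumes the greedy usefulness criterion |Q| - p|R| of Marchand and Shawe-Taylor (2003) for a conjunction, with Q the still-uncovered negative samples a ray rejects and R the remaining positive samples it wrongly rejects. Each ray thresholds a single feature at one of that feature's training values. The function names `fit_scm_conjunction` and `predict` and the parameters `p` and `max_rays` are hypothetical. The paper's transductive counting criteria are not reproduced here because the abstract does not specify them.

```python
import numpy as np


def ray_outputs(X, feature, threshold, direction):
    """Boolean output of one ray: x[feature] >= threshold (direction=+1)
    or x[feature] <= threshold (direction=-1)."""
    values = X[:, feature]
    return values >= threshold if direction > 0 else values <= threshold


def candidate_rays(X):
    """Data-dependent rays: every threshold is an observed value of its feature."""
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for d in (+1, -1):
                yield (j, t, d)


def fit_scm_conjunction(X, y, p=1.0, max_rays=5):
    """Greedy SCM conjunction over rays (sketch; labels: 1 = positive, 0 = negative).

    A ray 'covers' a negative sample when it outputs False on it; the
    conjunction predicts positive only if every selected ray outputs True.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    remaining_pos = y == 1   # positives not yet misclassified by a chosen ray
    uncovered_neg = y == 0   # negatives not yet covered by a chosen ray
    rays = list(candidate_rays(X))
    selected = []
    while uncovered_neg.any() and len(selected) < max_rays:
        best, best_score = None, -np.inf
        for ray in rays:
            out = ray_outputs(X, *ray)
            q = np.count_nonzero(uncovered_neg & ~out)  # newly covered negatives
            r = np.count_nonzero(remaining_pos & ~out)  # newly rejected positives
            score = q - p * r
            if score > best_score:                      # first maximum wins ties
                best, best_score = ray, score
        out = ray_outputs(X, *best)
        if not (uncovered_neg & ~out).any():
            break  # no ray covers any remaining negative sample
        selected.append(best)
        uncovered_neg &= out
        remaining_pos &= out
    return selected


def predict(X, selected):
    """Conjunction of the selected rays: 1 if all rays output True, else 0."""
    X = np.asarray(X, dtype=float)
    pred = np.ones(X.shape[0], dtype=bool)
    for ray in selected:
        pred &= ray_outputs(X, *ray)
    return pred.astype(int)


if __name__ == "__main__":
    X = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 0.0]])
    y = np.array([1, 1, 0, 0])
    rays = fit_scm_conjunction(X, y)
    print(rays, predict(X, rays))
```

The counts q and r depend only on which samples fall on each side of a threshold. A strictly increasing transform of a feature, such as cubing non-negative values, therefore leaves the selected feature, direction and training-set predictions unchanged. This mirrors the abstract's point that order information alone can drive the selection.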

Notes

Acknowledgements

This work was funded in part by a Karl-Steinbuch grant to Florian Schmid, by the German Federal Ministry of Education and Research (BMBF) within the framework of the Program of Medical Genome Research (PaCa-Net; Project ID PKB-01GS08) and the framework GERONTOSYS 2 (Forschungskern SyStaR, Project ID 0315894A), and by the German Science Foundation (SFB 1074, Project Z1) to Hans A. Kestler. The responsibility for the content lies exclusively with the authors.


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Florian Schmid (1)
  • Ludwig Lausser (1)
  • Hans A. Kestler (1)
  1. Research Group Bioinformatics and Systems Biology, Institute of Neural Information Processing, Ulm University, Ulm, Germany
