Mathematical Models of Supervised Learning and Application to Medical Diagnosis

Optimization and Data Analysis in Biomedical Informatics

Part of the book series: Fields Institute Communications ((FIC,volume 63))

Abstract

Supervised learning models are applicable in many fields of science and technology, such as economics, engineering and medicine. Among supervised learning algorithms, Support Vector Machines (SVM) are notable for their accurate solutions and low training time. They are based on statistical learning theory and obtain the solution by minimizing a quadratic cost function. Combined with kernel methods, SVM provide non-linear classification models, namely separations that cannot be expressed using inequalities on linear combinations of parameters. Some issues may reduce the effectiveness of these methods. For example, in multi-center clinical trials, experts from different institutions collect data on many patients. In this case, the techniques currently in use determine the model from all the available data. Although such models are well suited to the cases under consideration, they do not provide accurate answers in general. It is therefore necessary to identify a subset of the training set that contains all the available information and yields a model that still generalizes to new test data. Training sets may also vary over time, for example because data are added or modified as a result of new tests or new knowledge. In this case, current techniques cannot capture the changes and must restart the learning process from the beginning. Techniques that extract only the new knowledge contained in the data and build the learning model incrementally have the advantage of taking into account only the experiments that are really useful, which speeds up the analysis. In this paper, we describe some solutions to these problems, supported by numerical experiments on the discrimination among different types of leukemia.
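
For orientation, the quadratic cost function and the kernel-based non-linear classifier mentioned in the abstract can be written in the standard textbook soft-margin form. This is a sketch in commonly used notation, not necessarily the notation or the exact formulation adopted in the chapter: the training pairs (x_i, y_i) with labels y_i in {-1, +1}, the feature map \phi, the penalty parameter C, the slack variables \xi_i, the multipliers \alpha_i and the Gaussian kernel width \sigma are all assumptions introduced here for illustration.

```latex
% Soft-margin SVM: a quadratic objective under linear constraints
\min_{w,\,b,\,\xi} \quad \frac{1}{2}\,\|w\|^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i ,
\quad \xi_i \ge 0 , \quad i = 1, \dots, n .

% Resulting classifier, expressed through a kernel K(x, x') = <phi(x), phi(x')>
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \right),
\qquad
K(x, x') = \exp\!\left( -\frac{\|x - x'\|^{2}}{2\sigma^{2}} \right).
```

With a linear kernel, K(x, x') = x^{\top} x', the classifier reduces to a separating hyperplane; a non-linear kernel such as the Gaussian one above gives the kind of separation that cannot be written as a linear inequality in the original variables.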

Mathematics Subject Classification (2010): Primary 68T10, Secondary 62H30

Author information

Corresponding author

Correspondence to Roberta De Asmundis.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

De Asmundis, R., Guarracino, M.R. (2013). Mathematical Models of Supervised Learning and Application to Medical Diagnosis. In: Pardalos, P., Coleman, T., Xanthopoulos, P. (eds) Optimization and Data Analysis in Biomedical Informatics. Fields Institute Communications, vol 63. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4133-5_3
