Abstract
Supervised learning models are applied in many fields of science and technology, including economics, engineering, and medicine. Among supervised learning algorithms, Support Vector Machines (SVMs) offer accurate solutions and low training times. They are grounded in statistical learning theory and obtain the solution by minimizing a quadratic cost function. Combined with kernel methods, SVMs yield non-linear classification models, that is, separations that cannot be expressed as inequalities on linear combinations of the parameters. Several issues can reduce the effectiveness of these methods. For example, in multi-center clinical trials, experts from different institutions collect data on many patients. In this setting, current techniques build the model from all the available data. Although such models fit the cases under consideration well, they do not provide accurate answers in general. It is therefore necessary to identify a subset of the training set that retains all the available information and yields a model that still generalizes to new test data. Training sets may also vary over time, for example because data are added or modified as a result of new tests or new knowledge. In this case, current techniques cannot capture the changes and must restart the learning process from the beginning. Techniques that extract only the new knowledge contained in the data and update the learning model incrementally have the advantage of considering only the experiments that are really useful, which speeds up the analysis. In this paper, we describe some solutions to these problems, supported by numerical experiments on discriminating among different types of leukemia.
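The role of kernels in non-linear classification can be illustrated with a minimal sketch. It is not the method of this chapter and it is not an SVM: for brevity it uses a kernel perceptron, a simpler algorithm that relies on the same kernel idea but does not minimize a quadratic cost function. The XOR toy data, the Gaussian kernel width, and all function names are assumptions chosen for illustration only.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def decision_value(x, points, labels, alphas, kernel=rbf_kernel):
    """Kernel expansion f(x) = sum_i alpha_i * y_i * K(x_i, x)."""
    return sum(a * y * kernel(p, x) for a, y, p in zip(alphas, labels, points))


def train_kernel_perceptron(points, labels, kernel=rbf_kernel, max_epochs=100):
    """Kernel perceptron (a stand-in for an SVM, not equivalent to one).

    alphas[i] counts how many times training point i was misclassified.
    """
    alphas = [0] * len(points)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            # A non-positive margin counts as a mistake and triggers an update.
            if y * decision_value(x, points, labels, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alphas


def predict(x, points, labels, alphas, kernel=rbf_kernel):
    """Return +1 or -1 according to the sign of the kernel expansion."""
    return 1 if decision_value(x, points, labels, alphas, kernel) > 0 else -1


# XOR labels: no single linear inequality on the inputs separates the classes.
xor_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_labels = [-1, 1, 1, -1]

alphas = train_kernel_perceptron(xor_points, xor_labels)
predictions = [predict(x, xor_points, xor_labels, alphas) for x in xor_points]
print(predictions)
```

On this toy problem the Gaussian kernel lets the classifier reproduce the XOR labels, which is the kind of separation a linear model on the raw inputs cannot express.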
Mathematics Subject Classification (2010): Primary 68T10, Secondary 62H30
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
De Asmundis, R., Guarracino, M.R. (2013). Mathematical Models of Supervised Learning and Application to Medical Diagnosis. In: Pardalos, P., Coleman, T., Xanthopoulos, P. (eds) Optimization and Data Analysis in Biomedical Informatics. Fields Institute Communications, vol 63. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4133-5_3
DOI: https://doi.org/10.1007/978-1-4614-4133-5_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4132-8
Online ISBN: 978-1-4614-4133-5
eBook Packages: Mathematics and Statistics (R0)