Supervised Learning

  • Lior Rokach
  • Oded Maimon


This chapter summarizes the fundamental aspects of supervised methods. It gives an overview of concepts from several interrelated fields that are used in subsequent chapters, presents basic definitions and arguments from the supervised machine learning literature, and considers issues such as performance evaluation techniques and the challenges posed by data mining tasks.

Key words

Attribute; Classifier; Inducer; Regression; Training Set; Supervised Methods; Instance Space; Sampling; Generalization Error



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Information System Engineering, Ben-Gurion University, Beer-Sheba, Israel
  2. Department of Industrial Engineering, Tel-Aviv University, Ramat-Aviv, Israel
